From: Junio C Hamano <gitster@pobox.com>
To: Michael Schroeder <mls@suse.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] submodule: truncate the oid when fetchig commits
Date: Thu, 14 Aug 2025 08:42:31 -0700 [thread overview]
Message-ID: <xmqqqzxd6haw.fsf@gitster.g> (raw)
In-Reply-To: <aJ37eHEGMw6RgmZC@suse.de> (Michael Schroeder's message of "Thu, 14 Aug 2025 17:06:32 +0200")
Michael Schroeder <mls@suse.de> writes:
> If a submodule uses a different hash algorithm than used in
> the main repository, the recorded submodule commit is padded
> with zeros.
Hmph. I suspect that this "extra zero bits" would happen when your
superproject uses a longer hash (e.g. SHA-256) than the hash used by
the submodule you are fetching? If the arrangement is reversed,
would we see a different and possibly an even severe problem?
I do not mean to say that you need to solve the problem in both
direction at all. But perhaps
... uses a hash algorithm with shorter hash length than what is
used in the main repository, ...
would clarify what problem you are solving in this patch better.
> This is usually not a problem as the default is to
> do submodule clones non-shallow and the commit can be found
> in the local objects.
>
> But this is not true if the --shallow-submodules clone option is
> used (or the --depth option in the submodule update call).
> In this case, the commit is often not reachable and a fetch of the
> specific commit is done. But the fetch cannot deal with the zero
> padding and interprets the commit as a name. Because of this,
> the checkout will fail.
>
> Implement truncation of the recorded commit to the correct size
> corresponding to the hash algorithm used in the submodule.
When we use the verb "truncate" in the context of this project, I
think we always use the word to mean making it shorter than its
natural length. But in the above, the word is used to remove excess
so that we use it at its natural length. I cannot offhand offer a
better phrase, so I'll let it go for now, but suggestions for better
wording is very much welcome.
> builtin/submodule--helper.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 07a1935cbe..ef21eb42b8 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -72,7 +72,7 @@ static char *resolve_relative_url(const char *rel_url, const char *up_path, int
> return resolved_url;
> }
>
> -static int get_default_remote_submodule(const char *module_path, char **default_remote)
> +static int get_default_remote_submodule(const char *module_path, char **default_remote, const struct git_hash_algo **hash_algo)
> {
> const struct submodule *sub;
> struct repository subrepo;
> @@ -106,6 +106,9 @@ static int get_default_remote_submodule(const char *module_path, char **default_
>
> *default_remote = xstrdup(remote_name);
>
> + if (hash_algo)
> + *hash_algo = subrepo.hash_algo;
> +
> repo_clear(&subrepo);
> free(url);
>
> @@ -1272,7 +1275,7 @@ static void sync_submodule(const char *path, const char *prefix,
> goto cleanup;
>
> strbuf_reset(&sb);
> - code = get_default_remote_submodule(path, &default_remote);
> + code = get_default_remote_submodule(path, &default_remote, NULL);
> if (code)
> exit(code);
>
> @@ -2319,16 +2322,19 @@ static int fetch_in_submodule(const char *module_path, int depth, int quiet,
> if (depth)
> strvec_pushf(&cp.args, "--depth=%d", depth);
> if (oid) {
> - char *hex = oid_to_hex(oid);
> + char hexbuffer[GIT_MAX_HEXSZ + 1];
> + char *hex = oid_to_hex_r(hexbuffer, oid);
> char *remote;
> + const struct git_hash_algo *hash_algo = NULL;
> int code;
>
> - code = get_default_remote_submodule(module_path, &remote);
> + code = get_default_remote_submodule(module_path, &remote, &hash_algo);
It feels highly unsatisfying to have oid_to_hex_r() blindly read and
convert oid to hex (assuming the hash length of the current
repository) without using the knowledge of how much of that object
name is valid. We obtain that knowledge shortly after we call
oid_to_hex_r() in hash_algo here. That is the only reason why we
manually have to truncate below, isn't it?
In other words, shouldn't we be finding out hash_algo first, and
then use something like hash_to_hex_algop_r() to convert only the
valid bits of the oid into the buffer?. That way, we do not have to
manually add '\0' in this function below.
> if (code) {
> child_process_clear(&cp);
> return code;
> }
> -
> + if (hash_algo)
> + hex[hash_algo->hexsz] = 0; /* truncate to correct size */
> strvec_pushl(&cp.args, remote, hex, NULL);
> free(remote);
> }
> @@ -2635,7 +2641,7 @@ static int update_submodule(struct update_data *update_data)
> char *remote_ref;
> int code;
>
> - code = get_default_remote_submodule(update_data->sm_path, &remote_name);
> + code = get_default_remote_submodule(update_data->sm_path, &remote_name, NULL);
> if (code)
> return code;
> code = remote_submodule_branch(update_data->sm_path, &branch);
next prev parent reply other threads:[~2025-08-14 15:42 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-14 15:06 [PATCH] submodule: truncate the oid when fetchig commits Michael Schroeder
2025-08-14 15:42 ` Junio C Hamano [this message]
2025-08-18 9:15 ` [PATCH v2] " Michael Schroeder
2025-08-14 22:16 ` [PATCH] " brian m. carlson
2025-08-18 9:30 ` Michael Schroeder
2025-08-19 0:45 ` brian m. carlson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqqzxd6haw.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=mls@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).