git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
@ 2025-11-12 12:58 Martin Wilck
  2025-11-12 16:32 ` Junio C Hamano
  2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
  0 siblings, 2 replies; 21+ messages in thread
From: Martin Wilck @ 2025-11-12 12:58 UTC (permalink / raw)
  To: git; +Cc: Adrian Schroeter

> What did you do before the bug happened? (Steps to reproduce your
issue)

# This is a SHA256 repository with a SHA1 submodule 
git clone -b next https://src.opensuse.org/mwilck/multipath-tools

cd multipath-tools
git submodule init

# Submoudle URL: https://github.com/openSUSE/multipath-tools, 
# branch "next
git submodule update --depth 1

> What did you expect to happen? (Expected behavior)

Successful checkout of the submodule.

> What happened instead? (Actual behavior)

The following error:

fatal: couldn't find remote ref 37f9a4c9c4658da7f9b2b0345836360d2fb119a0000000000000000000000000
fatal: Fetched in submodule path 'multipath-tools', but it did not contain 37f9a4c9c4658da7f9b2b0345836360d2fb119a0000000000000000000000000. Direct fetching of that commit failed.

> What's different between what you expected and what actually
happened?

It failed.

> Anything else you want to add:

"git submodule update" (without --depth 1) succeeds.

The problem occurs whether or not I use "git submodule set-branch" to
select the correct remote branch before running "git submodule update".
Neither the "branch" setting in .gitmodules nor in .git/config seems to
matter.

The problem seems to be that "git submodule update --depth 1" fetches
the remote HEAD only, even if submodule.<name>.branch is set to
something different (here: "next"). (In my case, HEAD was not an
ancestor of the desired branch ("next"), nor vice-versa).

I found the following workaround to fetch the desired commit:

SUBMODULE=multipath-tools
SHA=$(git -C $SUBMODULE ls-remote origin | awk '/refs\/heads\/next/ { print $1; }')
git -C $SUBMODULE fetch --depth=1 origin next
git -C $SUBMODULE reset --hard $SHA

but that's not the desired solution, because the checkout needs to be
scripted. Ultimately I want to run
"git clone --recurse-submodules --depth 1", which currently fails as
well.

A plain "git clone" of the submodule works as expected:

git clone -b next --depth=1
https://github.com/openSUSE/multipath-tools.git

In a SHA1 repository, all these operations seem to work as one would
naïvely expect. (I was using different repositories though, I currently
don't have a SHA1 clone of the SHA256 repo I experimented with).

[System Info]
git version:
git version 2.52.0.rc1.458.g3549877.dirty
cpu: x86_64
built from commit: 3549877a16bc196d0d99bc2f8441eedf0102fcc8
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
rust: disabled
libcurl: 8.14.1
OpenSSL: OpenSSL 3.1.4 24 Oct 2023
zlib: 1.2.13
SHA-1: SHA1_DC
SHA-256: SHA256_BLK
default-ref-format: files
default-hash: sha1
uname: Linux 6.4.0-150600.23.70-default #1 SMP PREEMPT_DYNAMIC Wed Sep
10 10:54:24 UTC 2025 (225af75) x86_64
compiler info: gnuc: 7.5
libc info: glibc: 2.38
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-12 12:58 git fails to checkout SHA1 submodule in SHA256 repo with --depth=1 Martin Wilck
@ 2025-11-12 16:32 ` Junio C Hamano
  2025-11-12 23:37   ` brian m. carlson
  2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-11-12 16:32 UTC (permalink / raw)
  To: Martin Wilck; +Cc: brian m. carlson, git, Adrian Schroeter

Martin Wilck <mwilck@suse.com> writes:

>> Subject: Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1

I think it is not supposed to work to mix repositories like this,
regardless of any other option like --depth.  I think brian gave a
response to that effect in a thread in the past few months.

    ... goes and looks ...

https://lore.kernel.org/git/aJ5gOPQ9oologqj-@fruit.crustytoothpaste.net/
https://lore.kernel.org/git/aKPJNNWMW9gtueEK@fruit.crustytoothpaste.net/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-12 16:32 ` Junio C Hamano
@ 2025-11-12 23:37   ` brian m. carlson
  2025-11-13 10:15     ` Martin Wilck
  0 siblings, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2025-11-12 23:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Wilck, git, Adrian Schroeter

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

On 2025-11-12 at 16:32:01, Junio C Hamano wrote:
> Martin Wilck <mwilck@suse.com> writes:
> 
> >> Subject: Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
> 
> I think it is not supposed to work to mix repositories like this,
> regardless of any other option like --depth.  I think brian gave a
> response to that effect in a thread in the past few months.
> 
>     ... goes and looks ...
> 
> https://lore.kernel.org/git/aJ5gOPQ9oologqj-@fruit.crustytoothpaste.net/
> https://lore.kernel.org/git/aKPJNNWMW9gtueEK@fruit.crustytoothpaste.net/

Yes, that isn't going to work and it never will unless we add some
extension mechanism for that purpose.  The repository in question is
corrupt.

I've just written a patch to check for this case and produce an error in
git add, which I will send shortly.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-12 12:58 git fails to checkout SHA1 submodule in SHA256 repo with --depth=1 Martin Wilck
  2025-11-12 16:32 ` Junio C Hamano
@ 2025-11-12 23:54 ` brian m. carlson
  2025-11-13  3:26   ` Jeff King
  2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
  1 sibling, 2 replies; 21+ messages in thread
From: brian m. carlson @ 2025-11-12 23:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Martin Wilck, Adrian Schroeter

The design of the hash algorithm transition plan is that objects stored
must be entirely in one algorithm since we lack any way to indicate a
mix of algorithms.  This also includes submodules, but we have
traditionally not enforced this, which leads to various problems when
trying to clone or check out the the submodule from the remote.

Since this cannot work in the general case, restrict adding a submodule
of a different algorithm to the index.  Add tests for git add and git
submodule add that these are rejected.

Note that we cannot check this in git fsck because the malformed
submodule is stored in the tree as an object ID which is either
truncated (when a SHA-256 submodule is added to a SHA-1 repository) or
padded with zeros (when a SHA-1 submodule is added to a SHA-256
repository).  We cannot detect even the latter case because someone
could have an actual submodule that actually ends in 24 zeros, which
would be a false positive.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 object-file.c              |  6 +++++-
 t/t3700-add.sh             | 27 +++++++++++++++++++++++++++
 t/t7400-submodule-basic.sh | 27 +++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 4675c8ed6b..8c43c52ed0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1661,7 +1661,11 @@ int index_path(struct index_state *istate, struct object_id *oid,
 		strbuf_release(&sb);
 		break;
 	case S_IFDIR:
-		return repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid);
+		if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
+			return -1;
+		if (&hash_algos[oid->algo] != istate->repo->hash_algo)
+			return error(_("cannot add a submodule of a different hash algorithm"));
+		break;
 	default:
 		return error(_("%s: unsupported file type"), path);
 	}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index df580a5806..b075eb9b11 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -541,6 +541,33 @@ test_expect_success 'all statuses changed in folder if . is given' '
 	)
 '
 
+test_expect_success 'cannot add a submodule of a different algorithm' '
+	git init --object-format=sha256 sha256 &&
+	(
+		cd sha256 &&
+		test_commit abc &&
+		git init --object-format=sha1 submodule &&
+		(
+			cd submodule &&
+			test_commit def
+		) &&
+		test_must_fail git add submodule &&
+		test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
+	) &&
+	git init --object-format=sha1 sha1 &&
+	(
+		cd sha1 &&
+		test_commit abc &&
+		git init --object-format=sha256 submodule &&
+		(
+			cd submodule &&
+			test_commit def
+		) &&
+		test_must_fail git add submodule &&
+		test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
+	)
+'
+
 test_expect_success CASE_INSENSITIVE_FS 'path is case-insensitive' '
 	path="$(pwd)/BLUB" &&
 	touch "$path" &&
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index fd3e7e355e..b190182d62 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -407,6 +407,33 @@ test_expect_success 'submodule add in subdirectory with relative path should fai
 	test_grep toplevel output.err
 '
 
+test_expect_success 'submodule add of a different algorithm fails' '
+	git init --object-format=sha256 sha256 &&
+	(
+		cd sha256 &&
+		test_commit abc &&
+		git init --object-format=sha1 submodule &&
+		(
+			cd submodule &&
+			test_commit def
+		) &&
+		test_must_fail git submodule add "$submodurl" submodule &&
+		test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
+	) &&
+	git init --object-format=sha1 sha1 &&
+	(
+		cd sha1 &&
+		test_commit abc &&
+		git init --object-format=sha256 submodule &&
+		(
+			cd submodule &&
+			test_commit def
+		) &&
+		test_must_fail git submodule add "$submodurl" submodule &&
+		test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
+	)
+'
+
 test_expect_success 'setup - add an example entry to .gitmodules' '
 	git config --file=.gitmodules submodule.example.url git://example.com/init.git
 '

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
@ 2025-11-13  3:26   ` Jeff King
  2025-11-13  3:56     ` Jeff King
  2025-11-13 23:15     ` brian m. carlson
  2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff King @ 2025-11-13  3:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano, Martin Wilck, Adrian Schroeter

On Wed, Nov 12, 2025 at 11:54:34PM +0000, brian m. carlson wrote:

> Since this cannot work in the general case, restrict adding a submodule
> of a different algorithm to the index.  Add tests for git add and git
> submodule add that these are rejected.

This makes sense. I had meant to follow up on our conversation and patch
from last month, but it was still on my todo list. Fortunately that
earlier attempt gives me something concrete to compare to. ;)

> diff --git a/object-file.c b/object-file.c
> index 4675c8ed6b..8c43c52ed0 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1661,7 +1661,11 @@ int index_path(struct index_state *istate, struct object_id *oid,
>  		strbuf_release(&sb);
>  		break;
>  	case S_IFDIR:
> -		return repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid);
> +		if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
> +			return -1;
> +		if (&hash_algos[oid->algo] != istate->repo->hash_algo)
> +			return error(_("cannot add a submodule of a different hash algorithm"));
> +		break;
>  	default:
>  		return error(_("%s: unsupported file type"), path);
>  	}

OK, you're checking for it here in index_path(), whereas my earlier
attempt did it in add_to_index(). For the most part, I think your spot
makes more sense, as it is at a lower level. add_to_index() eventually
calls into index_path(), and so do some other code paths.

That does leave two interesting oddities:

  1. In add_to_index(), we have this code:

          if (S_ISDIR(st_mode)) {
                  if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
                          return error(_("'%s' does not have a commit checked out"), path);
                  while (namelen && path[namelen-1] == '/')
                          namelen--;
          }

     which is run before we hit index_path(). So it may get an oid
     result with an unexpected hash. I think that's OK, because nobody
     ever looks at it (which would be a lot more obvious if we declared
     the variable inside the conditional block here).

     This whole lookup does feel a little funny and redundant. It comes
     from f937bc2f86 (add: error appropriately on repository with no
     commits, 2019-04-09), and the main goal is making the error message
     better. But should we just improve the error message from
     index_path() for this case (in which case the resolve call above go
     away)?

     I think this is mostly orthogonal to your patch and we can ignore
     it for now. I only bring it up because now it's weird that we are
     trying to catch the hash mismatch, but have this unchecked extra
     resolve.

  2. There are paths in add_to_index() that _don't_ hit index_path(). In
     particular, intent-to-add entries. So with your patch, even though
     a regular "git add" is forbidden:

       $ git add repo
       error: cannot add a submodule of a different hash algorithm
       error: unable to index file 'repo'
       fatal: updating files failed

     I can still do this:

       $ git add -N repo
       warning: adding embedded git repository: repo
       $ git ls-files -s
       160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	repo

     which skips the hash check entirely. Which kind of makes sense,
     because the resulting index entry does not have a real oid in it at
     all (it gets the empty blob oid). But it does have a real 160000
     mode.

     Can we make things worse from there? If we try to update it, for
     example, that will fail:

       $ git add -u
       error: cannot add a submodule of a different hash algorithm
       error: unable to index file 'repo'
       fatal: updating files failed

     So...maybe this is OK?

> +test_expect_success 'cannot add a submodule of a different algorithm' '
> +	git init --object-format=sha256 sha256 &&
> +	(
> +		cd sha256 &&
> +		test_commit abc &&
> +		git init --object-format=sha1 submodule &&
> +		(
> +			cd submodule &&
> +			test_commit def
> +		) &&
> +		test_must_fail git add submodule &&
> +		test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
> +	) &&

Makes sense. Purists might complain about "git ls-files" on the left
hand side of a pipe, but I think it is OK here. Though you can golf away
a few subprocesses at the same time with:

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index b075eb9b11..6a1d4e2659 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -547,12 +547,10 @@ test_expect_success 'cannot add a submodule of a different algorithm' '
 		cd sha256 &&
 		test_commit abc &&
 		git init --object-format=sha1 submodule &&
-		(
-			cd submodule &&
-			test_commit def
-		) &&
+		test_commit -C submodule def &&
 		test_must_fail git add submodule &&
-		test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
+		git ls-files --stage >entries &&
+		test_grep ! ^160000 entries
 	) &&
 	git init --object-format=sha1 sha1 &&
 	(

but we are getting into nits there.

Is it worth checking the stderr of the failing "git add submodule" call?
Adding a repo directly via "git add" is already something we generate a
warning for, and it's possible we might eventually make it an error. In
which case the command would fail without even hitting your new code,
but we'd have no idea. Adding in a test_grep for "cannot add a submodule
of a different hash algorithm" would at least make sure we're hitting
the error we expect.

-Peff

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-13  3:26   ` Jeff King
@ 2025-11-13  3:56     ` Jeff King
  2025-11-13 16:29       ` Junio C Hamano
  2025-11-13 23:15     ` brian m. carlson
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2025-11-13  3:56 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Junio C Hamano, Martin Wilck, Adrian Schroeter

On Wed, Nov 12, 2025 at 10:26:24PM -0500, Jeff King wrote:

>      This whole lookup does feel a little funny and redundant. It comes
>      from f937bc2f86 (add: error appropriately on repository with no
>      commits, 2019-04-09), and the main goal is making the error message
>      better. But should we just improve the error message from
>      index_path() for this case (in which case the resolve call above go
>      away)?
> 
>      I think this is mostly orthogonal to your patch and we can ignore
>      it for now. I only bring it up because now it's weird that we are
>      trying to catch the hash mismatch, but have this unchecked extra
>      resolve.

So this is what I'd propose on top of your patch. I can hold onto it for
later if we don't want to muddy up what you're trying to do.

-- >8 --
Subject: [PATCH] read-cache: drop submodule check from add_to_cache()

In add_to_cache(), we treat any directories as submodules, and complain
if we can't resolve their HEAD. This call to resolve_gitlink_ref() was
added by f937bc2f86 (add: error appropriately on repository with no
commits, 2019-04-09), with the goal of improving the error message for
empty repositories.

But we already resolve the submodule HEAD in index_path(), which is
where we find the actual oid we're going to use. Resolving it again here
introduces some downsides:

  1. It's more work, since we have to open up the submodule repository's
     files twice.

  2. There are call paths that get to index_path() without going through
     add_to_cache(). For instance, we'd want a similar informative
     message if "git diff empty" finds that it can't resolve the
     submodule's HEAD. (In theory we can also get there through
     update-index, but AFAICT it refuses to consider directories as
     submodules at all, and just complains about them).

  3. The resolution in index_path() catches more errors that we don't
     handle here. In particular, it will validate that the object format
     for the submodule matches that of the superproject. This isn't a
     bug, since our call in add_to_cache() throws away the oid it gets
     without looking at it. But it certainly caused confusion for me
     when looking at where the object-format check should go.

So instead of resolving the submodule HEAD in add_to_cache(), let's just
teach the call in index_path() to actually produce an error message
(which it already does for other cases). That's probably what f937bc2f86
should have done in the first place, and it gives us a single point of
resolution when adding a submodule to the index.

The resulting output is slightly more verbose, as we propagate the error
up the call stack, but I think that's OK (and again, matches many other
errors we get when indexing fails).

I've left the text of the error message as-is, though it is perhaps
overly specific.  There are many reasons that resolving the submodule
HEAD might fail, though outside of corruption or system errors it is
probably most likely that the submodule HEAD is simply on an unborn
branch.

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c  | 2 +-
 read-cache.c   | 3 ---
 t/t3700-add.sh | 1 +
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/object-file.c b/object-file.c
index 8c43c52ed0..a7438b6205 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1662,7 +1662,7 @@ int index_path(struct index_state *istate, struct object_id *oid,
 		break;
 	case S_IFDIR:
 		if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
-			return -1;
+			return error(_("'%s' does not have a commit checked out"), path);
 		if (&hash_algos[oid->algo] != istate->repo->hash_algo)
 			return error(_("cannot add a submodule of a different hash algorithm"));
 		break;
diff --git a/read-cache.c b/read-cache.c
index 032480d0c7..990d4ead0d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -706,7 +706,6 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
 	unsigned hash_flags = pretend ? 0 : INDEX_WRITE_OBJECT;
-	struct object_id oid;
 
 	if (flags & ADD_CACHE_RENORMALIZE)
 		hash_flags |= INDEX_RENORMALIZE;
@@ -716,8 +715,6 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 
 	namelen = strlen(path);
 	if (S_ISDIR(st_mode)) {
-		if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
-			return error(_("'%s' does not have a commit checked out"), path);
 		while (namelen && path[namelen-1] == '/')
 			namelen--;
 	}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index b075eb9b11..d8cc0e4c66 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -388,6 +388,7 @@ test_expect_success 'error on a repository with no commits' '
 	test_must_fail git add empty >actual 2>&1 &&
 	cat >expect <<-EOF &&
 	error: '"'empty/'"' does not have a commit checked out
+	error: unable to index file '"'empty/'"'
 	fatal: adding files failed
 	EOF
 	test_cmp expect actual
-- 
2.52.0.rc2.237.g020256b90a


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-12 23:37   ` brian m. carlson
@ 2025-11-13 10:15     ` Martin Wilck
  2025-11-13 22:51       ` brian m. carlson
  2025-11-14 22:55       ` Marc Branchaud
  0 siblings, 2 replies; 21+ messages in thread
From: Martin Wilck @ 2025-11-13 10:15 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, git, Adrian Schroeter

On Wed, 2025-11-12 at 23:37 +0000, brian m. carlson wrote:
> On 2025-11-12 at 16:32:01, Junio C Hamano wrote:
> > Martin Wilck <mwilck@suse.com> writes:
> > 
> > > > Subject: Re: git fails to checkout SHA1 submodule in SHA256
> > > > repo with --depth=1
> > 
> > I think it is not supposed to work to mix repositories like this,
> > regardless of any other option like --depth.  I think brian gave a
> > response to that effect in a thread in the past few months.
> > 
> >     ... goes and looks ...
> > 
> > https://lore.kernel.org/git/aJ5gOPQ9oologqj-@fruit.crustytoothpaste.net/
> > https://lore.kernel.org/git/aKPJNNWMW9gtueEK@fruit.crustytoothpaste.net/
> 
> Yes, that isn't going to work and it never will unless we add some
> extension mechanism for that purpose.  The repository in question is
> corrupt.

Ok, thanks for the clarification.

Let me just explain the use case: The distribution ((open)SUSE) has
switched to git for version control of its packages. We have chosen
SHA256, because we'll need to support the distribution for many years
to come, much longer than SHA1 is going to be considered good enough.

We can store the source code of the package e.g. in the form of
tarballs (and we do). But it's convenient and efficient, and thus
tempting for developers, to simply link to an existing repository
hosting the sources, using a submodule. And upstream repos still use
SHA1. This is what lead us to experiment with this sort of mixed
repository.

I get it that the concept is flawed and unsupported. Up to now, that
wasn't obvious to me.

So what we can do now is either keep storing tarballs, or wait until
there's a full solution for migration between / interoperability of
different hash algorithms, and until the source code repos we're
interested in have been fully migrated to SHA256. In some special
cases, where (open)SUSE owns the source repositories, we may be able to
simply migrate to a SHA256 forge. We can also invent a "poor man's
submodule" mechanism to link to sources on some external repository
from ours [1].

Do you see any other approach that I'm overlooking?

Another question: If I, in the current repo [2], create a commit on top
removing the submodule and replacing it by a tarball, would the
repository remain broken, as it would still have the deprecated
SHA256/SHA1 combination in the history? Should I expect errors if run
e.g. "git rebase" or "git bisect" in a repository like this? IOW, do I
need to rewrite the history of this repo, eliminating all instances of
such mixed-hash submodules, to be on the safe side?

Thanks,
Martin

[1] Such a thing exists already, but to me it feels less clean and
elegant than using native git functionality.
[2] https://src.opensuse.org/mwilck/multipath-tools

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-13  3:56     ` Jeff King
@ 2025-11-13 16:29       ` Junio C Hamano
  2025-11-14 23:26         ` brian m. carlson
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-11-13 16:29 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git, Martin Wilck, Adrian Schroeter

Jeff King <peff@peff.net> writes:

> So this is what I'd propose on top of your patch. I can hold onto it for
> later if we don't want to muddy up what you're trying to do.

I do agree with both of the above.  The patch below makes perfect
sense to me, and it is more about the quality of implementation of
this codepath in general, than the primary theme of Brian's changes,
so there is no strong reason they have to come in a single series.

> -- >8 --
> Subject: [PATCH] read-cache: drop submodule check from add_to_cache()
>
> In add_to_cache(), we treat any directories as submodules, and complain
> if we can't resolve their HEAD. This call to resolve_gitlink_ref() was
> added by f937bc2f86 (add: error appropriately on repository with no
> commits, 2019-04-09), with the goal of improving the error message for
> empty repositories.
>
> But we already resolve the submodule HEAD in index_path(), which is
> where we find the actual oid we're going to use. Resolving it again here
> introduces some downsides:
>
>   1. It's more work, since we have to open up the submodule repository's
>      files twice.
>
>   2. There are call paths that get to index_path() without going through
>      add_to_cache(). For instance, we'd want a similar informative
>      message if "git diff empty" finds that it can't resolve the
>      submodule's HEAD. (In theory we can also get there through
>      update-index, but AFAICT it refuses to consider directories as
>      submodules at all, and just complains about them).
>
>   3. The resolution in index_path() catches more errors that we don't
>      handle here. In particular, it will validate that the object format
>      for the submodule matches that of the superproject. This isn't a
>      bug, since our call in add_to_cache() throws away the oid it gets
>      without looking at it. But it certainly caused confusion for me
>      when looking at where the object-format check should go.
>
> So instead of resolving the submodule HEAD in add_to_cache(), let's just
> teach the call in index_path() to actually produce an error message
> (which it already does for other cases). That's probably what f937bc2f86
> should have done in the first place, and it gives us a single point of
> resolution when adding a submodule to the index.
>
> The resulting output is slightly more verbose, as we propagate the error
> up the call stack, but I think that's OK (and again, matches many other
> errors we get when indexing fails).
>
> I've left the text of the error message as-is, though it is perhaps
> overly specific.  There are many reasons that resolving the submodule
> HEAD might fail, though outside of corruption or system errors it is
> probably most likely that the submodule HEAD is simply on an unborn
> branch.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object-file.c  | 2 +-
>  read-cache.c   | 3 ---
>  t/t3700-add.sh | 1 +
>  3 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 8c43c52ed0..a7438b6205 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1662,7 +1662,7 @@ int index_path(struct index_state *istate, struct object_id *oid,
>  		break;
>  	case S_IFDIR:
>  		if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
> -			return -1;
> +			return error(_("'%s' does not have a commit checked out"), path);
>  		if (&hash_algos[oid->algo] != istate->repo->hash_algo)
>  			return error(_("cannot add a submodule of a different hash algorithm"));
>  		break;
> diff --git a/read-cache.c b/read-cache.c
> index 032480d0c7..990d4ead0d 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -706,7 +706,6 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
>  			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
>  	unsigned hash_flags = pretend ? 0 : INDEX_WRITE_OBJECT;
> -	struct object_id oid;
>  
>  	if (flags & ADD_CACHE_RENORMALIZE)
>  		hash_flags |= INDEX_RENORMALIZE;
> @@ -716,8 +715,6 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  
>  	namelen = strlen(path);
>  	if (S_ISDIR(st_mode)) {
> -		if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
> -			return error(_("'%s' does not have a commit checked out"), path);
>  		while (namelen && path[namelen-1] == '/')
>  			namelen--;
>  	}
> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index b075eb9b11..d8cc0e4c66 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -388,6 +388,7 @@ test_expect_success 'error on a repository with no commits' '
>  	test_must_fail git add empty >actual 2>&1 &&
>  	cat >expect <<-EOF &&
>  	error: '"'empty/'"' does not have a commit checked out
> +	error: unable to index file '"'empty/'"'
>  	fatal: adding files failed
>  	EOF
>  	test_cmp expect actual

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-13 10:15     ` Martin Wilck
@ 2025-11-13 22:51       ` brian m. carlson
  2025-11-13 22:57         ` Martin Wilck
  2025-11-14 22:55       ` Marc Branchaud
  1 sibling, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2025-11-13 22:51 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Junio C Hamano, git, Adrian Schroeter

[-- Attachment #1: Type: text/plain, Size: 3961 bytes --]

On 2025-11-13 at 10:15:30, Martin Wilck wrote:
> Let me just explain the use case: The distribution ((open)SUSE) has
> switched to git for version control of its packages. We have chosen
> SHA256, because we'll need to support the distribution for many years
> to come, much longer than SHA1 is going to be considered good enough.
> 
> We can store the source code of the package e.g. in the form of
> tarballs (and we do). But it's convenient and efficient, and thus
> tempting for developers, to simply link to an existing repository
> hosting the sources, using a submodule. And upstream repos still use
> SHA1. This is what lead us to experiment with this sort of mixed
> repository.
> 
> I get it that the concept is flawed and unsupported. Up to now, that
> wasn't obvious to me.

It wasn't obvious to a lot of people because the assumption is only
documented in the transition plan.  Mixed algorithm submodules are
something we should have thought about earlier on, but we didn't.

The original transition plan was to have full interoperability support
early on, in which case this wouldn't have been a problem, but for
technical reasons it ended up being much easier to have SHA-256-only
repositories, so we finished that first.  I then lost interest in the
project for many years (having kind of burnt out on the work) and, with
the exception of one set of patches that were sent in, nobody else
picked it up either.  So this ideally would have been implemented sooner
under the original plan, but it wasn't.  This is one of the pitfalls of
open source projects, as we all know.

> So what we can do now is either keep storing tarballs, or wait until
> there's a full solution for migration between / interoperability of
> different hash algorithms, and until the source code repos we're
> interested in have been fully migrated to SHA256. In some special
> cases, where (open)SUSE owns the source repositories, we may be able to
> simply migrate to a SHA256 forge. We can also invent a "poor man's
> submodule" mechanism to link to sources on some external repository
> from ours [1].

There is work underway for SHA-1/SHA-256 interoperability.  However, it
is a thing I'm working on primarily in my free time (although also a
little at work) and nobody else has stepped up to contribute, so it will
take some time to complete and is not expected to be included in Git
3.0 without additional assistance.

To be clear, the existing interoperability code does work for
repositories without submodules and you may find it on the
`sha256-interop` branch at https://github.com/bk2204/git.git, but it
isn't production ready.  I might use it myself, but I would not
recommend others do so in its current state.

For right now, I would say that you should keep storing tarballs.

> Another question: If I, in the current repo [2], create a commit on top
> removing the submodule and replacing it by a tarball, would the
> repository remain broken, as it would still have the deprecated
> SHA256/SHA1 combination in the history? Should I expect errors if run
> e.g. "git rebase" or "git bisect" in a repository like this? IOW, do I
> need to rewrite the history of this repo, eliminating all instances of
> such mixed-hash submodules, to be on the safe side?

If the mismatched submodule remains in the history, then it will be
broken, so you'll almost certainly want to rewrite the history.

When the interoperability work is done and fully functional, a
repository in this situation will not work at all with dual algorithms.
The reason for that is that the SHA-256 main repository will try to use
the submodule value as a SHA-256 submodule (using the incorrect padded
SHA-1 value as a SHA-256 value) and it won't be able to create the
appropriate SHA-1/SHA-256 mapping since that object ID doesn't really
exist in the submodule.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-13 22:51       ` brian m. carlson
@ 2025-11-13 22:57         ` Martin Wilck
  0 siblings, 0 replies; 21+ messages in thread
From: Martin Wilck @ 2025-11-13 22:57 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, git, Adrian Schroeter

On Thu, 2025-11-13 at 22:51 +0000, brian m. carlson wrote:
> On 2025-11-13 at 10:15:30, Martin Wilck wrote:
> > 
> > I get it that the concept is flawed and unsupported. Up to now,
> > that
> > wasn't obvious to me.
> 
> It wasn't obvious to a lot of people because the assumption is only
> documented in the transition plan.  Mixed algorithm submodules are
> something we should have thought about earlier on, but we didn't.
> 
> The original transition plan was to have full interoperability
> support
> early on, in which case this wouldn't have been a problem, but for
> technical reasons it ended up being much easier to have SHA-256-only
> repositories, so we finished that first.  I then lost interest in the
> project for many years (having kind of burnt out on the work) and,
> with
> the exception of one set of patches that were sent in, nobody else
> picked it up either.  So this ideally would have been implemented
> sooner
> under the original plan, but it wasn't.  This is one of the pitfalls
> of
> open source projects, as we all know.
>
> [...]

Thank you for the detailed response, and good luck finding some support
for this very important project.

Martin


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-13  3:26   ` Jeff King
  2025-11-13  3:56     ` Jeff King
@ 2025-11-13 23:15     ` brian m. carlson
  1 sibling, 0 replies; 21+ messages in thread
From: brian m. carlson @ 2025-11-13 23:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Martin Wilck, Adrian Schroeter

[-- Attachment #1: Type: text/plain, Size: 5408 bytes --]

On 2025-11-13 at 03:26:19, Jeff King wrote:
> This makes sense. I had meant to follow up on our conversation and patch
> from last month, but it was still on my todo list. Fortunately that
> earlier attempt gives me something concrete to compare to. ;)

That's why I CC'd you.  I hadn't seen your patch and wasn't sure if I'd
missed it or not.  I wanted to avoid us doing duplicate work.

> OK, you're checking for it here in index_path(), whereas my earlier
> attempt did it in add_to_index(). For the most part, I think your spot
> makes more sense, as it is at a lower level. add_to_index() eventually
> calls into index_path(), and so do some other code paths.

Yeah, I was hoping to catch any code path someone might use that updated
the index by path and this seemed like the most likely candidate.  I
think we'd be okay for `git update-index` or other things that update by
hash because they always interpret the object ID as the main algorithm.

> That does leave two interesting oddities:
> 
>   1. In add_to_index(), we have this code:
> 
>           if (S_ISDIR(st_mode)) {
>                   if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
>                           return error(_("'%s' does not have a commit checked out"), path);
>                   while (namelen && path[namelen-1] == '/')
>                           namelen--;
>           }
> 
>      which is run before we hit index_path(). So it may get an oid
>      result with an unexpected hash. I think that's OK, because nobody
>      ever looks at it (which would be a lot more obvious if we declared
>      the variable inside the conditional block here).
> 
>      This whole lookup does feel a little funny and redundant. It comes
>      from f937bc2f86 (add: error appropriately on repository with no
>      commits, 2019-04-09), and the main goal is making the error message
>      better. But should we just improve the error message from
>      index_path() for this case (in which case the resolve call above go
>      away)?
> 
>      I think this is mostly orthogonal to your patch and we can ignore
>      it for now. I only bring it up because now it's weird that we are
>      trying to catch the hash mismatch, but have this unchecked extra
>      resolve.

Yeah, I think we can.  I may have touched this case in my interop
series, but I think it's just to add another parameter to the function.
I don't recall anything super interesting about this in terms of hash
interoperability here.

>   2. There are paths in add_to_index() that _don't_ hit index_path(). In
>      particular, intent-to-add entries. So with your patch, even though
>      a regular "git add" is forbidden:
> 
>        $ git add repo
>        error: cannot add a submodule of a different hash algorithm
>        error: unable to index file 'repo'
>        fatal: updating files failed
> 
>      I can still do this:
> 
>        $ git add -N repo
>        warning: adding embedded git repository: repo
>        $ git ls-files -s
>        160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	repo
> 
>      which skips the hash check entirely. Which kind of makes sense,
>      because the resulting index entry does not have a real oid in it at
>      all (it gets the empty blob oid). But it does have a real 160000
>      mode.

Right.  I at first tried to add the check there and the I realized it
was actually storing the empty blob OID and thus the hash algorithm
would never mismatch.

>      Can we make things worse from there? If we try to update it, for
>      example, that will fail:
> 
>        $ git add -u
>        error: cannot add a submodule of a different hash algorithm
>        error: unable to index file 'repo'
>        fatal: updating files failed
> 
>      So...maybe this is OK?

I think it is, or at least it's the best we can do.  If you do a commit
with the empty blob OID as your submodule, you already have a corrupt
repository, so catching it later on when you do `git add` or `git
update-index` should be okay.

> Makes sense. Purists might complain about "git ls-files" on the left
> hand side of a pipe, but I think it is OK here. Though you can golf away
> a few subprocesses at the same time with:

I stole that line from elsewhere in t3700 and modified it.  I agree that
it is maybe not the ideal style for our codebase, but I did appreciate
the elegance of doing the operation in a single line.

> Is it worth checking the stderr of the failing "git add submodule" call?
> Adding a repo directly via "git add" is already something we generate a
> warning for, and it's possible we might eventually make it an error. In
> which case the command would fail without even hitting your new code,
> but we'd have no idea. Adding in a test_grep for "cannot add a submodule
> of a different hash algorithm" would at least make sure we're hitting
> the error we expect.

I can do a v2 with those changes.  I will say that I also added changes
for `git submodule add`, which would catch the problem you mentioned,
because I thought that was the code most likely to have a different
implementation at some point in the future and I wanted to catch that
and also anybody doing the sensible thing in terms of adding submodules.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-13 10:15     ` Martin Wilck
  2025-11-13 22:51       ` brian m. carlson
@ 2025-11-14 22:55       ` Marc Branchaud
  2025-11-15 20:14         ` brian m. carlson
  1 sibling, 1 reply; 21+ messages in thread
From: Marc Branchaud @ 2025-11-14 22:55 UTC (permalink / raw)
  To: Martin Wilck, brian m. carlson, Junio C Hamano, git,
	Adrian Schroeter


On 2025-11-13 03:15, Martin Wilck wrote:
> On Wed, 2025-11-12 at 23:37 +0000, brian m. carlson wrote:
>> On 2025-11-12 at 16:32:01, Junio C Hamano wrote:
>>> Martin Wilck <mwilck@suse.com> writes:
>>>
>>>>> Subject: Re: git fails to checkout SHA1 submodule in SHA256
>>>>> repo with --depth=1
>>>
>>> I think it is not supposed to work to mix repositories like this,
>>> regardless of any other option like --depth.  I think brian gave a
>>> response to that effect in a thread in the past few months.
>>>
>>>      ... goes and looks ...
>>>
>>> https://lore.kernel.org/git/aJ5gOPQ9oologqj-@fruit.crustytoothpaste.net/
>>> https://lore.kernel.org/git/aKPJNNWMW9gtueEK@fruit.crustytoothpaste.net/
>>
>> Yes, that isn't going to work and it never will unless we add some
>> extension mechanism for that purpose.  The repository in question is
>> corrupt.
> 
> Ok, thanks for the clarification.
> 
> Let me just explain the use case: The distribution ((open)SUSE) has
> switched to git for version control of its packages. We have chosen
> SHA256, because we'll need to support the distribution for many years
> to come, much longer than SHA1 is going to be considered good enough.
> 
> We can store the source code of the package e.g. in the form of
> tarballs (and we do). But it's convenient and efficient, and thus
> tempting for developers, to simply link to an existing repository
> hosting the sources, using a submodule. And upstream repos still use
> SHA1. This is what lead us to experiment with this sort of mixed
> repository.
> 
> I get it that the concept is flawed and unsupported. Up to now, that
> wasn't obvious to me.
> 
> So what we can do now is either keep storing tarballs, or wait until
> there's a full solution for migration between / interoperability of
> different hash algorithms, and until the source code repos we're
> interested in have been fully migrated to SHA256. In some special
> cases, where (open)SUSE owns the source repositories, we may be able to
> simply migrate to a SHA256 forge. We can also invent a "poor man's
> submodule" mechanism to link to sources on some external repository
> from ours [1].
> 
> Do you see any other approach that I'm overlooking?

Set up SHA256 mirrors of the SHA1 repos you want to track in submodules?

I'm assuming that it would be easy to keep such mirrors up to date.  I'm 
not familiar enough with Brian's work to know if a SHA256 repo can have
a SHA1 remote (which would mean that updates would just be a simple "git 
fetch origin"), but even if that's not possible I'm guessing that 
scripting some kind of regularly-run translation of new commits wouldn't 
be too hard.

(This feels similar to Git's early years, when people set up Git mirrors 
of repos that used other VCSes like subversion...)

That might also help encourage your upstreams to switch, if you can give 
them a ready-to-use SHA256 version of their repos.

		M.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-13 16:29       ` Junio C Hamano
@ 2025-11-14 23:26         ` brian m. carlson
  2025-11-15  1:53           ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2025-11-14 23:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Martin Wilck, Adrian Schroeter

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

On 2025-11-13 at 16:29:00, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > So this is what I'd propose on top of your patch. I can hold onto it for
> > later if we don't want to muddy up what you're trying to do.
> 
> I do agree with both of the above.  The patch below makes perfect
> sense to me, and it is more about the quality of implementation of
> this codepath in general, than the primary theme of Brian's changes,
> so there is no strong reason they have to come in a single series.

That's certainly true, but it still applies cleanly on top of my revised
patch and I'll just include it in v2.  I'm running tests now and should
send out the series later tonight.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo
  2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
  2025-11-13  3:26   ` Jeff King
@ 2025-11-15  0:58   ` brian m. carlson
  2025-11-15  0:58     ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: brian m. carlson @ 2025-11-15  0:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Martin Wilck, Adrian Schroeter

The design of the hash algorithm transition plan is that objects stored
must be entirely in one algorithm since we lack any way to indicate a
mix of algorithms.  This also includes submodules, but we have
traditionally not enforced this, which leads to various problems when
trying to clone or check out the the submodule from the remote.

Since this cannot work in the general case, restrict adding a submodule
of a different algorithm to the index.  Add tests for git add and git
submodule add that these are rejected.

Note that we cannot check this in git fsck because the malformed
submodule is stored in the tree as an object ID which is either
truncated (when a SHA-256 submodule is added to a SHA-1 repository) or
padded with zeros (when a SHA-1 submodule is added to a SHA-256
repository).  We cannot detect even the latter case because someone
could have an actual submodule that actually ends in 24 zeros, which
would be a false positive.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 object-file.c              |  6 +++++-
 t/t3700-add.sh             | 25 +++++++++++++++++++++++++
 t/t7400-submodule-basic.sh | 25 +++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 4675c8ed6b..8c43c52ed0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1661,7 +1661,11 @@ int index_path(struct index_state *istate, struct object_id *oid,
 		strbuf_release(&sb);
 		break;
 	case S_IFDIR:
-		return repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid);
+		if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
+			return -1;
+		if (&hash_algos[oid->algo] != istate->repo->hash_algo)
+			return error(_("cannot add a submodule of a different hash algorithm"));
+		break;
 	default:
 		return error(_("%s: unsupported file type"), path);
 	}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index df580a5806..9a2c8dbcc2 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -541,6 +541,31 @@ test_expect_success 'all statuses changed in folder if . is given' '
 	)
 '
 
+test_expect_success 'cannot add a submodule of a different algorithm' '
+	git init --object-format=sha256 sha256 &&
+	(
+		cd sha256 &&
+		test_commit abc &&
+		git init --object-format=sha1 submodule &&
+		test_commit -C submodule def &&
+		test_must_fail git add submodule 2>err &&
+		test_grep "cannot add a submodule of a different hash algorithm" err &&
+		git ls-files --stage >entries &&
+		test_grep ! ^160000 entries
+	) &&
+	git init --object-format=sha1 sha1 &&
+	(
+		cd sha1 &&
+		test_commit abc &&
+		git init --object-format=sha256 submodule &&
+		test_commit -C submodule def &&
+		test_must_fail git add submodule 2>err &&
+		test_grep "cannot add a submodule of a different hash algorithm" err &&
+		git ls-files --stage >entries &&
+		test_grep ! ^160000 entries
+	)
+'
+
 test_expect_success CASE_INSENSITIVE_FS 'path is case-insensitive' '
 	path="$(pwd)/BLUB" &&
 	touch "$path" &&
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index fd3e7e355e..e6b551daad 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -407,6 +407,31 @@ test_expect_success 'submodule add in subdirectory with relative path should fai
 	test_grep toplevel output.err
 '
 
+test_expect_success 'submodule add of a different algorithm fails' '
+	git init --object-format=sha256 sha256 &&
+	(
+		cd sha256 &&
+		test_commit abc &&
+		git init --object-format=sha1 submodule &&
+		test_commit -C submodule def &&
+		test_must_fail git submodule add "$submodurl" submodule 2>err &&
+		test_grep "cannot add a submodule of a different hash algorithm" err &&
+		git ls-files --stage >entries &&
+		test_grep ! ^160000 entries
+	) &&
+	git init --object-format=sha1 sha1 &&
+	(
+		cd sha1 &&
+		test_commit abc &&
+		git init --object-format=sha256 submodule &&
+		test_commit -C submodule def &&
+		test_must_fail git submodule add "$submodurl" submodule 2>err &&
+		test_grep "cannot add a submodule of a different hash algorithm" err &&
+		git ls-files --stage >entries &&
+		test_grep ! ^160000 entries
+	)
+'
+
 test_expect_success 'setup - add an example entry to .gitmodules' '
 	git config --file=.gitmodules submodule.example.url git://example.com/init.git
 '

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache()
  2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
@ 2025-11-15  0:58     ` brian m. carlson
  2025-11-15 19:57       ` Junio C Hamano
  2025-11-15 19:53     ` [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo Junio C Hamano
  2025-11-17  8:53     ` Martin Wilck
  2 siblings, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2025-11-15  0:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Martin Wilck, Adrian Schroeter

From: Jeff King <peff@peff.net>

In add_to_cache(), we treat any directories as submodules, and complain
if we can't resolve their HEAD. This call to resolve_gitlink_ref() was
added by f937bc2f86 (add: error appropriately on repository with no
commits, 2019-04-09), with the goal of improving the error message for
empty repositories.

But we already resolve the submodule HEAD in index_path(), which is
where we find the actual oid we're going to use. Resolving it again here
introduces some downsides:

  1. It's more work, since we have to open up the submodule repository's
     files twice.

  2. There are call paths that get to index_path() without going through
     add_to_cache(). For instance, we'd want a similar informative
     message if "git diff empty" finds that it can't resolve the
     submodule's HEAD. (In theory we can also get there through
     update-index, but AFAICT it refuses to consider directories as
     submodules at all, and just complains about them).

  3. The resolution in index_path() catches more errors that we don't
     handle here. In particular, it will validate that the object format
     for the submodule matches that of the superproject. This isn't a
     bug, since our call in add_to_cache() throws away the oid it gets
     without looking at it. But it certainly caused confusion for me
     when looking at where the object-format check should go.

So instead of resolving the submodule HEAD in add_to_cache(), let's just
teach the call in index_path() to actually produce an error message
(which it already does for other cases). That's probably what f937bc2f86
should have done in the first place, and it gives us a single point of
resolution when adding a submodule to the index.

The resulting output is slightly more verbose, as we propagate the error
up the call stack, but I think that's OK (and again, matches many other
errors we get when indexing fails).

I've left the text of the error message as-is, though it is perhaps
overly specific.  There are many reasons that resolving the submodule
HEAD might fail, though outside of corruption or system errors it is
probably most likely that the submodule HEAD is simply on an unborn
branch.

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c  | 2 +-
 read-cache.c   | 3 ---
 t/t3700-add.sh | 1 +
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/object-file.c b/object-file.c
index 8c43c52ed0..a7438b6205 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1662,7 +1662,7 @@ int index_path(struct index_state *istate, struct object_id *oid,
 		break;
 	case S_IFDIR:
 		if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
-			return -1;
+			return error(_("'%s' does not have a commit checked out"), path);
 		if (&hash_algos[oid->algo] != istate->repo->hash_algo)
 			return error(_("cannot add a submodule of a different hash algorithm"));
 		break;
diff --git a/read-cache.c b/read-cache.c
index 032480d0c7..990d4ead0d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -706,7 +706,6 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
 	unsigned hash_flags = pretend ? 0 : INDEX_WRITE_OBJECT;
-	struct object_id oid;
 
 	if (flags & ADD_CACHE_RENORMALIZE)
 		hash_flags |= INDEX_RENORMALIZE;
@@ -716,8 +715,6 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 
 	namelen = strlen(path);
 	if (S_ISDIR(st_mode)) {
-		if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
-			return error(_("'%s' does not have a commit checked out"), path);
 		while (namelen && path[namelen-1] == '/')
 			namelen--;
 	}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 9a2c8dbcc2..af93e53c12 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -388,6 +388,7 @@ test_expect_success 'error on a repository with no commits' '
 	test_must_fail git add empty >actual 2>&1 &&
 	cat >expect <<-EOF &&
 	error: '"'empty/'"' does not have a commit checked out
+	error: unable to index file '"'empty/'"'
 	fatal: adding files failed
 	EOF
 	test_cmp expect actual

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] object-file: disallow adding submodules of different hash algo
  2025-11-14 23:26         ` brian m. carlson
@ 2025-11-15  1:53           ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2025-11-15  1:53 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git, Martin Wilck, Adrian Schroeter

On Fri, Nov 14, 2025 at 11:26:15PM +0000, brian m. carlson wrote:

> On 2025-11-13 at 16:29:00, Junio C Hamano wrote:
> > Jeff King <peff@peff.net> writes:
> > 
> > > So this is what I'd propose on top of your patch. I can hold onto it for
> > > later if we don't want to muddy up what you're trying to do.
> > 
> > I do agree with both of the above.  The patch below makes perfect
> > sense to me, and it is more about the quality of implementation of
> > this codepath in general, than the primary theme of Brian's changes,
> > so there is no strong reason they have to come in a single series.
> 
> That's certainly true, but it still applies cleanly on top of my revised
> patch and I'll just include it in v2.  I'm running tests now and should
> send out the series later tonight.

Thanks, your v2 looks great to me!

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo
  2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
  2025-11-15  0:58     ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
@ 2025-11-15 19:53     ` Junio C Hamano
  2025-11-17  8:53     ` Martin Wilck
  2 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-11-15 19:53 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Jeff King, Martin Wilck, Adrian Schroeter

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>  object-file.c              |  6 +++++-
>  t/t3700-add.sh             | 25 +++++++++++++++++++++++++
>  t/t7400-submodule-basic.sh | 25 +++++++++++++++++++++++++
>  3 files changed, 55 insertions(+), 1 deletion(-)

Updated test reads much nicer.  Thanks for updating them.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache()
  2025-11-15  0:58     ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
@ 2025-11-15 19:57       ` Junio C Hamano
  2025-11-15 20:06         ` brian m. carlson
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-11-15 19:57 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Jeff King, Martin Wilck, Adrian Schroeter

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> From: Jeff King <peff@peff.net>
>
> In add_to_cache(), we treat any directories as submodules, and complain
> if we can't resolve their HEAD. This call to resolve_gitlink_ref() was
> added by f937bc2f86 (add: error appropriately on repository with no
> commits, 2019-04-09), with the goal of improving the error message for
> empty repositories.
>
> But we already resolve the submodule HEAD in index_path(), which is
> where we find the actual oid we're going to use. Resolving it again here
> introduces some downsides:
>
>   1. It's more work, since we have to open up the submodule repository's
>      files twice.
>
>   2. There are call paths that get to index_path() without going through
>      add_to_cache(). For instance, we'd want a similar informative
>      message if "git diff empty" finds that it can't resolve the
>      submodule's HEAD. (In theory we can also get there through
>      update-index, but AFAICT it refuses to consider directories as
>      submodules at all, and just complains about them).
>
>   3. The resolution in index_path() catches more errors that we don't
>      handle here. In particular, it will validate that the object format
>      for the submodule matches that of the superproject. This isn't a
>      bug, since our call in add_to_cache() throws away the oid it gets
>      without looking at it. But it certainly caused confusion for me
>      when looking at where the object-format check should go.
>
> So instead of resolving the submodule HEAD in add_to_cache(), let's just
> teach the call in index_path() to actually produce an error message
> (which it already does for other cases). That's probably what f937bc2f86
> should have done in the first place, and it gives us a single point of
> resolution when adding a submodule to the index.
>
> The resulting output is slightly more verbose, as we propagate the error
> up the call stack, but I think that's OK (and again, matches many other
> errors we get when indexing fails).
>
> I've left the text of the error message as-is, though it is perhaps
> overly specific.  There are many reasons that resolving the submodule
> HEAD might fail, though outside of corruption or system errors it is
> probably most likely that the submodule HEAD is simply on an unborn
> branch.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

A tangent.

You can (!) place your own sign-off after Peff's, as you would want
to certify

c. The contribution was provided directly to me by some other
   person who certified (a), (b) or (c) and I have not modified
   it.

of the DCO, but it seems that it is optional (which I did not know
about---the explanation in SubmittingPatches stops at "Indeed you
are encouraged to do so" without making it a requirement).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache()
  2025-11-15 19:57       ` Junio C Hamano
@ 2025-11-15 20:06         ` brian m. carlson
  0 siblings, 0 replies; 21+ messages in thread
From: brian m. carlson @ 2025-11-15 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Martin Wilck, Adrian Schroeter

[-- Attachment #1: Type: text/plain, Size: 750 bytes --]

On 2025-11-15 at 19:57:17, Junio C Hamano wrote:
> A tangent.
> 
> You can (!) place your own sign-off after Peff's, as you would want
> to certify
> 
> c. The contribution was provided directly to me by some other
>    person who certified (a), (b) or (c) and I have not modified
>    it.
> 
> of the DCO, but it seems that it is optional (which I did not know
> about---the explanation in SubmittingPatches stops at "Indeed you
> are encouraged to do so" without making it a requirement).

Yes, I should have done so; my apologies.  Please feel free to add it:

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>

If you need, I can send a v3 with that fixed.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fails to checkout SHA1 submodule in SHA256 repo with --depth=1
  2025-11-14 22:55       ` Marc Branchaud
@ 2025-11-15 20:14         ` brian m. carlson
  0 siblings, 0 replies; 21+ messages in thread
From: brian m. carlson @ 2025-11-15 20:14 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Martin Wilck, Junio C Hamano, git, Adrian Schroeter

[-- Attachment #1: Type: text/plain, Size: 1296 bytes --]

On 2025-11-14 at 22:55:07, Marc Branchaud wrote:
> Set up SHA256 mirrors of the SHA1 repos you want to track in submodules?
> 
> I'm assuming that it would be easy to keep such mirrors up to date.  I'm not
> familiar enough with Brian's work to know if a SHA256 repo can have
> a SHA1 remote (which would mean that updates would just be a simple "git
> fetch origin"), but even if that's not possible I'm guessing that scripting
> some kind of regularly-run translation of new commits wouldn't be too hard.

You can set up a repository which supports both algorithms (for
instance, a SHA-256 main algorithm and SHA-1 compatibility) and then
interoperate with both algorithms, so yes, this is possible and it
really is as easy as `git fetch origin` once set up.

The current constraint is that the cloned repository must be a full
clone (not shallow or partial) and it must not have any submodules
itself.

Of course, that's what the interoperability work does and it's not
finished or even mostly upstreamed.  As I said, it's mostly functional,
but it's not production ready.  It also fails many tests because we have
many tests in our testsuite that require partial or shallow clones,
whose support is in progress.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo
  2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
  2025-11-15  0:58     ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
  2025-11-15 19:53     ` [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo Junio C Hamano
@ 2025-11-17  8:53     ` Martin Wilck
  2 siblings, 0 replies; 21+ messages in thread
From: Martin Wilck @ 2025-11-17  8:53 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Junio C Hamano, Jeff King, Adrian Schroeter

On Sat, 2025-11-15 at 00:58 +0000, brian m. carlson wrote:
> 
> Note that we cannot check this in git fsck because the malformed
> submodule is stored in the tree as an object ID which is either
> truncated (when a SHA-256 submodule is added to a SHA-1 repository)
> or
> padded with zeros (when a SHA-1 submodule is added to a SHA-256
> repository).  We cannot detect even the latter case because someone
> could have an actual submodule that actually ends in 24 zeros, which
> would be a false positive.

Perhaps a warning could still be printed in this case? After all, the
probability for a false positive is about 1:16 million.

Thanks
Martin

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-11-17  8:53 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-12 12:58 git fails to checkout SHA1 submodule in SHA256 repo with --depth=1 Martin Wilck
2025-11-12 16:32 ` Junio C Hamano
2025-11-12 23:37   ` brian m. carlson
2025-11-13 10:15     ` Martin Wilck
2025-11-13 22:51       ` brian m. carlson
2025-11-13 22:57         ` Martin Wilck
2025-11-14 22:55       ` Marc Branchaud
2025-11-15 20:14         ` brian m. carlson
2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
2025-11-13  3:26   ` Jeff King
2025-11-13  3:56     ` Jeff King
2025-11-13 16:29       ` Junio C Hamano
2025-11-14 23:26         ` brian m. carlson
2025-11-15  1:53           ` Jeff King
2025-11-13 23:15     ` brian m. carlson
2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
2025-11-15  0:58     ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
2025-11-15 19:57       ` Junio C Hamano
2025-11-15 20:06         ` brian m. carlson
2025-11-15 19:53     ` [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo Junio C Hamano
2025-11-17  8:53     ` Martin Wilck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).