From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Martin Wilck <mwilck@suse.com>,
Adrian Schroeter <adrian@suse.com>
Subject: Re: [PATCH] object-file: disallow adding submodules of different hash algo
Date: Thu, 13 Nov 2025 23:15:44 +0000 [thread overview]
Message-ID: <aRZmoAI_RjkRgB8l@fruit.crustytoothpaste.net> (raw)
In-Reply-To: <20251113032619.GA1739649@coredump.intra.peff.net>
[-- Attachment #1: Type: text/plain, Size: 5408 bytes --]
On 2025-11-13 at 03:26:19, Jeff King wrote:
> This makes sense. I had meant to follow up on our conversation and patch
> from last month, but it was still on my todo list. Fortunately that
> earlier attempt gives me something concrete to compare to. ;)
That's why I CC'd you. I hadn't seen your patch and wasn't sure if I'd
missed it or not. I wanted to avoid us doing duplicate work.
> OK, you're checking for it here in index_path(), whereas my earlier
> attempt did it in add_to_index(). For the most part, I think your spot
> makes more sense, as it is at a lower level. add_to_index() eventually
> calls into index_path(), and so do some other code paths.
Yeah, I was hoping to catch any code path someone might use that updated
the index by path and this seemed like the most likely candidate. I
think we'd be okay for `git update-index` or other things that update by
hash because they always interpret the object ID as the main algorithm.
> That does leave two interesting oddities:
>
> 1. In add_to_index(), we have this code:
>
> if (S_ISDIR(st_mode)) {
> if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
> return error(_("'%s' does not have a commit checked out"), path);
> while (namelen && path[namelen-1] == '/')
> namelen--;
> }
>
> which is run before we hit index_path(). So it may get an oid
> result with an unexpected hash. I think that's OK, because nobody
> ever looks at it (which would be a lot more obvious if we declared
> the variable inside the conditional block here).
>
> This whole lookup does feel a little funny and redundant. It comes
> from f937bc2f86 (add: error appropriately on repository with no
> commits, 2019-04-09), and the main goal is making the error message
> better. But should we just improve the error message from
> index_path() for this case (in which case the resolve call above go
> away)?
>
> I think this is mostly orthogonal to your patch and we can ignore
> it for now. I only bring it up because now it's weird that we are
> trying to catch the hash mismatch, but have this unchecked extra
> resolve.
Yeah, I think we can. I may have touched this case in my interop
series, but I think it's just to add another parameter to the function.
I don't recall anything super interesting about this in terms of hash
interoperability here.
> 2. There are paths in add_to_index() that _don't_ hit index_path(). In
> particular, intent-to-add entries. So with your patch, even though
> a regular "git add" is forbidden:
>
> $ git add repo
> error: cannot add a submodule of a different hash algorithm
> error: unable to index file 'repo'
> fatal: updating files failed
>
> I can still do this:
>
> $ git add -N repo
> warning: adding embedded git repository: repo
> $ git ls-files -s
> 160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 repo
>
> which skips the hash check entirely. Which kind of makes sense,
> because the resulting index entry does not have a real oid in it at
> all (it gets the empty blob oid). But it does have a real 160000
> mode.
Right. I at first tried to add the check there and the I realized it
was actually storing the empty blob OID and thus the hash algorithm
would never mismatch.
> Can we make things worse from there? If we try to update it, for
> example, that will fail:
>
> $ git add -u
> error: cannot add a submodule of a different hash algorithm
> error: unable to index file 'repo'
> fatal: updating files failed
>
> So...maybe this is OK?
I think it is, or at least it's the best we can do. If you do a commit
with the empty blob OID as your submodule, you already have a corrupt
repository, so catching it later on when you do `git add` or `git
update-index` should be okay.
> Makes sense. Purists might complain about "git ls-files" on the left
> hand side of a pipe, but I think it is OK here. Though you can golf away
> a few subprocesses at the same time with:
I stole that line from elsewhere in t3700 and modified it. I agree that
it is maybe not the ideal style for our codebase, but I did appreciate
the elegance of doing the operation in a single line.
> Is it worth checking the stderr of the failing "git add submodule" call?
> Adding a repo directly via "git add" is already something we generate a
> warning for, and it's possible we might eventually make it an error. In
> which case the command would fail without even hitting your new code,
> but we'd have no idea. Adding in a test_grep for "cannot add a submodule
> of a different hash algorithm" would at least make sure we're hitting
> the error we expect.
I can do a v2 with those changes. I will say that I also added changes
for `git submodule add`, which would catch the problem you mentioned,
because I thought that was the code most likely to have a different
implementation at some point in the future and I wanted to catch that
and also anybody doing the sensible thing in terms of adding submodules.
--
brian m. carlson (they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
next prev parent reply other threads:[~2025-11-13 23:15 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-12 12:58 git fails to checkout SHA1 submodule in SHA256 repo with --depth=1 Martin Wilck
2025-11-12 16:32 ` Junio C Hamano
2025-11-12 23:37 ` brian m. carlson
2025-11-13 10:15 ` Martin Wilck
2025-11-13 22:51 ` brian m. carlson
2025-11-13 22:57 ` Martin Wilck
2025-11-14 22:55 ` Marc Branchaud
2025-11-15 20:14 ` brian m. carlson
2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
2025-11-13 3:26 ` Jeff King
2025-11-13 3:56 ` Jeff King
2025-11-13 16:29 ` Junio C Hamano
2025-11-14 23:26 ` brian m. carlson
2025-11-15 1:53 ` Jeff King
2025-11-13 23:15 ` brian m. carlson [this message]
2025-11-15 0:58 ` [PATCH v2 1/2] " brian m. carlson
2025-11-15 0:58 ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
2025-11-15 19:57 ` Junio C Hamano
2025-11-15 20:06 ` brian m. carlson
2025-11-15 19:53 ` [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo Junio C Hamano
2025-11-17 8:53 ` Martin Wilck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRZmoAI_RjkRgB8l@fruit.crustytoothpaste.net \
--to=sandals@crustytoothpaste.net \
--cc=adrian@suse.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mwilck@suse.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.