git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Martin Wilck <mwilck@suse.com>,
	Adrian Schroeter <adrian@suse.com>
Subject: Re: [PATCH] object-file: disallow adding submodules of different hash algo
Date: Thu, 13 Nov 2025 23:15:44 +0000	[thread overview]
Message-ID: <aRZmoAI_RjkRgB8l@fruit.crustytoothpaste.net> (raw)
In-Reply-To: <20251113032619.GA1739649@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 5408 bytes --]

On 2025-11-13 at 03:26:19, Jeff King wrote:
> This makes sense. I had meant to follow up on our conversation and patch
> from last month, but it was still on my todo list. Fortunately that
> earlier attempt gives me something concrete to compare to. ;)

That's why I CC'd you.  I hadn't seen your patch and wasn't sure if I'd
missed it or not.  I wanted to avoid us doing duplicate work.

> OK, you're checking for it here in index_path(), whereas my earlier
> attempt did it in add_to_index(). For the most part, I think your spot
> makes more sense, as it is at a lower level. add_to_index() eventually
> calls into index_path(), and so do some other code paths.

Yeah, I was hoping to catch any code path someone might use that updated
the index by path and this seemed like the most likely candidate.  I
think we'd be okay for `git update-index` or other things that update by
hash because they always interpret the object ID as the main algorithm.

> That does leave two interesting oddities:
> 
>   1. In add_to_index(), we have this code:
> 
>           if (S_ISDIR(st_mode)) {
>                   if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
>                           return error(_("'%s' does not have a commit checked out"), path);
>                   while (namelen && path[namelen-1] == '/')
>                           namelen--;
>           }
> 
>      which is run before we hit index_path(). So it may get an oid
>      result with an unexpected hash. I think that's OK, because nobody
>      ever looks at it (which would be a lot more obvious if we declared
>      the variable inside the conditional block here).
> 
>      This whole lookup does feel a little funny and redundant. It comes
>      from f937bc2f86 (add: error appropriately on repository with no
>      commits, 2019-04-09), and the main goal is making the error message
>      better. But should we just improve the error message from
>      index_path() for this case (in which case the resolve call above go
>      away)?
> 
>      I think this is mostly orthogonal to your patch and we can ignore
>      it for now. I only bring it up because now it's weird that we are
>      trying to catch the hash mismatch, but have this unchecked extra
>      resolve.

Yeah, I think we can.  I may have touched this case in my interop
series, but I think it's just to add another parameter to the function.
I don't recall anything super interesting about this in terms of hash
interoperability here.

>   2. There are paths in add_to_index() that _don't_ hit index_path(). In
>      particular, intent-to-add entries. So with your patch, even though
>      a regular "git add" is forbidden:
> 
>        $ git add repo
>        error: cannot add a submodule of a different hash algorithm
>        error: unable to index file 'repo'
>        fatal: updating files failed
> 
>      I can still do this:
> 
>        $ git add -N repo
>        warning: adding embedded git repository: repo
>        $ git ls-files -s
>        160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	repo
> 
>      which skips the hash check entirely. Which kind of makes sense,
>      because the resulting index entry does not have a real oid in it at
>      all (it gets the empty blob oid). But it does have a real 160000
>      mode.

Right.  I at first tried to add the check there and the I realized it
was actually storing the empty blob OID and thus the hash algorithm
would never mismatch.

>      Can we make things worse from there? If we try to update it, for
>      example, that will fail:
> 
>        $ git add -u
>        error: cannot add a submodule of a different hash algorithm
>        error: unable to index file 'repo'
>        fatal: updating files failed
> 
>      So...maybe this is OK?

I think it is, or at least it's the best we can do.  If you do a commit
with the empty blob OID as your submodule, you already have a corrupt
repository, so catching it later on when you do `git add` or `git
update-index` should be okay.

> Makes sense. Purists might complain about "git ls-files" on the left
> hand side of a pipe, but I think it is OK here. Though you can golf away
> a few subprocesses at the same time with:

I stole that line from elsewhere in t3700 and modified it.  I agree that
it is maybe not the ideal style for our codebase, but I did appreciate
the elegance of doing the operation in a single line.

> Is it worth checking the stderr of the failing "git add submodule" call?
> Adding a repo directly via "git add" is already something we generate a
> warning for, and it's possible we might eventually make it an error. In
> which case the command would fail without even hitting your new code,
> but we'd have no idea. Adding in a test_grep for "cannot add a submodule
> of a different hash algorithm" would at least make sure we're hitting
> the error we expect.

I can do a v2 with those changes.  I will say that I also added changes
for `git submodule add`, which would catch the problem you mentioned,
because I thought that was the code most likely to have a different
implementation at some point in the future and I wanted to catch that
and also anybody doing the sensible thing in terms of adding submodules.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

  parent reply	other threads:[~2025-11-13 23:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-12 12:58 git fails to checkout SHA1 submodule in SHA256 repo with --depth=1 Martin Wilck
2025-11-12 16:32 ` Junio C Hamano
2025-11-12 23:37   ` brian m. carlson
2025-11-13 10:15     ` Martin Wilck
2025-11-13 22:51       ` brian m. carlson
2025-11-13 22:57         ` Martin Wilck
2025-11-14 22:55       ` Marc Branchaud
2025-11-15 20:14         ` brian m. carlson
2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
2025-11-13  3:26   ` Jeff King
2025-11-13  3:56     ` Jeff King
2025-11-13 16:29       ` Junio C Hamano
2025-11-14 23:26         ` brian m. carlson
2025-11-15  1:53           ` Jeff King
2025-11-13 23:15     ` brian m. carlson [this message]
2025-11-15  0:58   ` [PATCH v2 1/2] " brian m. carlson
2025-11-15  0:58     ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
2025-11-15 19:57       ` Junio C Hamano
2025-11-15 20:06         ` brian m. carlson
2025-11-15 19:53     ` [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo Junio C Hamano
2025-11-17  8:53     ` Martin Wilck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aRZmoAI_RjkRgB8l@fruit.crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=adrian@suse.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mwilck@suse.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).