From: Jeff King <peff@peff.net>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] Define an extended tree format
Date: Wed, 1 Oct 2025 18:59:51 -0400 [thread overview]
Message-ID: <20251001225951.GA142037@peff.net> (raw)
In-Reply-To: <20251001211140.GA140550@peff.net>
On Wed, Oct 01, 2025 at 05:11:40PM -0400, Jeff King wrote:
> But there were two unexpected bits. The first is that the above doesn't
> just need a way to store the foreign-hash oid in a tree. It needs to be
> in the index, too.
>
> We could perhaps use the same hackery there, storing sha1-...-foo in the
> index with the sentinel hash. Or perhaps we could extend the index to
> support holding foreign hashes. I'm not sure which would be less
> annoying. If the index and trees don't match, then tree-to-index
> comparisons get weird. If the index and filesystem don't match, then
> index to filesystem comparisons get weird. My gut feeling is that making
> index-to-filesystem comparisons weird (so putting the hacked name into
> the index) is probably going to be better, just because the filesystem
> is already out of our control. So we can do sorted comparisons between
> the index and trees, but filesystem operations are inherently asking
> about paths one by one. But I'd have to dig deeper to get more confident
> in that opinion.
I know everybody was on the edge of their seats, so here's how far I got
(I will probably be offline for travel for a day or two after this).
This patch generates the hacked-up index entry when you do a "submodule
add" (or even a regular "git add"). I suspect it is missing some corner
cases (in particular, I doubt that it would correctly update with "git
add -u" because the actual filesystem path appears untracked from the
naive perspective of the index).
It is not too ugly, I think:
diff --git a/read-cache.c b/read-cache.c
index 06ad74db22..77072df2f7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -708,6 +708,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
(intent_only ? ADD_CACHE_NEW_ONLY : 0));
unsigned hash_flags = pretend ? 0 : INDEX_WRITE_OBJECT;
struct object_id oid;
+ struct strbuf submodule_foreign_path = STRBUF_INIT;
if (flags & ADD_CACHE_RENORMALIZE)
hash_flags |= INDEX_RENORMALIZE;
@@ -719,6 +720,16 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
if (S_ISDIR(st_mode)) {
if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
return error(_("'%s' does not have a commit checked out"), path);
+ if (oid.algo != hash_algo_by_ptr(the_repository->hash_algo)) {
+ /* XXX maybe need to remove entry for "path"? */
+ strbuf_addf(&submodule_foreign_path, "%s-%s-%s",
+ hash_algos[oid.algo].name,
+ oid_to_hex(&oid),
+ path);
+ path = submodule_foreign_path.buf;
+ namelen = submodule_foreign_path.len;
+ }
+
while (namelen && path[namelen-1] == '/')
namelen--;
}
@@ -768,7 +779,20 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
}
}
if (!intent_only) {
- if (index_path(istate, &ce->oid, path, st, hash_flags)) {
+ /*
+ * We actually re-resolve submodules a second time in
+ * index_path(). That's perhaps something we should
+ * avoid in general (because it's racy), but it's
+ * particularly important for our submodule name
+ * hackery, since "path" is no longer something exists
+ * in the filesystem at all, and we want to override it with
+ * our fake sentinel OID anyway.
+ */
+ if (submodule_foreign_path.len) {
+ /* probably a constant would be less horrific */
+ oidclr(&ce->oid, the_repository->hash_algo);
+ ce->oid.hash[the_repository->hash_algo->rawsz - 1] = 1;
+ } else if (index_path(istate, &ce->oid, path, st, hash_flags)) {
discard_cache_entry(ce);
return error(_("unable to index file '%s'"), path);
}
But then I took a look at the reading side, and it gets pretty ugly.
Often we have just a submodule name, and we end up referencing it in the
index. But of course we don't know if we need to use the hacked-up name
to do the lookup or not, because it depends on the sentinel hash, but we
don't always have a hash yet! So you'd have to speculatively look up
both names in lots of places.
So I think I do lean more towards the index having the full, correct
information, and just doing this hack at the tree level (and munging as
we read/write the trees).
Or of course deciding it's all too horrible and abandoning it. All of
this was meant to be an exploration. I do think it makes a lot of
potential problems and hassles go away if we can intermix submodules of
different hash algorithms. I'm undecided on whether the solution is
worth than the problem, though.
-Peff
next prev parent reply other threads:[~2025-10-01 22:59 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-01 0:58 [RFC PATCH 0/1] Extended tree format for mixed submodules and conflicts brian m. carlson
2025-10-01 0:58 ` [RFC PATCH 1/1] Define an extended tree format brian m. carlson
2025-10-01 16:37 ` Junio C Hamano
2025-10-01 17:41 ` Jeff King
2025-10-01 21:11 ` Jeff King
2025-10-01 21:19 ` Junio C Hamano
2025-10-01 21:45 ` brian m. carlson
2025-10-01 23:00 ` Jeff King
2025-10-01 22:59 ` Jeff King [this message]
2025-10-01 21:21 ` Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251001225951.GA142037@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).