From: Jeff King <peff@peff.net>
To: Rasmus Villemoes <ravi@prevas.dk>
Cc: git@vger.kernel.org, emkan@prevas.dk
Subject: Re: git clone with --dissociate sometimes fails to check out target commit
Date: Mon, 4 May 2026 05:51:10 -0400 [thread overview]
Message-ID: <20260504095110.GA599780@coredump.intra.peff.net> (raw)
In-Reply-To: <87h5onsi0f.fsf@prevas.dk>
On Mon, May 04, 2026 at 10:20:32AM +0200, Rasmus Villemoes wrote:
> Are we using --dissociate wrongly, or are we perhaps not maintaining
> those local mirror repos properly? They are essentially just created
> with 'git clone --mirror', with 'git remote update' run periodically.
>
> Naively, I'd expect the effects of --dissociate to only happen after
> everything else the clone command does has been done, but it seems that
> the ties to the reference repo are cut too soon.
No, you're using it correctly. The dissociate step should copy all of
the shared objects into the new repo, so it shouldn't matter whether we
do it before or after checkout. The objects are there either way.
But there's an interesting bug here with commit graphs. What happens is
this:
1. During the initial part of the clone, we may load a commit object
using the commit-graph file from the --reference repo. The commit
struct is left with a blank "maybe_tree" field, because we know we
can load it from the commit graph later (and don't want to spend
the effort to make a "struct tree" unless somebody asks for it).
2. During the dissociate step, we call odb_close(), since we're
throwing away the link to the reference repo, and we don't want to
use our in-process structs that point to it. That step also throws
away our open reference to the commit-graph file, and the
in-process slab that holds the graph positions we've loaded.
3. The checkout process needs the tree, so it calls
repo_get_commit_tree(). That sees that maybe_commit is NULL, so we
check whether it might be loaded from the graph file. But when we
ask about the graph position, we don't have one! It was in the slab
we threw away. So we return NULL, and the caller thinks the commit
is corrupt.
This is a bug that we theorized existed in a thread a while ago:
https://lore.kernel.org/git/20240110113914.GE16674@coredump.intra.peff.net/
but we didn't have a way to trigger it. Now we do. Hooray, I guess? ;)
The fallback load suggested in that message fixes it (modulo the fact
that it forgot to return commit->maybe_tree at the end of the function).
Below is a slightly safer version of the same concept that likewise
fixes the problem.
It's kind of ugly, but I think may be the least-bad solution. See that
earlier thread for more discussion of alternatives.
In the meantime, doing your dissociate clone with:
git -c core.commitGraph=false clone ...
should work around the problem.
---
diff --git a/commit.c b/commit.c
index 80d8d07875..50d736b339 100644
--- a/commit.c
+++ b/commit.c
@@ -434,16 +434,46 @@ static inline void set_commit_tree(struct commit *c, struct tree *t)
c->maybe_tree = t;
}
+static void load_tree_from_commit_contents(struct repository *r, struct commit *commit)
+{
+ enum object_type type;
+ unsigned long size;
+ char *buf;
+ const char *p;
+ struct object_id tree_oid;
+
+ buf = odb_read_object(r->objects, &commit->object.oid, &type, &size);
+ if (!buf)
+ return;
+
+ if (type == OBJ_COMMIT &&
+ skip_prefix(buf, "tree ", &p) &&
+ !parse_oid_hex(p, &tree_oid, &p) &&
+ *p == '\n')
+ commit->maybe_tree = lookup_tree(r, &tree_oid);
+
+ free(buf);
+}
+
struct tree *repo_get_commit_tree(struct repository *r,
- const struct commit *commit)
+ struct commit *commit)
{
if (commit->maybe_tree || !commit->object.parsed)
return commit->maybe_tree;
if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
return get_commit_tree_in_graph(r, commit);
- return NULL;
+ /*
+ * This is either a corrupt commit, or one which we partially loaded
+ * from a graph file but then subsequently threw away the graph data.
+ *
+ * Optimistically assume it's the latter and try to reload from
+ * scratch. This gives a performance penalty if it really is a corrupt
+ * commit, but presumably that happens rarely.
+ */
+ load_tree_from_commit_contents(r, commit);
+ return commit->maybe_tree;
}
struct object_id *get_commit_tree_oid(const struct commit *commit)
diff --git a/commit.h b/commit.h
index 58150045af..5eb1264077 100644
--- a/commit.h
+++ b/commit.h
@@ -163,7 +163,7 @@ void repo_unuse_commit_buffer(struct repository *r,
*/
void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
-struct tree *repo_get_commit_tree(struct repository *, const struct commit *);
+struct tree *repo_get_commit_tree(struct repository *, struct commit *);
struct object_id *get_commit_tree_oid(const struct commit *);
/*
next prev parent reply other threads:[~2026-05-04 9:57 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-04 8:20 git clone with --dissociate sometimes fails to check out target commit Rasmus Villemoes
2026-05-04 9:51 ` Jeff King [this message]
2026-05-04 9:54 ` Jeff King
2026-05-04 11:36 ` Rasmus Villemoes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260504095110.GA599780@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=emkan@prevas.dk \
--cc=git@vger.kernel.org \
--cc=ravi@prevas.dk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox