From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Matt Smiley <msmiley@gitlab.com>
Subject: Re: [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust
Date: Wed, 18 Feb 2026 13:23:47 -0600 [thread overview]
Message-ID: <aZYJyT3QZ2lJrkL-@denethor> (raw)
In-Reply-To: <20260216-b4-pks-receive-pack-optimize-shallow-v1-2-e98886daff2b@pks.im>
On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> In the next commit we will start to parse more commits via the
> commit-graph. This change will lead to a segfault though because we try
> to access the tree of a commit via `repo_get_commit_tree()`, but:
>
> - The commit has been parsed via the commit-graph, and thus its
> `maybe_tree` field is not yet populated.
>
> - We cannot use the commit-graph to populate the commit's tree because
> we're in the process of writing the commit-graph.
>
> The consequence is that we'll get a `NULL` pointer for the tree in
> `write_graph_chunk_data()`.
IIUC, when a commit has been parsed via the commit graph, if the commit
graph is closed, there is no longer a way for commit object tree to be
read. This results `repo_get_commit_tree()` always returning NULL in
such scenarios.
> In theory we are already mindful of this situation, as we explicitly use
> `repo_parse_commit_no_graph()` to parse the commit without the help of
> the commit-graph. But that doesn't do the trick as the commit is already
> marked as parsed, so the function will not re-populate it. And as the
> commit-graph has been closed, neither will `get_commit_tree_oid()` be
> able to load the tree for us.
And `repo_parse_commit_no_graph()` doesn't work because the commit has
already been marked as parsed.
> It seems like this issue can only be hit under artificial circumstances:
> the error was hit via `git_test_write_commit_graph_or_die()`, which is
> run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:
>
> $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
> --test-args=-ix -i
> ...
> ++ git -c commit.verbose=true commit --amend
> hint: Waiting for your editor to close the file...
> ./test-lib.sh: line 1012: 55895 Segmentation fault (core dumped) git -c commit.verbose=true commit --amend
>
> To the best of my knowledge, this is the only case where we end up
> writing a commit-graph in the same process that might have already
> consulted the commit-graph to look up arbitrary objects. But regardless
> of that, this feels like a bigger accident that is just waiting to
> happen.
So I assume we end up closing the commit-graph when writing a new one.
If we need to read the trees of commits parsed via commit-graph, this
will trigger a segfault since the commit tree will always be NULL.
> Make the code more robust by extending `repo_parse_commit_no_graph()` to
> unparse a commit first in case we detect it's coming from a graph. This
> ensures that we will re-read the object without it, and thus we will
> populate `maybe_tree` properly.
Hmm, I wonder if this is conceptually the correct place to address this
problem. Naively, I would expect `repo_get_commit_tree()` to always be
capable of providing the commit tree. I guess the problem though is that
this would require `repo_get_commit_tree()` to detect this scenario and
reparse the object itself. Maybe we could at least have
`repo_get_commit_tree()` BUG() in this scenario though?
> This fix shouldn't have any performance consequences: the function is
> only ever called in the "commit-graph.c" code, and we'll only re-parse
> the commit at most once.
>
> Add an exclusion to our Coccinelle rules so that it doesn't complain
> about us accessing `maybe_tree` directly.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> commit.h | 14 ++++++++++++--
> contrib/coccinelle/commit.cocci | 2 +-
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/commit.h b/commit.h
> index 1635de418b..f2f39e1a89 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -103,16 +103,26 @@ static inline int repo_parse_commit(struct repository *r, struct commit *item)
> return repo_parse_commit_gently(r, item, 0);
> }
>
> +void unparse_commit(struct repository *r, const struct object_id *oid);
> +
> static inline int repo_parse_commit_no_graph(struct repository *r,
> struct commit *commit)
> {
> + /*
> + * When the commit has been parsed but its tree wasn't populated then
> + * this is an indicator that it has been parsed via the commit-graph.
> + * We cannot read the tree via the commit-graph, as we're explicitly
> + * told not to use it. We thus have to first un-parse the object so
> + * that we can re-parse it without the graph.
> + */
> + if (commit->object.parsed && !commit->maybe_tree)
> + unparse_commit(r, &commit->object.oid);
This unparses commits that were read via commit-graph so they can be
reparsed normally. Looks good.
-Justin
next prev parent reply other threads:[~2026-02-18 19:23 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 18:26 ` Justin Tobler
2026-02-19 17:35 ` Junio C Hamano
2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
2026-02-18 19:23 ` Justin Tobler [this message]
2026-02-19 7:18 ` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 19:34 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZYJyT3QZ2lJrkL-@denethor \
--to=jltobler@gmail.com \
--cc=git@vger.kernel.org \
--cc=msmiley@gitlab.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.