From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Matt Smiley <msmiley@gitlab.com>
Subject: Re: [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust
Date: Wed, 18 Feb 2026 13:23:47 -0600 [thread overview]
Message-ID: <aZYJyT3QZ2lJrkL-@denethor> (raw)
In-Reply-To: <20260216-b4-pks-receive-pack-optimize-shallow-v1-2-e98886daff2b@pks.im>
On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> In the next commit we will start to parse more commits via the
> commit-graph. This change will lead to a segfault though because we try
> to access the tree of a commit via `repo_get_commit_tree()`, but:
>
> - The commit has been parsed via the commit-graph, and thus its
> `maybe_tree` field is not yet populated.
>
> - We cannot use the commit-graph to populate the commit's tree because
> we're in the process of writing the commit-graph.
>
> The consequence is that we'll get a `NULL` pointer for the tree in
> `write_graph_chunk_data()`.
IIUC, when a commit has been parsed via the commit graph, if the commit
graph is closed, there is no longer a way for commit object tree to be
read. This results `repo_get_commit_tree()` always returning NULL in
such scenarios.
> In theory we are already mindful of this situation, as we explicitly use
> `repo_parse_commit_no_graph()` to parse the commit without the help of
> the commit-graph. But that doesn't do the trick as the commit is already
> marked as parsed, so the function will not re-populate it. And as the
> commit-graph has been closed, neither will `get_commit_tree_oid()` be
> able to load the tree for us.
And `repo_parse_commit_no_graph()` doesn't work because the commit has
already been marked as parsed.
> It seems like this issue can only be hit under artificial circumstances:
> the error was hit via `git_test_write_commit_graph_or_die()`, which is
> run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:
>
> $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
> --test-args=-ix -i
> ...
> ++ git -c commit.verbose=true commit --amend
> hint: Waiting for your editor to close the file...
> ./test-lib.sh: line 1012: 55895 Segmentation fault (core dumped) git -c commit.verbose=true commit --amend
>
> To the best of my knowledge, this is the only case where we end up
> writing a commit-graph in the same process that might have already
> consulted the commit-graph to look up arbitrary objects. But regardless
> of that, this feels like a bigger accident that is just waiting to
> happen.
So I assume we end up closing the commit-graph when writing a new one.
If we need to read the trees of commits parsed via commit-graph, this
will trigger a segfault since the commit tree will always be NULL.
> Make the code more robust by extending `repo_parse_commit_no_graph()` to
> unparse a commit first in case we detect it's coming from a graph. This
> ensures that we will re-read the object without it, and thus we will
> populate `maybe_tree` properly.
Hmm, I wonder if this is conceptually the correct place to address this
problem. Naively, I would expect `repo_get_commit_tree()` to always be
capable of providing the commit tree. I guess the problem though is that
this would require `repo_get_commit_tree()` to detect this scenario and
reparse the object itself. Maybe we could at least have
`repo_get_commit_tree()` BUG() in this scenario though?
> This fix shouldn't have any performance consequences: the function is
> only ever called in the "commit-graph.c" code, and we'll only re-parse
> the commit at most once.
>
> Add an exclusion to our Coccinelle rules so that it doesn't complain
> about us accessing `maybe_tree` directly.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> commit.h | 14 ++++++++++++--
> contrib/coccinelle/commit.cocci | 2 +-
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/commit.h b/commit.h
> index 1635de418b..f2f39e1a89 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -103,16 +103,26 @@ static inline int repo_parse_commit(struct repository *r, struct commit *item)
> return repo_parse_commit_gently(r, item, 0);
> }
>
> +void unparse_commit(struct repository *r, const struct object_id *oid);
> +
> static inline int repo_parse_commit_no_graph(struct repository *r,
> struct commit *commit)
> {
> + /*
> + * When the commit has been parsed but its tree wasn't populated then
> + * this is an indicator that it has been parsed via the commit-graph.
> + * We cannot read the tree via the commit-graph, as we're explicitly
> + * told not to use it. We thus have to first un-parse the object so
> + * that we can re-parse it without the graph.
> + */
> + if (commit->object.parsed && !commit->maybe_tree)
> + unparse_commit(r, &commit->object.oid);
This unparses commits that were read via commit-graph so they can be
reparsed normally. Looks good.
-Justin
next prev parent reply other threads:[~2026-02-18 19:23 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 18:26 ` Justin Tobler
2026-02-19 17:35 ` Junio C Hamano
2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
2026-02-18 19:23 ` Justin Tobler [this message]
2026-02-19 7:18 ` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 19:34 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aZYJyT3QZ2lJrkL-@denethor \
--to=jltobler@gmail.com \
--cc=git@vger.kernel.org \
--cc=msmiley@gitlab.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox