public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Matt Smiley <msmiley@gitlab.com>
Subject: Re: [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust
Date: Wed, 18 Feb 2026 13:23:47 -0600	[thread overview]
Message-ID: <aZYJyT3QZ2lJrkL-@denethor> (raw)
In-Reply-To: <20260216-b4-pks-receive-pack-optimize-shallow-v1-2-e98886daff2b@pks.im>

On 26/02/16 04:38PM, Patrick Steinhardt wrote:
> In the next commit we will start to parse more commits via the
> commit-graph. This change will lead to a segfault though because we try
> to access the tree of a commit via `repo_get_commit_tree()`, but:
> 
>   - The commit has been parsed via the commit-graph, and thus its
>     `maybe_tree` field is not yet populated.
> 
>   - We cannot use the commit-graph to populate the commit's tree because
>     we're in the process of writing the commit-graph.
> 
> The consequence is that we'll get a `NULL` pointer for the tree in
> `write_graph_chunk_data()`.

IIUC, when a commit has been parsed via the commit graph, if the commit
graph is closed, there is no longer a way for commit object tree to be
read. This results `repo_get_commit_tree()` always returning NULL in
such scenarios.

> In theory we are already mindful of this situation, as we explicitly use
> `repo_parse_commit_no_graph()` to parse the commit without the help of
> the commit-graph. But that doesn't do the trick as the commit is already
> marked as parsed, so the function will not re-populate it. And as the
> commit-graph has been closed, neither will `get_commit_tree_oid()` be
> able to load the tree for us.

And `repo_parse_commit_no_graph()` doesn't work because the commit has
already been marked as parsed.

> It seems like this issue can only be hit under artificial circumstances:
> the error was hit via `git_test_write_commit_graph_or_die()`, which is
> run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:
> 
>   $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
>       --test-args=-ix -i
>   ...
>   ++ git -c commit.verbose=true commit --amend
>   hint: Waiting for your editor to close the file...
>   ./test-lib.sh: line 1012: 55895 Segmentation fault         (core dumped) git -c commit.verbose=true commit --amend
> 
> To the best of my knowledge, this is the only case where we end up
> writing a commit-graph in the same process that might have already
> consulted the commit-graph to look up arbitrary objects. But regardless
> of that, this feels like a bigger accident that is just waiting to
> happen.

So I assume we end up closing the commit-graph when writing a new one.
If we need to read the trees of commits parsed via commit-graph, this
will trigger a segfault since the commit tree will always be NULL.

> Make the code more robust by extending `repo_parse_commit_no_graph()` to
> unparse a commit first in case we detect it's coming from a graph. This
> ensures that we will re-read the object without it, and thus we will
> populate `maybe_tree` properly.

Hmm, I wonder if this is conceptually the correct place to address this
problem. Naively, I would expect `repo_get_commit_tree()` to always be
capable of providing the commit tree. I guess the problem though is that
this would require `repo_get_commit_tree()` to detect this scenario and
reparse the object itself. Maybe we could at least have
`repo_get_commit_tree()` BUG() in this scenario though?

> This fix shouldn't have any performance consequences: the function is
> only ever called in the "commit-graph.c" code, and we'll only re-parse
> the commit at most once.
> 
> Add an exclusion to our Coccinelle rules so that it doesn't complain
> about us accessing `maybe_tree` directly.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  commit.h                        | 14 ++++++++++++--
>  contrib/coccinelle/commit.cocci |  2 +-
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/commit.h b/commit.h
> index 1635de418b..f2f39e1a89 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -103,16 +103,26 @@ static inline int repo_parse_commit(struct repository *r, struct commit *item)
>  	return repo_parse_commit_gently(r, item, 0);
>  }
>  
> +void unparse_commit(struct repository *r, const struct object_id *oid);
> +
>  static inline int repo_parse_commit_no_graph(struct repository *r,
>  					     struct commit *commit)
>  {
> +	/*
> +	 * When the commit has been parsed but its tree wasn't populated then
> +	 * this is an indicator that it has been parsed via the commit-graph.
> +	 * We cannot read the tree via the commit-graph, as we're explicitly
> +	 * told not to use it. We thus have to first un-parse the object so
> +	 * that we can re-parse it without the graph.
> +	 */
> +	if (commit->object.parsed && !commit->maybe_tree)
> +		unparse_commit(r, &commit->object.oid);

This unparses commits that were read via commit-graph so they can be
reparsed normally. Looks good.

-Justin

  reply	other threads:[~2026-02-18 19:23 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-16 15:38 [PATCH 0/3] git-receive-pack(1): optimize `assign_shallow_commits_to_refs()` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 1/3] commit: avoid parsing non-commits in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 18:26   ` Justin Tobler
2026-02-19 17:35     ` Junio C Hamano
2026-02-16 15:38 ` [PATCH 2/3] commit: make `repo_parse_commit_no_graph()` more robust Patrick Steinhardt
2026-02-18 19:23   ` Justin Tobler [this message]
2026-02-19  7:18     ` Patrick Steinhardt
2026-02-16 15:38 ` [PATCH 3/3] commit: use commit graph in `lookup_commit_reference_gently()` Patrick Steinhardt
2026-02-18 19:34   ` Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZYJyT3QZ2lJrkL-@denethor \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=msmiley@gitlab.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox