git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Scott Leggett <scott@sl.id.au>
Subject: Re: [PATCH] commit-graph: retain commit slab when closing NULL commit_graph
Date: Fri, 5 Jan 2024 12:07:44 -0500	[thread overview]
Message-ID: <ZZg3YIEDCjbbGyVX@nand.local> (raw)
In-Reply-To: <20240105054142.GA2035092@coredump.intra.peff.net>

On Fri, Jan 05, 2024 at 12:41:42AM -0500, Jeff King wrote:
> This fixes a regression introduced in ac6d45d11f (commit-graph: move
> slab-clearing to close_commit_graph(), 2023-10-03), in which running:
>
>   git -c fetch.writeCommitGraph=true fetch --recurse-submodules
>
> multiple times in a freshly cloned repository causes a segfault. What
> happens in the second (and subsequent) runs is this:
>
>   1. We make a "struct commit" for any ref tips which we're storing
>      (even if we already have them, they still go into FETCH_HEAD).
>
>      Because the first run will have created a commit graph, we'll find
>      those commits in the graph.
>
>      The commit struct is therefore created with a NULL "maybe_tree"
>      entry, because we can load its oid from the graph later. But to do
>      that we need to remember that we got the commit from the graph,
>      which is recorded in a global commit_graph_data_slab object.
>
>   2. Because we're using --recurse-submodules, we'll try to fetch each
>      of the possible submodules. That implies creating a separate
>      "struct repository" in-process for each submodule, which will
>      require a later call to repo_clear().
>
>      The call to repo_clear() calls raw_object_store_clear(), which in
>      turn calls close_object_store(), which in turn calls
>      close_commit_graph(). And the latter frees the commit graph data
>      slab.
>
>   3. Later, when trying to write out a new commit graph, we'll ask for
>      their tree oid via get_commit_tree_oid(), which will see that the
>      object is parsed but with a NULL maybe_tree field. We'd then
>      usually pull it from the graph file, but because the slab was
>      cleared, we don't realize that we can do so! We end up returning
>      NULL and segfaulting.
>
>      (It seems questionable that we'd write a graph entry for such a
>      commit anyway, since we know we already have one. I didn't
>      double-check, but that may simply be another side effect of having
>      cleared the slab).

Yeah, I agree that we should not be re-writing a commit-graph entry that
already exists in an earlier (non-removed) layer of the commit-graph.
I haven't thought through all of the details here, but that sounds like
a potentially dangerous thing to be doing.

> So that fixes the regression in the least-risky way possible.

Thanks for the detailed explanation. I think that the fix presented here
is a reasonable approach, and is worthwhile to pick up.

> I do think there's some fragility here that we might want to follow up
> on. We have a global commit_graph_data_slab that contains graph
> positions, and our global commit structs depend on the that slab
> remaining valid. But close_commit_graph() is just about closing _one_
> object store's graph. So it's dangerous to call that function and clear
> the slab without also throwing away any "struct commit" we might have
> parsed that depends on it.

I definitely agree that there seems like a high risk of another
potentially-dangerous bug happening as a result of this area.

One thing that sticks out to me is that we have a scope mismatch
between the commit structs, the raw_object_store, and the commit slabs.
Slabs are tied to the lifetime of the raw_object_store, but the commit
objects are tied to the global lifetime of the process. I wonder if it
would make sense to have a slab per object-store, and require that you
pass both the commit *and* the object-store you're looking it up in as
arguments to any slab-related functions.

I dunno. I have not put a ton of thought into that ^^ approach either,
so let me know if it seems obviously bogus to you or if this is worth
looking into further.

Thanks,
Taylor

  reply	other threads:[~2024-01-05 17:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-05  5:41 [PATCH] commit-graph: retain commit slab when closing NULL commit_graph Jeff King
2024-01-05 17:07 ` Taylor Blau [this message]
2024-01-10 11:39   ` Jeff King
2024-01-10 16:38     ` Junio C Hamano
2024-01-11  7:53       ` Jeff King
2024-01-11 18:35         ` Junio C Hamano
2024-01-05 19:56 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZg3YIEDCjbbGyVX@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=scott@sl.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).