From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 26245EAD3 for ; Thu, 11 Jan 2024 07:53:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Received: (qmail 29526 invoked by uid 109); 11 Jan 2024 07:53:14 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 11 Jan 2024 07:53:14 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3680 invoked by uid 111); 11 Jan 2024 07:53:16 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 11 Jan 2024 02:53:16 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 11 Jan 2024 02:53:13 -0500 From: Jeff King To: Junio C Hamano Cc: Taylor Blau , git@vger.kernel.org, Scott Leggett Subject: Re: [PATCH] commit-graph: retain commit slab when closing NULL commit_graph Message-ID: <20240111075313.GF48154@coredump.intra.peff.net> References: <20240105054142.GA2035092@coredump.intra.peff.net> <20240110113914.GE16674@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: On Wed, Jan 10, 2024 at 08:38:18AM -0800, Junio C Hamano wrote: > > It should be easy-ish to iterate through the slab and look at the > > commits that are mentioned in it. Though maybe not? Each commit knows > > its slab-id, but I'm not sure if we have a master list of commits to go > > the other way. > > We have table of all in-core objects, don't we? Oh, duh. Yes, we could iterate over obj_hash. I do think the "on demand" version I showed later in the message is better, though, as the work both scales with the number of affected commits (rather than the total number of objects) and can be done lazily (so callers that are not buggy pay no price at all). > > + * This will throw away the parents list, which is potentially sketchy. > > + * A better version of this would just unset commit->object.parsed > > + * and then do a minimal version of parse_commit() that _just_ loads > > + * the tree data (and/or graph position if available). > > Yeah, it is a concern that we may be working with an in-core commit > object whose parent list has already been rewritten during revision > traversal. Well thought out. Hmm. Looking at my "a better version" sentence, I guess we know that either: 1. The object really is corrupt (i.e., we could not load the tree). 2. It came from a graph in the first place. So what if we just tried harder to look it up in the graph file (rather than the slab) when we see COMMIT_NOT_FROM_GRAPH? And indeed, we even have a function to do this already! So something like this almost works: diff --git a/commit.c b/commit.c index b3223478bc..096a3d18d0 100644 --- a/commit.c +++ b/commit.c @@ -418,10 +418,12 @@ static inline void set_commit_tree(struct commit *c, struct tree *t) struct tree *repo_get_commit_tree(struct repository *r, const struct commit *commit) { + uint32_t pos; + if (commit->maybe_tree || !commit->object.parsed) return commit->maybe_tree; - if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH) + if (repo_find_commit_pos_in_graph(r, commit, &pos)) return get_commit_tree_in_graph(r, commit); return NULL; but: 1. It doesn't update the slab, so get_commit_tree_in_graph() then complains immediately, because it uses only commit_graph_position(). I guess we'd need to do: commit_graph_data_at(commit)->graph_pos = pos; somewhere. That does make the recently-found segfault go away. But... 2. I'm not sure if other spots would want similar treatment. We store generation numbers in the slab, too. I think we'll figure things out when they're not available, so there's no segfault problem. But it's maybe a potential performance issue. Likewise, it is probably a bug that the graph-writing code is looking at this commit at all (since we know it is already in a graph file). So we might be papering over that bug (that is, even though the segfault is gone, we are perhaps still writing a duplicate graph entry). > > I dunno. I do feel like this is a lurking maintenance headache, and > > might even be a triggerable bug. But without knowing of a way that it > > happens in the current code base, it feels like it would be easy to make > > things worse rather than better. > > Unfortunately I share the feeling X-<. I somehow sniped myself into thinking about it more, but that has only reinforced my feeling that I'm afraid to touch it. ;) -Peff