From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "Taylor Blau" <me@ttaylorr.com>,
"Derrick Stolee" <dstolee@microsoft.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
Date: Thu, 12 Sep 2019 15:27:58 -0400 [thread overview]
Message-ID: <875aea73-eebd-fd8a-4d4b-c3b9df027a04@gmail.com> (raw)
In-Reply-To: <20190912142306.GE23031@sigill.intra.peff.net>
On 9/12/2019 10:23 AM, Jeff King wrote:
> On Thu, Sep 12, 2019 at 08:23:49AM -0400, Derrick Stolee wrote:
>
>>> That creates an interesting problem for commits that have _already_ been
>>> parsed using the commit graph. Their commit->object.parsed flag is set,
>>> their commit->graph_pos is set, but their commit->maybe_tree may still
>>> be NULL. When somebody later calls repo_get_commit_tree(), we see that
>>> we haven't loaded the tree oid yet and try to get it from the commit
>>> graph. But since it has been freed, we segfault!
>>
>> OOPS! That is certainly a bad thing. I'm glad you found it, but I
>> am sorry for how you (probably) found it.
>
> Heh. I'll admit it was quite a slog of debugging, but _most_ of that was
> figuring out in which circumstance we'd have actually parsed the object.
> Finding the problematic end state was pretty easy from a coredump. :)
>
>>> diff --git a/commit-graph.c b/commit-graph.c
>>> index 9b02d2c426..bc5dd5913f 100644
>>> --- a/commit-graph.c
>>> +++ b/commit-graph.c
>>> @@ -41,6 +41,8 @@
>>> #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>>> + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>>>
>>> +static int commit_graph_disabled;
>>
>> Should we be putting this inside the repository struct instead?
>
> Probably. The only caller will just pass the_repository, but it doesn't
> hurt to scope it down now.
>
> It could potentially go into the commit_graph itself, but it looks like
> with the incremental work we may have multiple such structs. It could
> also go into raw_object_store, but I think conceptually it's a
> repo-level thing.
>
> So I put it straight into "struct repository".
>
>> Your patch does not seem to actually cover the "I've already parsed some commits"
>> case, as you are only preventing the commit-graph from being prepared. Instead,
>> we need to have a short-circuit inside parse_commit() to avoid future parsing
>> from the commit-graph file.
>
> Maybe I was too clever, then. :)
>
> I didn't want to have to sprinkle "are we disabled" in parse_commit(),
> etc. But any such uses of the commit graph have to do:
>
> if (!prepare_commit_graph(r))
> return;
>
> to lazy-load it. So the logic to prepare becomes (roughly):
>
> if (disabled)
> return 0;
> if (already_loaded)
> return 1;
> return actually_load() ? 1 : 0;
>
> and "disabled" takes precedence.
>
> I've added this comment in prepare_commit_graph():
>
> /*
> * This must come before the "already attempted?" check below, because
> * we want to disable even an already-loaded graph file.
> */
> if (r->commit_graph_disabled)
> return 0;
>
> if (r->objects->commit_graph_attempted)
> return !!r->objects->commit_graph;
> r->objects->commit_graph_attempted = 1;
>
> Does that make more sense?
Ah. That does make sense. I now see the connection between parsing and this
change.
> Unrelated, but I also notice the top of prepare_commit_graph() has:
>
> if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
> die("dying as requested by the '%s' variable on commit-graph load!",
> GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
>
> as the very first thing. Meaning we're calling getenv() as part of every
> single parse_commit(), rather than just once per process. Seems like an
> easy efficiency win.
Absolutely. Move this to after the "have we attempted already?" condition.
Thanks,
-Stolee
next prev parent reply other threads:[~2019-09-12 19:28 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-12 0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-12 0:18 ` Jeff King
2019-09-12 1:11 ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
2019-09-12 1:19 ` Jeff King
2019-09-12 12:31 ` Derrick Stolee
2019-09-12 16:52 ` Junio C Hamano
2019-09-12 22:34 ` Jeff King
2019-09-13 18:05 ` Junio C Hamano
2019-09-12 2:08 ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
2019-09-12 14:03 ` Jeff King
2019-09-12 2:07 ` Taylor Blau
2019-09-12 11:06 ` SZEDER Gábor
2019-09-12 14:01 ` Jeff King
2019-09-12 12:46 ` Derrick Stolee
2019-09-12 13:59 ` Jeff King
2019-09-12 12:23 ` Derrick Stolee
2019-09-12 14:23 ` Jeff King
2019-09-12 19:27 ` Derrick Stolee [this message]
2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
2019-09-12 14:43 ` Jeff King
2019-09-12 14:44 ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
2019-09-12 19:30 ` Derrick Stolee
2019-09-12 14:44 ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-13 13:26 ` Derrick Stolee
2019-09-12 16:56 ` [PATCH v2] upload-pack commit graph segfault fix Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875aea73-eebd-fd8a-4d4b-c3b9df027a04@gmail.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.