From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Taylor Blau <me@ttaylorr.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
Date: Tue, 24 Oct 2023 12:10:13 -0700 [thread overview]
Message-ID: <xmqqy1fr3kh6.fsf@gitster.g> (raw)
In-Reply-To: <0476d4855562b677ced106a4cc7788b46434cf21.1698060036.git.ps@pks.im> (Patrick Steinhardt's message of "Mon, 23 Oct 2023 13:27:20 +0200")
Patrick Steinhardt <ps@pks.im> writes:
> Commit graphs can become stale and contain references to commits that do
> not exist in the object database anymore. Theoretically, this can lead
> to a scenario where we are able to successfully look up any such commit
> via the commit graph even though such a lookup would fail if done via
> the object database directly.
>
> As the commit graph is mostly intended as a sort of cache to speed up
> parsing of commits we do not want to have diverging behaviour in a
> repository with and a repository without commit graphs, no matter
> whether they are stale or not. As commits are otherwise immutable, the
> only thing that we really need to care about is thus the presence or
> absence of a commit.
>
> To address potentially stale commit data that may exist in the graph,
> our `lookup_commit_in_graph()` function will check for the commit's
> existence in both the commit graph, but also in the object database. So
> even if we were able to look up the commit's data in the graph, we would
> still pretend as if the commit didn't exist if it is missing in the
> object database.
>
> We don't have the same safety net in `parse_commit_in_graph_one()`
> though. This function is mostly used internally in "commit-graph.c"
> itself to validate the commit graph, and this usage is fine. We do
> expose its functionality via `parse_commit_in_graph()` though, which
> gets called by `repo_parse_commit_internal()`, and that function is in
> turn used in many places in our codebase.
>
> For all I can see this function is never used to directly turn an object
> ID into a commit object without additional safety checks before or after
> this lookup. What it is being used for though is to walk history via the
> parent chain of commits. So when commits in the parent chain of a graph
> walk are missing it is possible that we wouldn't notice if that missing
> commit was part of the commit graph. Thus, a query like `git rev-parse
> HEAD~2` can succeed even if the intermittent commit is missing.
>
> It's unclear whether there are additional ways in which such stale
> commit graphs can lead to problems. In any case, it feels like this is a
> bigger bug waiting to happen when we gain additional direct or indirect
> callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
> behaviour by checking for object existence via the object database, as
> well.
>
> This check of course comes with a performance penalty. The following
> benchmarks have been executed in a clone of linux.git with stable tags
> added:
>
> Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
> Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
> Range (min … max): 2.894 s … 2.950 s 10 runs
>
> Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
> Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
> Range (min … max): 3.780 s … 3.961 s 10 runs
>
> Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
> Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
> Range (min … max): 13.714 s … 13.995 s 10 runs
>
> Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
> Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
> Range (min … max): 13.645 s … 14.038 s 10 runs
>
> Summary
> git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
> 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
> 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
> 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
>
> We look at a ~30% regression in general, but in general we're still a
> whole lot faster than without the commit graph. To counteract this, the
> new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
Very nicely described. Will queue. I'll go offline for the rest of
the week but if there are no significant issues discovered by the
time I come back, let's declare a victory and merge these two
patches down to 'next'.
Thanks.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> commit.c | 16 +++++++++++++++-
> t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
> 2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/commit.c b/commit.c
> index b3223478bc..7399e90212 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -28,6 +28,7 @@
> #include "shallow.h"
> #include "tree.h"
> #include "hook.h"
> +#include "parse.h"
>
> static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
>
> @@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
> return -1;
> if (item->object.parsed)
> return 0;
> - if (use_commit_graph && parse_commit_in_graph(r, item))
> + if (use_commit_graph && parse_commit_in_graph(r, item)) {
> + static int object_paranoia = -1;
> +
> + if (object_paranoia == -1)
> + object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +
> + if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
> + unparse_commit(r, &item->object.oid);
> + return quiet_on_missing ? -1 :
> + error(_("commit %s exists in commit-graph but not in the object database"),
> + oid_to_hex(&item->object.oid));
> + }
> +
> return 0;
> + }
>
> if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
> return quiet_on_missing ? -1 :
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index c0cc454538..79467d7926 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
> )
> '
>
> +test_expect_success 'stale commit cannot be parsed when traversing graph' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> +
> + test_commit A &&
> + test_commit B &&
> + test_commit C &&
> + git commit-graph write --reachable &&
> +
> + # Corrupt the repository by deleting the intermittent commit
> + # object. Commands should notice that this object is absent and
> + # thus that the repository is corrupt even if the commit graph
> + # exists.
> + oid=$(git rev-parse B) &&
> + rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> + # Again, we should be able to parse the commit when not
> + # being paranoid about commit graph staleness...
> + GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
> + # ... but fail when we are paranoid.
> + test_must_fail git rev-parse HEAD~2 2>error &&
> + grep "error: commit $oid exists in commit-graph but not in the object database" error
> + )
> +'
> +
> test_done
next prev parent reply other threads:[~2023-10-24 19:10 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-23 11:27 [PATCH v2 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-30 21:29 ` Taylor Blau
2023-10-31 6:19 ` Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-24 19:10 ` Junio C Hamano [this message]
2023-10-30 21:32 ` Taylor Blau
2023-10-31 0:21 ` Junio C Hamano
2023-10-30 21:31 ` Taylor Blau
2023-10-31 7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-31 7:16 ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-31 7:16 ` [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-31 19:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
2023-10-31 23:57 ` Junio C Hamano
2023-11-01 2:06 ` Junio C Hamano
2023-11-01 4:55 ` Patrick Steinhardt
2023-11-01 5:01 ` Junio C Hamano
2023-10-31 23:55 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqy1fr3kh6.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=karthik.188@gmail.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).