From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks
Date: Mon, 30 Oct 2023 17:29:33 -0400 [thread overview]
Message-ID: <ZUAgPVR8HxgEZEWo@nand.local> (raw)
In-Reply-To: <a89c4355285bc0bb0ec339818e6fe907f9ffd30e.1698060036.git.ps@pks.im>
On Mon, Oct 23, 2023 at 01:27:16PM +0200, Patrick Steinhardt wrote:
> Our `lookup_commit_in_graph()` helper tries to look up commits from the
> commit graph and, if it doesn't exist there, falls back to parsing it
> from the object database instead. This is intended to speed up the
> lookup of any such commit that exists in the database. There is an edge
> case though where the commit exists in the graph, but not in the object
> database. To avoid returning such stale commits the helper function thus
> double checks that any such commit parsed from the graph also exists in
> the object database. This makes the function safe to use even when
> commit graphs aren't updated regularly.
>
> We're about to introduce the same pattern into other parts of our code
> base though, namely `repo_parse_commit_internal()`. Here the extra
> sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
> a newly introduced helper, and as such there was no performance hit by
> adding this sanity check. If we added `repo_parse_commit_internal()`
> with that sanity check right from the beginning as well, this would
> probably never have been an issue to begin with. But by retrofitting it
> with this sanity check now we do add a performance regression to
> preexisting code, and thus there is a desire to avoid this or at least
> give an escape hatch.
>
> In practice, there is no inherent reason why either of those functions
> should have the sanity check whereas the other one does not: either both
> of them are able to detect this issue or none of them should be. This
> also means that the default of whether we do the check should likely be
> the same for both. To err on the side of caution, we thus rather want to
> make `repo_parse_commit_internal()` stricter than to loosen the checks
> that we already have in `lookup_commit_in_graph()`.
All well reasoned. I think the most compelling reason is that we're
already doing this extra check in lookup_commit_in_graph(), and having
that be somewhat inconsistent with repo_parse_commit_internal() feels
error-prone to me.
> The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
> environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
> the default, we will double check that commits looked up in the commit
> graph via `lookup_commit_in_graph()` also exist in the object database.
> This same check will also be added in `repo_parse_commit_internal()`.
Sounds good.
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> Documentation/git.txt | 9 +++++++++
> commit-graph.c | 6 +++++-
> commit-graph.h | 6 ++++++
> t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
> 4 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 11228956cd..22c2b537aa 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -911,6 +911,15 @@ for full details.
> should not normally need to set this to `0`, but it may be
> useful when trying to salvage data from a corrupted repository.
>
> +`GIT_COMMIT_GRAPH_PARANOIA`::
> + If this Boolean environment variable is set to false, ignore the
> + case where commits exist in the commit graph but not in the
> + object database. Normally, Git will check whether commits loaded
> + from the commit graph exist in the object database to avoid
> + issues with stale commit graphs, but this check comes with a
> + performance penalty. The default is `1` (i.e., be paranoid about
> + stale commits in the commit graph).
> +
The first two sentences seem to be flipped. Perhaps:
When loading a commit object from the commit-graph, Git will perform
an existence check on the object in the ODB before parsing it out of
the commit-graph. The default is "true", which enables the
aforementioned behavior. Setting this to "false" disables the
existential check when parsing commits from a commit-graph.
> `GIT_ALLOW_PROTOCOL`::
> If set to a colon-separated list of protocols, behave as if
> `protocol.allow` is set to `never`, and each of the listed
> diff --git a/commit-graph.c b/commit-graph.c
> index fd2f700b2e..12ec31902e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
>
> struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
> {
> + static int object_paranoia = -1;
> struct commit *commit;
> uint32_t pos;
>
> + if (object_paranoia == -1)
> + object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +
I don't think that this is a reroll-able issue, but calling this
variable object_paranoia to store a setting for *graph* paranoia feels
like a good itch to scratch. But obviously not a big deal ;-).
> @@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
> )
> '
>
> +test_expect_success 'stale commit cannot be parsed when given directly' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + test_commit A &&
> + test_commit B &&
> + git commit-graph write --reachable &&
> +
> + oid=$(git rev-parse B) &&
> + rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> + # Verify that it is possible to read the commit from the
> + # commit graph when not being paranoid, ...
> + GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
> + # ... but parsing the commit when double checking that
> + # it actually exists in the object database should fail.
> + test_must_fail git rev-list -1 B
Would "cat-file -p" be more direct here than "rev-list -1"?
Thanks,
Taylor
next prev parent reply other threads:[~2023-10-30 21:29 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-23 11:27 [PATCH v2 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-30 21:29 ` Taylor Blau [this message]
2023-10-31 6:19 ` Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-24 19:10 ` Junio C Hamano
2023-10-30 21:32 ` Taylor Blau
2023-10-31 0:21 ` Junio C Hamano
2023-10-30 21:31 ` Taylor Blau
2023-10-31 7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-31 7:16 ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-31 7:16 ` [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-31 19:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
2023-10-31 23:57 ` Junio C Hamano
2023-11-01 2:06 ` Junio C Hamano
2023-11-01 4:55 ` Patrick Steinhardt
2023-11-01 5:01 ` Junio C Hamano
2023-10-31 23:55 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZUAgPVR8HxgEZEWo@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=karthik.188@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).