From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB
Date: Tue, 17 Oct 2023 08:37:14 +0200 [thread overview]
Message-ID: <ZS4rmtBTYnp2RMiY@tanuki> (raw)
In-Reply-To: <xmqq1qdy1iyr.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 5942 bytes --]
On Fri, Oct 13, 2023 at 11:21:48AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > @@ -572,8 +572,13 @@ int repo_parse_commit_internal(struct repository *r,
> > return -1;
> > if (item->object.parsed)
> > return 0;
> > - if (use_commit_graph && parse_commit_in_graph(r, item))
> > + if (use_commit_graph && parse_commit_in_graph(r, item)) {
> > + if (!has_object(r, &item->object.oid, 0))
> > + return quiet_on_missing ? -1 :
> > + error(_("commit %s exists in commit-graph but not in the object database"),
> > + oid_to_hex(&item->object.oid));
> > return 0;
> > + }
>
> Ever since this codepath was introduced by 177722b3 (commit:
> integrate commit graph with commit parsing, 2018-04-10), we blindly
> trusted what commit-graph file says. This change is a strict
> improvement in the correctness department, but there are two things
> that are a bit worrying.
>
> One. The additional check should certainly be cheaper than a full
> reading and parsing of an object, either from a loose object file or
> from a pack entry. It may not hurt performance too much, but it
> still would give us more confidence if we know by how much we are
> pessimising good cases where the commit-graph does match reality.
> Our stance on these secondary files that store precomputed values
> for optimization purposes is in general to use them blindly unless
> in exceptional cases where the operation values the correctness even
> when the validity of these secondary files is dubious (e.g., "fsck"),
> and doing this extra check regardless of the caller at this low level
> of the callchain is a bit worrying.
Fair point indeed. The following is a worst-case scenario benchmark of
of the change where we do a full topological walk of all reachable
commits in the graph, executed in linux.git. We parse commit parents via
`repo_parse_commit_gently()`, so the new code path now basically has to
check for object existence of every reachable commit:
Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
Range (min … max): 2.894 s … 2.950 s 10 runs
Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
Range (min … max): 3.780 s … 3.961 s 10 runs
Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
Range (min … max): 13.714 s … 13.995 s 10 runs
Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
Range (min … max): 13.645 s … 14.038 s 10 runs
Summary
git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
The added check does lead to a performance regression indeed, which is
not all that unexpected. That being said, the commit-graph still results
in a significant speedup compared to the case where we don't have it.
> Another is that by the time parse_commit_in_graph() returns true and
> we realize that the answer we got is bogus by asking has_object(),
> item->object.parsed has already been toggled on, so the caller now
> has a commit object that claimed it was already parsed and does not
> match reality. Hopefully the caller takes an early exit upon seeing
> a failure from parse_commit_gently() and the .parsed bit does not
> matter, but maybe I am missing a case where it does. I dunno.
We could also call `unparse_commit()` when we notice the stale commit
graph item. This would be in the same spirit as the rest of this patch
as it would lead to an overall safer end state.
In any case I'll wait for additional input before sending a v2, most
importantly to see whether we think that consistency trumps performance
in this case. Personally I'm still of the mind that it should, which
also comes from the fact that we were fighting with stale commit graphs
several times in production data.
Patrick
> Other than that, sounds very sensible and the code change is clean.
>
> Will queue. Thanks.
>
> > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> > index ba65f17dd9..25f8e9e2d3 100755
> > --- a/t/t5318-commit-graph.sh
> > +++ b/t/t5318-commit-graph.sh
> > @@ -821,4 +821,27 @@ test_expect_success 'overflow during generation version upgrade' '
> > )
> > '
> >
> > +test_expect_success 'commit exists in commit-graph but not in object database' '
> > + test_when_finished "rm -rf repo" &&
> > + git init repo &&
> > + (
> > + cd repo &&
> > +
> > + test_commit A &&
> > + test_commit B &&
> > + test_commit C &&
> > + git commit-graph write --reachable &&
> > +
> > + # Corrupt the repository by deleting the intermittent commit
> > + # object. Commands should notice that this object is absent and
> > + # thus that the repository is corrupt even if the commit graph
> > + # exists.
> > + oid=$(git rev-parse B) &&
> > + rm .git/objects/"$(test_oid_to_path "$oid")" &&
> > +
> > + test_must_fail git rev-parse HEAD~2 2>error &&
> > + grep "error: commit $oid exists in commit-graph but not in the object database" error
> > + )
> > +'
> > +
> > test_done
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-10-17 6:37 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-09 10:55 [PATCH 0/3] rev-list: add support for commits in `--missing` Karthik Nayak
2023-10-09 10:55 ` [PATCH 1/3] revision: rename bit to `do_not_die_on_missing_objects` Karthik Nayak
2023-10-09 10:55 ` [PATCH 2/3] rev-list: move `show_commit()` to the bottom Karthik Nayak
2023-10-09 10:55 ` [PATCH 3/3] rev-list: add commit object support in `--missing` option Karthik Nayak
2023-10-09 22:02 ` [PATCH 0/3] rev-list: add support for commits in `--missing` Junio C Hamano
2023-10-10 6:19 ` Patrick Steinhardt
2023-10-10 17:09 ` Junio C Hamano
2023-10-11 10:37 ` Karthik Nayak
2023-10-11 16:54 ` Junio C Hamano
2023-10-12 10:44 ` Karthik Nayak
2023-10-12 11:04 ` Patrick Steinhardt
2023-10-12 13:23 ` Karthik Nayak
2023-10-12 16:17 ` Junio C Hamano
2023-10-13 5:53 ` Patrick Steinhardt
2023-10-13 8:38 ` Patrick Steinhardt
2023-10-13 12:37 ` [PATCH] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-13 18:21 ` Junio C Hamano
2023-10-17 6:37 ` Patrick Steinhardt [this message]
2023-10-17 18:34 ` Junio C Hamano
2023-10-19 6:45 ` Patrick Steinhardt
2023-10-19 8:25 ` Patrick Steinhardt
2023-10-19 17:16 ` Junio C Hamano
2023-10-20 10:00 ` Jeff King
2023-10-20 17:35 ` Junio C Hamano
2023-10-23 10:15 ` Patrick Steinhardt
2023-10-13 17:07 ` [PATCH 0/3] rev-list: add support for commits in `--missing` Junio C Hamano
2023-10-12 16:26 ` Junio C Hamano
2023-10-16 10:38 ` [PATCH v2 " Karthik Nayak
2023-10-16 10:38 ` [PATCH v2 1/3] revision: rename bit to `do_not_die_on_missing_objects` Karthik Nayak
2023-10-16 10:38 ` [PATCH v2 2/3] rev-list: move `show_commit()` to the bottom Karthik Nayak
2023-10-16 10:38 ` [PATCH v2 3/3] rev-list: add commit object support in `--missing` option Karthik Nayak
2023-10-16 16:24 ` [PATCH v2 0/3] rev-list: add support for commits in `--missing` Junio C Hamano
2023-10-16 19:01 ` Karthik Nayak
2023-10-16 20:33 ` Junio C Hamano
2023-10-19 12:10 ` [PATCH v3 " Karthik Nayak
2023-10-19 12:10 ` [PATCH v3 1/3] revision: rename bit to `do_not_die_on_missing_objects` Karthik Nayak
2023-10-19 12:10 ` [PATCH v3 2/3] rev-list: move `show_commit()` to the bottom Karthik Nayak
2023-10-19 12:10 ` [PATCH v3 3/3] rev-list: add commit object support in `--missing` option Karthik Nayak
2023-10-19 22:05 ` Junio C Hamano
2023-10-19 23:35 ` Junio C Hamano
2023-10-20 11:14 ` Karthik Nayak
2023-10-20 14:47 ` Karthik Nayak
2023-10-20 17:45 ` Junio C Hamano
2023-10-20 16:41 ` Junio C Hamano
2023-10-24 11:34 ` Karthik Nayak
2023-10-24 12:26 ` [PATCH v4 0/3] rev-list: add support for commits in `--missing` Karthik Nayak
2023-10-24 12:26 ` [PATCH v4 1/3] revision: rename bit to `do_not_die_on_missing_objects` Karthik Nayak
2023-10-24 12:26 ` [PATCH v4 2/3] rev-list: move `show_commit()` to the bottom Karthik Nayak
2023-10-24 12:26 ` [PATCH v4 3/3] rev-list: add commit object support in `--missing` option Karthik Nayak
2023-10-24 17:45 ` Junio C Hamano
2023-10-25 0:35 ` Junio C Hamano
2023-10-25 9:34 ` Karthik Nayak
2023-10-25 6:40 ` Patrick Steinhardt
2023-10-26 12:37 ` Junio C Hamano
2023-10-26 10:11 ` [PATCH v5 0/3] rev-list: add support for commits in `--missing` Karthik Nayak
2023-10-26 10:11 ` [PATCH v5 1/3] revision: rename bit to `do_not_die_on_missing_objects` Karthik Nayak
2023-10-26 10:11 ` [PATCH v5 2/3] rev-list: move `show_commit()` to the bottom Karthik Nayak
2023-10-26 10:11 ` [PATCH v5 3/3] rev-list: add commit object support in `--missing` option Karthik Nayak
2023-10-27 6:25 ` Patrick Steinhardt
2023-10-27 7:54 ` Karthik Nayak
2023-10-27 7:59 ` Karthik Nayak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZS4rmtBTYnp2RMiY@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=karthik.188@gmail.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).