git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Karthik Nayak <karthik.188@gmail.com>,
	Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>,
	Jeff King <peff@peff.net>
Subject: [PATCH v2 0/2] commit-graph: detect commits missing in ODB
Date: Mon, 23 Oct 2023 13:27:11 +0200	[thread overview]
Message-ID: <cover.1698060036.git.ps@pks.im> (raw)

[-- Attachment #1: Type: text/plain, Size: 6482 bytes --]

Hi,

this is version 2 of my patch series to more readily detect commits
parsed from the commit graph which are missing in the object database.
I've split this out into a separate thread as version 1 was sent in
reply to a different series to extend git-rev-list(1)'s `--missing`
option so that I don't continue to hijack this thread.

Changes compared to v1:

    - I've added a preparatory patch that introduced a new
      GIT_COMMIT_GRAPH_PARANOIA environment variable as suggested by
      Peff. This envvar is retrofitted to the preexisting check in
      `lookup_commit_in_graph()` so that the behaviour for this sanity
      check is consistent.

    - `repo_parse_commit_internal()` now also honors this new envvar.

    - I've amended the commit message in v2 to include the benchmark
      that demonstrates the performance regression.

    - We now un-parse the commit when parsing it via the commit graph
      succeeded, but it doesn't exist in the ODB.

Thanks for all the feedback and discussion around this.

Patrick

[1]: <b0bf576c51a706367a758b8e30eca37edb9c2734.1697200576.git.ps@pks.im>

Patrick Steinhardt (2):
  commit-graph: introduce envvar to disable commit existence checks
  commit: detect commits that exist in commit-graph but not in the ODB

 Documentation/git.txt   |  9 ++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 commit.c                | 16 +++++++++++++-
 t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 83 insertions(+), 2 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  a89c435528 commit-graph: introduce envvar to disable commit existence checks
1:  6ec1e340f8 ! 2:  0476d48555 commit: detect commits that exist in commit-graph but not in the ODB
    @@ Commit message
         behaviour by checking for object existence via the object database, as
         well.
     
    +    This check of course comes with a performance penalty. The following
    +    benchmarks have been executed in a clone of linux.git with stable tags
    +    added:
    +
    +        Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
    +          Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
    +          Range (min … max):    2.894 s …  2.950 s    10 runs
    +
    +        Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +          Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
    +          Range (min … max):    3.780 s …  3.961 s    10 runs
    +
    +        Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
    +          Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
    +          Range (min … max):   13.714 s … 13.995 s    10 runs
    +
    +        Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +          Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
    +          Range (min … max):   13.645 s … 14.038 s    10 runs
    +
    +        Summary
    +          git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
    +            1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +            4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +            4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
    +
    +    We look at a ~30% regression in general, but in general we're still a
    +    whole lot faster than without the commit graph. To counteract this, the
    +    new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## commit.c ##
    +@@
    + #include "shallow.h"
    + #include "tree.h"
    + #include "hook.h"
    ++#include "parse.h"
    + 
    + static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
    + 
     @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return -1;
      	if (item->object.parsed)
      		return 0;
     -	if (use_commit_graph && parse_commit_in_graph(r, item))
     +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
    -+		if (!has_object(r, &item->object.oid, 0))
    ++		static int object_paranoia = -1;
    ++
    ++		if (object_paranoia == -1)
    ++			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
    ++
    ++		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
    ++			unparse_commit(r, &item->object.oid);
     +			return quiet_on_missing ? -1 :
     +				error(_("commit %s exists in commit-graph but not in the object database"),
     +				      oid_to_hex(&item->object.oid));
    ++		}
    ++
      		return 0;
     +	}
      
    @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return quiet_on_missing ? -1 :
     
      ## t/t5318-commit-graph.sh ##
    -@@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version upgrade' '
    +@@ t/t5318-commit-graph.sh: test_expect_success 'stale commit cannot be parsed when given directly' '
      	)
      '
      
    -+test_expect_success 'commit exists in commit-graph but not in object database' '
    ++test_expect_success 'stale commit cannot be parsed when traversing graph' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	(
    @@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version
     +		oid=$(git rev-parse B) &&
     +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
     +
    ++		# Again, we should be able to parse the commit when not
    ++		# being paranoid about commit graph staleness...
    ++		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
    ++		# ... but fail when we are paranoid.
     +		test_must_fail git rev-parse HEAD~2 2>error &&
     +		grep "error: commit $oid exists in commit-graph but not in the object database" error
     +	)
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

             reply	other threads:[~2023-10-23 11:27 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-23 11:27 Patrick Steinhardt [this message]
2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-30 21:29   ` Taylor Blau
2023-10-31  6:19     ` Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-24 19:10   ` Junio C Hamano
2023-10-30 21:32     ` Taylor Blau
2023-10-31  0:21       ` Junio C Hamano
2023-10-30 21:31   ` Taylor Blau
2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-31  7:16   ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-31  7:16   ` [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-31 19:16   ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
2023-10-31 23:57     ` Junio C Hamano
2023-11-01  2:06       ` Junio C Hamano
2023-11-01  4:55         ` Patrick Steinhardt
2023-11-01  5:01           ` Junio C Hamano
2023-10-31 23:55   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1698060036.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=karthik.188@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).