From: Patrick Steinhardt <ps@pks.im>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, gitster@pobox.com,
abhishekkumar8222@gmail.com,
Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 3/7] commit-graph: start parsing generation v2 (again)
Date: Mon, 28 Feb 2022 16:18:20 +0100 [thread overview]
Message-ID: <YhzkdMxrIGlNutr6@ncase> (raw)
In-Reply-To: <a3436b92a32f7f6dd02ad61eb2337a4d088d5e9c.1645735117.git.gitgitgadget@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5822 bytes --]
On Thu, Feb 24, 2022 at 08:38:32PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> The 'read_generation_data' member of 'struct commit_graph' was
> introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire
> chain does, 2021-01-16). The intention was to avoid using corrected
> commit dates if not all layers of a commit-graph had that data stored.
> The logic in validate_mixed_generation_chain() at that point incorrectly
> initialized read_generation_data to 1 if and only if the tip
> commit-graph contained the Corrected Commit Date chunk.
>
> This was "fixed" in 448a39e65 (commit-graph: validate layers for
> generation data, 2021-02-02) to validate that read_generation_data was
> either non-zero for all layers, or it would set read_generation_data to
> zero for all layers.
>
> The problem here is that read_generation_data is not initialized to be
> non-zero anywhere!
>
> This change initializes read_generation_data immediately after the chunk
> is parsed, so each layer will have its value present as soon as
> possible.
>
> The read_generation_data member is used in fill_commit_graph_info() to
> determine if we should use the corrected commit date or the topological
> levels stored in the Commit Data chunk. Due to this bug, all previous
> versions of Git were defaulting to topological levels in all cases!
>
> This can be measured with some performance tests. Using the Linux kernel
> as a testbed, I generated a complete commit-graph containing corrected
> commit dates and tested the 'new' version against the previous, 'old'
> version.
>
> First, rev-list with --topo-order demonstrates a 26% improvement using
> corrected commit dates:
>
> hyperfine \
> -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \
> -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \
> --warmup=10
>
> Benchmark 1: old
> Time (mean ± σ): 57.1 ms ± 3.1 ms
> Range (min … max): 52.9 ms … 62.0 ms 55 runs
>
> Benchmark 2: new
> Time (mean ± σ): 45.5 ms ± 3.3 ms
> Range (min … max): 39.9 ms … 51.7 ms 59 runs
>
> Summary
> 'new' ran
> 1.26 ± 0.11 times faster than 'old'
>
> These performance improvements are due to the algorithmic improvements
> given by walking fewer commits due to the higher cutoffs from corrected
> commit dates.
>
> However, this comes at a cost. The additional I/O cost of parsing the
> corrected commit dates is visible in case of merge-base commands that do
> not reduce the overall number of walked commits.
>
> hyperfine \
> -n "old" "$OLD_GIT merge-base v4.8 v4.9" \
> -n "new" "$NEW_GIT merge-base v4.8 v4.9" \
> --warmup=10
>
> Benchmark 1: old
> Time (mean ± σ): 110.4 ms ± 6.4 ms
> Range (min … max): 96.0 ms … 118.3 ms 25 runs
>
> Benchmark 2: new
> Time (mean ± σ): 150.7 ms ± 1.1 ms
> Range (min … max): 149.3 ms … 153.4 ms 19 runs
>
> Summary
> 'old' ran
> 1.36 ± 0.08 times faster than 'new'
>
> Performance issues like this are what motivated 702110aac (commit-graph:
> use config to specify generation type, 2021-02-25).
>
> In the future, we could fix this performance problem by inserting the
> corrected commit date offsets into the Commit Date chunk instead of
> having that data in an extra chunk.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> commit-graph.c | 3 +++
> t/t4216-log-bloom.sh | 2 +-
> t/t5318-commit-graph.sh | 14 ++++++++++++--
> t/t5324-split-commit-graph.sh | 9 +++++++--
> 4 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index a19bd96c2ee..8e52bb09552 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r,
> &graph->chunk_generation_data);
> pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
> &graph->chunk_generation_data_overflow);
> +
> + if (graph->chunk_generation_data)
> + graph->read_generation_data = 1;
> }
>
> if (r->settings.commit_graph_read_changed_paths) {
I wanted to test your changes because they seem quite exciting in the
context of my work as well, but this commit seems to uncover a bug with
how we handle overflows. I originally triggered the bug when trying to
do a mirror-fetch, but as it turns it seems to trigger now whenever the
commit-graph is being read:
$ git commit-graph verify
fatal: commit-graph requires overflow generation data but has none
$ git commit-graph write --split
Finding commits for commit graph among packed objects: 100% (10235119/10235119), done.
fatal: commit-graph requires overflow generation data but has none
$ git commit-graph write --split=replace
Finding commits for commit graph among packed objects: 100% (10235119/10235119), done.
fatal: commit-graph requires overflow generation data but has none
I initially assumed this may be a bug with how we previously wrote the
commit-graph, but removing all chains still reliably triggers it:
$ rm -f objects/info/commit-graphs/*
$ git commit-graph write --split
Finding commits for commit graph among packed objects: 100% (10235119/10235119), done.
fatal: commit-graph requires overflow generation data but has none
I haven't yet found the time to dig deeper into why this is happening.
While the repository is publicly accessible at [1], unfortunately the
bug seems to be triggered by a commit that's only kept alive by an
internal reference.
Patrick
[1]: https://gitlab.com/gitlab-com/www-gitlab-com.git
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2022-02-28 15:18 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15 ` Junio C Hamano
2022-02-25 13:51 ` Derrick Stolee
2022-02-25 17:35 ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18 ` Patrick Steinhardt [this message]
2022-02-28 16:23 ` Derrick Stolee
2022-02-28 16:59 ` Patrick Steinhardt
2022-02-28 18:44 ` Derrick Stolee
2022-03-01 9:46 ` Patrick Steinhardt
2022-03-01 10:35 ` Patrick Steinhardt
2022-03-01 14:06 ` Derrick Stolee
2022-03-01 14:53 ` Patrick Steinhardt
2022-03-01 15:25 ` Derrick Stolee
2022-03-02 13:57 ` Patrick Steinhardt
2022-03-02 14:57 ` Derrick Stolee
2022-03-02 18:15 ` Junio C Hamano
2022-03-02 18:46 ` Derrick Stolee
2022-03-02 22:42 ` Junio C Hamano
2022-03-03 11:19 ` Patrick Steinhardt
2022-03-03 16:00 ` Derrick Stolee
2022-03-04 14:03 ` Derrick Stolee
2022-03-07 10:34 ` Patrick Steinhardt
2022-03-07 13:45 ` Derrick Stolee
2022-03-07 17:22 ` Junio C Hamano
2022-03-10 13:58 ` Patrick Steinhardt
2022-03-10 17:18 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35 ` Junio C Hamano
2022-02-25 13:53 ` Derrick Stolee
2022-02-25 17:38 ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55 ` Junio C Hamano
2022-02-25 22:31 ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44 ` Derrick Stolee
2022-02-28 14:27 ` Ævar Arnfjörð Bjarmason
2022-02-28 16:39 ` Derrick Stolee
2022-02-28 21:14 ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19 ` Derrick Stolee
2022-03-01 14:29 ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 6/7] commit-graph: parse " Derrick Stolee via GitGitGadget
2022-02-24 23:01 ` Junio C Hamano
2022-02-25 13:54 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06 ` Junio C Hamano
2022-02-25 13:55 ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53 ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22 ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53 ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25 ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53 ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30 ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43 ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40 ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48 ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48 ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48 ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48 ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13 ` Junio C Hamano
2022-03-01 20:30 ` Junio C Hamano
2022-03-02 14:13 ` Derrick Stolee
2022-03-01 19:48 ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48 ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YhzkdMxrIGlNutr6@ncase \
--to=ps@pks.im \
--cc=abhishekkumar8222@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.