From: Abhishek Kumar <abhishekkumar8222@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: abhishekkumar8222@gmail.com, derrickstolee@github.com,
git@vger.kernel.org, gitster@pobox.com, me@ttaylorr.com,
peff@peff.net, stolee@gmail.com, ttaylorr@github.com
Subject: Re: [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug
Date: Thu, 11 Feb 2021 10:14:39 +0530 [thread overview]
Message-ID: <YCS2N8WC5jQW1jCZ@Abhishek-Arch> (raw)
In-Reply-To: <pull.850.v2.git.1612234883.gitgitgadget@gmail.com>
On Tue, Feb 02, 2021 at 03:01:17AM +0000, Derrick Stolee via GitGitGadget wrote:
> Here is a bugfix for the recently-landed-in-next generation v2 topic
> (ak/corrected-commit-date).
>
> This was occasionally hitting us when computing new commit-graphs on
> existing repositories with the new bits. It was very hard to reproduce, and
> it turns out to be due to not parsing commits before accessing generation
> number data. Doing so in the right place demonstrates the bug of recomputing
> the corrected commit date even for commits in lower layers with computed
> values.
>
> The fix is split into these steps:
>
> 1. Parse commits more often before accessing their data. (This allows the
> bug to be demonstrated in the test suite.)
> 2. Check the full commit-graph chain for generation data chunks.
> 3. Don't compute corrected commit dates if the lower layers do not support
> them.
> 4. Parse the commit-graph file more often.
>
> Thanks, -Stolee
>
Thanks for the fast bug-fix. It must have been hard to track down why as
we weren't able to reproduce this behaviour before. As Taylor said
before, this patch is a clear improvement in readability and correctness.
Thanks
- Abhishek
>
> Updates in v2
> =============
>
> * Fixed some typos or other clarifications in commit messages.
>
> * The loop assigning read_generation_data is skipped if they already all
> agree with value 1.
>
> * I split compute_generation_numbers into two methods. This essentially
> splits the previous patch 4 into patches 4 & 5 here. The new patch 4 just
> splits the logic as-is, then the new patch 5 does the re-initialization
> of generation values when in the upgrade scenario.
>
> Derrick Stolee (6):
> commit-graph: use repo_parse_commit
> commit-graph: always parse before commit_graph_data_at()
> commit-graph: validate layers for generation data
> commit-graph: compute generations separately
> commit-graph: be extra careful about mixed generations
> commit-graph: prepare commit graph
>
> commit-graph.c | 138 +++++++++++++++++++++++++++++-----------
> commit.h | 5 +-
> t/t5318-commit-graph.sh | 21 ++++++
> 3 files changed, 125 insertions(+), 39 deletions(-)
>
>
> base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-850%2Fderrickstolee%2Fgen-v2-upgrade-fix-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-850/derrickstolee/gen-v2-upgrade-fix-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/850
>
> Range-diff vs v1:
>
> 1: 9c605c99f66 = 1: 9c605c99f66 commit-graph: use repo_parse_commit
> 2: 82afa811dff ! 2: 454b183b9ba commit-graph: always parse before commit_graph_data_at()
> @@ Commit message
> problems when writing a commit-graph with corrected commit dates based
> on a commit-graph without them.
>
> - It has been difficult to identify the issue here becaus it was so hard
> + It has been difficult to identify the issue here because it was so hard
> to reproduce. It relies on this uninitialized data having a non-zero
> value, but also on specifically in a way that overwrites the existing
> data.
> 3: d554fa30660 ! 3: 3d223fa2156 commit-graph: validate layers for generation data
> @@ commit-graph.c: static struct commit_graph *load_commit_graph_chain(struct repos
>
> - if (!g)
> - return;
> --
> -- read_generation_data = !!g->chunk_generation_data;
> + while (read_generation_data && p) {
> + read_generation_data = p->read_generation_data;
> + p = p->base_graph;
> + }
>
> +- read_generation_data = !!g->chunk_generation_data;
> ++ if (read_generation_data)
> ++ return 1;
> +
> while (g) {
> - g->read_generation_data = read_generation_data;
> +- g->read_generation_data = read_generation_data;
> ++ g->read_generation_data = 0;
> g = g->base_graph;
> }
> +
> -+ return read_generation_data;
> ++ return 0;
> }
>
> struct commit_graph *read_commit_graph_one(struct repository *r,
> -: ----------- > 4: 05248ff222f commit-graph: compute generations separately
> 4: b267a9653a7 ! 5: 9bccee8fb63 commit-graph: be extra careful about mixed generations
> @@ Commit message
> existing commit-graph data has no corrected commit dates.
>
> While this improves our situation somewhat, we have not completely
> - solved the issue for correctly computing generation numbers for mixes
> + solved the issue for correctly computing generation numbers for mixed
> layers. That follows in the next change.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph
> _("Computing commit graph generation numbers"),
> ctx->commits.nr);
> +
> -+ if (ctx->write_generation_data && !ctx->trust_generation_numbers) {
> ++ if (!ctx->trust_generation_numbers) {
> + for (i = 0; i < ctx->commits.nr; i++) {
> + struct commit *c = ctx->commits.list[i];
> + repo_parse_commit(ctx->r, c);
> @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph
> +
> for (i = 0; i < ctx->commits.nr; i++) {
> struct commit *c = ctx->commits.list[i];
> - uint32_t level;
> -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> - corrected_commit_date = commit_graph_data_at(parent->item)->generation;
> -
> - if (level == GENERATION_NUMBER_ZERO ||
> -- corrected_commit_date == GENERATION_NUMBER_ZERO) {
> -+ (ctx->write_generation_data &&
> -+ corrected_commit_date == GENERATION_NUMBER_ZERO)) {
> - all_parents_computed = 0;
> - commit_list_insert(parent->item, &list);
> - break;
> -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> - max_level = GENERATION_NUMBER_V1_MAX - 1;
> - *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> -
> -- if (current->date && current->date > max_corrected_commit_date)
> -- max_corrected_commit_date = current->date - 1;
> -- commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> --
> -- if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> -- ctx->num_generation_data_overflows++;
> -+ if (ctx->write_generation_data) {
> -+ timestamp_t cur_g;
> -+ if (current->date && current->date > max_corrected_commit_date)
> -+ max_corrected_commit_date = current->date - 1;
> -+ cur_g = commit_graph_data_at(current)->generation
> -+ = max_corrected_commit_date + 1;
> -+ if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> -+ ctx->num_generation_data_overflows++;
> -+ }
> - }
> - }
> - }
> + timestamp_t corrected_commit_date;
> @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
> } else
> ctx->num_commit_graphs_after = 1;
> @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
> - validate_mixed_generation_chain(ctx->r->objects->commit_graph);
> + ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
>
> - compute_generation_numbers(ctx);
> -
> + compute_topological_levels(ctx);
> + if (ctx->write_generation_data)
> 5: dddeec30ebf ! 6: 38086c85b52 commit-graph: prepare commit graph
> @@ Commit message
> commit-graph: prepare commit graph
>
> Before checking if the repository has a commit-graph loaded, be sure
> - to run prepare_commit_graph(). This is necessary because without this
> - instance we would not initialize the topo_levels slab for each of the
> - struct commit_graphs in the chain before we start to parse the
> - commits. This leads to possibly recomputing the topological levels for
> - commits in lower layers even when we are adding a small number of
> - commits on top.
> + to run prepare_commit_graph(). This is necessary because otherwise
> + the topo_levels slab is not initialized. As we compute topo_levels for
> + the new commits, we iterate further into the lower layers since the
> + first visit to each commit looks as though the topo_level is not
> + populated.
>
> By properly initializing the topo_slab, we fix the previously broken
> case of a split commit graph where a base layer has the
>
> --
> gitgitgadget
prev parent reply other threads:[~2021-02-11 4:45 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-01 17:15 [PATCH 0/5] Generation Number v2: Fix a tricky split graph bug Derrick Stolee via GitGitGadget
2021-02-01 17:15 ` [PATCH 1/5] commit-graph: use repo_parse_commit Derrick Stolee via GitGitGadget
2021-02-01 17:32 ` Taylor Blau
2021-02-01 17:15 ` [PATCH 2/5] commit-graph: always parse before commit_graph_data_at() Derrick Stolee via GitGitGadget
2021-02-01 18:44 ` Junio C Hamano
2021-02-01 17:15 ` [PATCH 3/5] commit-graph: validate layers for generation data Derrick Stolee via GitGitGadget
2021-02-01 17:39 ` Taylor Blau
2021-02-01 18:10 ` Derrick Stolee
2021-02-01 17:15 ` [PATCH 4/5] commit-graph: be extra careful about mixed generations Derrick Stolee via GitGitGadget
2021-02-01 18:04 ` Taylor Blau
2021-02-01 18:13 ` Derrick Stolee
2021-02-01 18:55 ` Junio C Hamano
2021-02-01 17:15 ` [PATCH 5/5] commit-graph: prepare commit graph Derrick Stolee via GitGitGadget
2021-02-01 18:25 ` Taylor Blau
2021-02-02 3:01 ` [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 1/6] commit-graph: use repo_parse_commit Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 2/6] commit-graph: always parse before commit_graph_data_at() Derrick Stolee via GitGitGadget
2021-02-03 1:08 ` Jonathan Nieder
2021-02-03 1:35 ` Derrick Stolee
2021-02-03 1:48 ` Jonathan Nieder
2021-02-03 3:07 ` Derrick Stolee
2021-02-03 15:34 ` Taylor Blau
2021-02-03 17:37 ` Eric Sunshine
2021-02-03 18:41 ` Junio C Hamano
2021-02-03 21:08 ` Taylor Blau
2021-02-03 2:06 ` Junio C Hamano
2021-02-03 3:09 ` Derrick Stolee
2021-02-07 19:04 ` SZEDER Gábor
2021-02-07 20:12 ` Junio C Hamano
2021-02-08 2:01 ` Derrick Stolee
2021-02-08 5:55 ` Junio C Hamano
2021-02-02 3:01 ` [PATCH v2 3/6] commit-graph: validate layers for generation data Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 4/6] commit-graph: compute generations separately Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 5/6] commit-graph: be extra careful about mixed generations Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 6/6] commit-graph: prepare commit graph Derrick Stolee via GitGitGadget
2021-02-02 3:08 ` [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug Taylor Blau
2021-02-11 4:44 ` Abhishek Kumar [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YCS2N8WC5jQW1jCZ@Abhishek-Arch \
--to=abhishekkumar8222@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=pull.850.v2.git.1612234883.gitgitgadget@gmail.com \
--cc=stolee@gmail.com \
--cc=ttaylorr@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).