From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, peff@peff.net,
gitster@pobox.com, abhishekkumar8222@gmail.com,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 4/5] commit-graph: be extra careful about mixed generations
Date: Mon, 1 Feb 2021 13:04:14 -0500 [thread overview]
Message-ID: <YBhChR3ReDhAde87@nand.local> (raw)
In-Reply-To: <b267a9653a7560d1e59708f20106ef054d140a9f.1612199707.git.gitgitgadget@gmail.com>
On Mon, Feb 01, 2021 at 05:15:06PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When upgrading to a commit-graph with corrected commit dates from
> one without, there are a few things that need to be considered.
>
> When computing generation numbers for the new commit-graph file that
> expects to add the generation_data chunk with corrected commit
> dates, we need to ensure that the 'generation' member of the
> commit_graph_data struct is set to zero for these commits.
>
> Unfortunately, the fallback to use topological level for generation
> number when corrected commit dates are not available are causing us
> harm here: parsing commits notices that read_generation_data is
> false and populates 'generation' with the topological level.
>
> The solution is to iterate through the commits, parse the commits
> to populate initial values, then reset the generation values to
> zero to trigger recalculation. This loop only occurs when the
> existing commit-graph data has no corrected commit dates.
>
> While this improves our situation somewhat, we have not completely
> solved the issue for correctly computing generation numbers for mixes
> layers. That follows in the next change.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-graph.c | 32 +++++++++++++++++++++++---------
> 1 file changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 13992137dd0..08148dd17f1 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1033,7 +1033,8 @@ struct write_commit_graph_context {
> split:1,
> changed_paths:1,
> order_by_pack:1,
> - write_generation_data:1;
> + write_generation_data:1,
> + trust_generation_numbers:1;
>
> struct topo_level_slab *topo_levels;
> const struct commit_graph_opts *opts;
> @@ -1452,6 +1453,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> ctx->progress = start_delayed_progress(
> _("Computing commit graph generation numbers"),
> ctx->commits.nr);
> +
> + if (ctx->write_generation_data && !ctx->trust_generation_numbers) {
> + for (i = 0; i < ctx->commits.nr; i++) {
> + struct commit *c = ctx->commits.list[i];
> + repo_parse_commit(ctx->r, c);
> + commit_graph_data_at(c)->generation = GENERATION_NUMBER_ZERO;
> + }
> + }
> +
This took me a while to figure out since I spent quite a lot of time
thinking that you were setting the topological level to zero, _not_ the
corrected committer date.
Now that I understand which is which, I agree that this is the right way
to go forward.
That said, I do find it unnecessarily complex that we compute both the
generation number and the topological level in the same loops in
compute_generation_numbers()...
> for (i = 0; i < ctx->commits.nr; i++) {
> struct commit *c = ctx->commits.list[i];
> uint32_t level;
> @@ -1480,7 +1490,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> corrected_commit_date = commit_graph_data_at(parent->item)->generation;
>
> if (level == GENERATION_NUMBER_ZERO ||
> - corrected_commit_date == GENERATION_NUMBER_ZERO) {
> + (ctx->write_generation_data &&
> + corrected_commit_date == GENERATION_NUMBER_ZERO)) {
...for exactly reasons like this. It does make sense that they could be
computed together since their computation is indeed quite similar. But
in practice I think you end up spending a lot of time reasoning around
complex conditionals like these.
So, I feel a little bit like we should spend some effort to split these
up. I'm OK with a little bit of code duplication (though if we can
factor out some common routine, that would also be nice). But I think
there's a tradeoff between DRY-ness and understandability, and that we
might be on the wrong side of it here.
> all_parents_computed = 0;
> commit_list_insert(parent->item, &list);
> break;
> @@ -1500,12 +1511,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> max_level = GENERATION_NUMBER_V1_MAX - 1;
> *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
>
> - if (current->date && current->date > max_corrected_commit_date)
> - max_corrected_commit_date = current->date - 1;
> - commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> -
> - if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> - ctx->num_generation_data_overflows++;
> + if (ctx->write_generation_data) {
> + timestamp_t cur_g;
> + if (current->date && current->date > max_corrected_commit_date)
> + max_corrected_commit_date = current->date - 1;
> + cur_g = commit_graph_data_at(current)->generation
> + = max_corrected_commit_date + 1;
> + if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> + ctx->num_generation_data_overflows++;
> + }
Looks like two things happened here:
- A new local variable was introduced to store the value of
'commit_graph_data_at(current)->generation' (now called 'cur_g'),
and
- All of this was guarded by a conditional on
'ctx->write_generation_data'.
The first one is a readability improvement, and the second is the
substantive one, no?
> }
> }
> }
> @@ -2396,7 +2410,7 @@ int write_commit_graph(struct object_directory *odb,
> } else
> ctx->num_commit_graphs_after = 1;
>
> - validate_mixed_generation_chain(ctx->r->objects->commit_graph);
> + ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
>
> compute_generation_numbers(ctx);
Makes sense.
Thanks,
Taylor
next prev parent reply other threads:[~2021-02-01 18:05 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-01 17:15 [PATCH 0/5] Generation Number v2: Fix a tricky split graph bug Derrick Stolee via GitGitGadget
2021-02-01 17:15 ` [PATCH 1/5] commit-graph: use repo_parse_commit Derrick Stolee via GitGitGadget
2021-02-01 17:32 ` Taylor Blau
2021-02-01 17:15 ` [PATCH 2/5] commit-graph: always parse before commit_graph_data_at() Derrick Stolee via GitGitGadget
2021-02-01 18:44 ` Junio C Hamano
2021-02-01 17:15 ` [PATCH 3/5] commit-graph: validate layers for generation data Derrick Stolee via GitGitGadget
2021-02-01 17:39 ` Taylor Blau
2021-02-01 18:10 ` Derrick Stolee
2021-02-01 17:15 ` [PATCH 4/5] commit-graph: be extra careful about mixed generations Derrick Stolee via GitGitGadget
2021-02-01 18:04 ` Taylor Blau [this message]
2021-02-01 18:13 ` Derrick Stolee
2021-02-01 18:55 ` Junio C Hamano
2021-02-01 17:15 ` [PATCH 5/5] commit-graph: prepare commit graph Derrick Stolee via GitGitGadget
2021-02-01 18:25 ` Taylor Blau
2021-02-02 3:01 ` [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 1/6] commit-graph: use repo_parse_commit Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 2/6] commit-graph: always parse before commit_graph_data_at() Derrick Stolee via GitGitGadget
2021-02-03 1:08 ` Jonathan Nieder
2021-02-03 1:35 ` Derrick Stolee
2021-02-03 1:48 ` Jonathan Nieder
2021-02-03 3:07 ` Derrick Stolee
2021-02-03 15:34 ` Taylor Blau
2021-02-03 17:37 ` Eric Sunshine
2021-02-03 18:41 ` Junio C Hamano
2021-02-03 21:08 ` Taylor Blau
2021-02-03 2:06 ` Junio C Hamano
2021-02-03 3:09 ` Derrick Stolee
2021-02-07 19:04 ` SZEDER Gábor
2021-02-07 20:12 ` Junio C Hamano
2021-02-08 2:01 ` Derrick Stolee
2021-02-08 5:55 ` Junio C Hamano
2021-02-02 3:01 ` [PATCH v2 3/6] commit-graph: validate layers for generation data Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 4/6] commit-graph: compute generations separately Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 5/6] commit-graph: be extra careful about mixed generations Derrick Stolee via GitGitGadget
2021-02-02 3:01 ` [PATCH v2 6/6] commit-graph: prepare commit graph Derrick Stolee via GitGitGadget
2021-02-02 3:08 ` [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug Taylor Blau
2021-02-11 4:44 ` Abhishek Kumar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YBhChR3ReDhAde87@nand.local \
--to=me@ttaylorr.com \
--cc=abhishekkumar8222@gmail.com \
--cc=derrickstolee@github.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).