git.vger.kernel.org archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, peff@peff.net,
	gitster@pobox.com, abhishekkumar8222@gmail.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 4/5] commit-graph: be extra careful about mixed generations
Date: Mon, 1 Feb 2021 13:04:14 -0500
Message-ID: <YBhChR3ReDhAde87@nand.local>
In-Reply-To: <b267a9653a7560d1e59708f20106ef054d140a9f.1612199707.git.gitgitgadget@gmail.com>

On Mon, Feb 01, 2021 at 05:15:06PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When upgrading to a commit-graph with corrected commit dates from
> one without, there are a few things that need to be considered.
>
> When computing generation numbers for the new commit-graph file that
> expects to add the generation_data chunk with corrected commit
> dates, we need to ensure that the 'generation' member of the
> commit_graph_data struct is set to zero for these commits.
>
> Unfortunately, the fallback to use topological level for generation
> number when corrected commit dates are not available is causing us
> harm here: parsing commits notices that read_generation_data is
> false and populates 'generation' with the topological level.
>
> The solution is to iterate through the commits, parse the commits
> to populate initial values, then reset the generation values to
> zero to trigger recalculation. This loop only occurs when the
> existing commit-graph data has no corrected commit dates.
>
> While this improves our situation somewhat, we have not completely
> solved the issue for correctly computing generation numbers for mixed
> layers. That follows in the next change.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 13992137dd0..08148dd17f1 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1033,7 +1033,8 @@ struct write_commit_graph_context {
>  		 split:1,
>  		 changed_paths:1,
>  		 order_by_pack:1,
> -		 write_generation_data:1;
> +		 write_generation_data:1,
> +		 trust_generation_numbers:1;
>
>  	struct topo_level_slab *topo_levels;
>  	const struct commit_graph_opts *opts;
> @@ -1452,6 +1453,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  		ctx->progress = start_delayed_progress(
>  					_("Computing commit graph generation numbers"),
>  					ctx->commits.nr);
> +
> +	if (ctx->write_generation_data && !ctx->trust_generation_numbers) {
> +		for (i = 0; i < ctx->commits.nr; i++) {
> +			struct commit *c = ctx->commits.list[i];
> +			repo_parse_commit(ctx->r, c);
> +			commit_graph_data_at(c)->generation = GENERATION_NUMBER_ZERO;
> +		}
> +	}
> +

This took me a while to figure out since I spent quite a lot of time
thinking that you were setting the topological level to zero, _not_ the
corrected committer date.

Now that I understand which is which, I agree that this is the right way
to go forward.

That said, I do find it unnecessarily complex that we compute both the
generation number and the topological level in the same loops in
compute_generation_numbers()...

>  	for (i = 0; i < ctx->commits.nr; i++) {
>  		struct commit *c = ctx->commits.list[i];
>  		uint32_t level;
> @@ -1480,7 +1490,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
>
>  				if (level == GENERATION_NUMBER_ZERO ||
> -				    corrected_commit_date == GENERATION_NUMBER_ZERO) {
> +				    (ctx->write_generation_data &&
> +				     corrected_commit_date == GENERATION_NUMBER_ZERO)) {

...for exactly this kind of reason. It does make sense that they could be
computed together since their computation is indeed quite similar. But
in practice I think you end up spending a lot of time reasoning around
complex conditionals like these.

So, I feel a little bit like we should spend some effort to split these
up. I'm OK with a little bit of code duplication (though if we can
factor out some common routine, that would also be nice). But I think
there's a tradeoff between DRY-ness and understandability, and that we
might be on the wrong side of it here.
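
To make the suggestion concrete, here is a rough, self-contained sketch
of what "split them up, but share a common routine" could look like. It
is not git's code: 'struct toy_commit', 'toy_walk()', and the
'toy_walk_ops' callbacks are all made up for illustration, the history
is a plain array with parent indices, and the walk recurses where
compute_generation_numbers() uses an explicit list. TOY_UNSET plays the
role of GENERATION_NUMBER_ZERO.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PARENTS 2
#define TOY_UNSET 0 /* stands in for GENERATION_NUMBER_ZERO */

/* Hypothetical stand-in for a commit plus its commit-graph data. */
struct toy_commit {
	uint64_t date;                /* committer date */
	int parents[TOY_MAX_PARENTS]; /* indices into the array, -1 if absent */
	uint64_t level;               /* topological level */
	uint64_t generation;          /* corrected commit date */
};

/* The only parts that differ between the two computations. */
struct toy_walk_ops {
	uint64_t (*get)(const struct toy_commit *c);
	void (*set)(struct toy_commit *c, uint64_t max_parent_value);
};

/* Common routine: compute all parents first, then the commit itself. */
static void toy_walk(struct toy_commit *commits, int i,
		     const struct toy_walk_ops *ops)
{
	uint64_t max = 0;
	int p;

	assert(i >= 0);
	if (ops->get(&commits[i]) != TOY_UNSET)
		return; /* already computed */

	for (p = 0; p < TOY_MAX_PARENTS; p++) {
		int parent = commits[i].parents[p];
		uint64_t v;

		if (parent < 0)
			continue;
		toy_walk(commits, parent, ops);
		v = ops->get(&commits[parent]);
		if (v > max)
			max = v;
	}
	ops->set(&commits[i], max);
}

static uint64_t get_level(const struct toy_commit *c)
{
	return c->level;
}

/* Topological level: one more than the largest parent level. */
static void set_level(struct toy_commit *c, uint64_t max)
{
	c->level = max + 1;
}

static uint64_t get_generation(const struct toy_commit *c)
{
	return c->generation;
}

/* Corrected commit date: own date, or one past the largest parent value. */
static void set_generation(struct toy_commit *c, uint64_t max)
{
	c->generation = c->date > max ? c->date : max + 1;
}

static void toy_compute_levels(struct toy_commit *commits, int nr)
{
	static const struct toy_walk_ops ops = { get_level, set_level };
	int i;

	for (i = 0; i < nr; i++)
		toy_walk(commits, i, &ops);
}

static void toy_compute_generations(struct toy_commit *commits, int nr)
{
	static const struct toy_walk_ops ops = { get_generation, set_generation };
	int i;

	for (i = 0; i < nr; i++)
		toy_walk(commits, i, &ops);
}
```

With this shape, a caller that does not want generation data simply
never calls toy_compute_generations(), so the walk itself needs no
'write_generation_data' conditional. For a root dated 100, a child
dated 50 (clock skew), and a merge of both dated 200, the levels come
out as 1, 2, 3 and the corrected commit dates as 100, 101, 200.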

>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> @@ -1500,12 +1511,15 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					max_level = GENERATION_NUMBER_V1_MAX - 1;
>  				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
>
> -				if (current->date && current->date > max_corrected_commit_date)
> -					max_corrected_commit_date = current->date - 1;
> -				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> -
> -				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> -					ctx->num_generation_data_overflows++;
> +				if (ctx->write_generation_data) {
> +					timestamp_t cur_g;
> +					if (current->date && current->date > max_corrected_commit_date)
> +						max_corrected_commit_date = current->date - 1;
> +					cur_g = commit_graph_data_at(current)->generation
> +					      = max_corrected_commit_date + 1;
> +					if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> +						ctx->num_generation_data_overflows++;
> +				}

Looks like two things happened here:

  - A new local variable was introduced to store the value of
    'commit_graph_data_at(current)->generation' (now called 'cur_g'),
    and

  - All of this was guarded by a conditional on
    'ctx->write_generation_data'.

The first one is a readability improvement, and the second is the
substantive one, no?
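
For reference, this is how I read the guarded block once it is pulled
out of the loop. It is only a sketch: 'struct toy_ctx',
'toy_set_generation()', and TOY_OFFSET_MAX are invented names, and the
value I give TOY_OFFSET_MAX is an assumption standing in for
GENERATION_NUMBER_V2_OFFSET_MAX, not something taken from this patch.

```c
#include <stdint.h>

/* Assumed limit for this sketch; see commit.h for the real definition. */
#define TOY_OFFSET_MAX ((1ULL << 31) - 1)

/* Hypothetical slice of write_commit_graph_context. */
struct toy_ctx {
	unsigned write_generation_data:1;
	int num_generation_data_overflows;
};

/*
 * Mirrors the guarded block: when generation data is not being written,
 * '*generation' is left untouched and no overflow is counted.
 */
static uint64_t toy_set_generation(struct toy_ctx *ctx, uint64_t date,
				   uint64_t max_parent_generation,
				   uint64_t *generation)
{
	uint64_t cur_g;

	if (!ctx->write_generation_data)
		return *generation;

	/* A commit's own date wins when it is later than every parent value. */
	if (date && date > max_parent_generation)
		max_parent_generation = date - 1;
	cur_g = *generation = max_parent_generation + 1;

	/* The offset from the commit date must fit in the fixed-width slot. */
	if (cur_g - date > TOY_OFFSET_MAX)
		ctx->num_generation_data_overflows++;
	return cur_g;
}
```

The subtraction cannot underflow, since 'cur_g' is never smaller than
'date': it is either 'date' itself or one more than a parent value that
is already at least 'date'.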

>  			}
>  		}
>  	}
> @@ -2396,7 +2410,7 @@ int write_commit_graph(struct object_directory *odb,
>  	} else
>  		ctx->num_commit_graphs_after = 1;
>
> -	validate_mixed_generation_chain(ctx->r->objects->commit_graph);
> +	ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
>
>  	compute_generation_numbers(ctx);

Makes sense.

Thanks,
Taylor

