All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
	gitster@pobox.com, johannes.schindelin@gmx.de,
	johncai86@gmail.com, jonathantanmy@google.com,
	karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
	newren@gmail.com, peff@peff.net, ps@pks.im,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 10/13] pack-objects: refactor path-walk delta phase
Date: Tue, 6 May 2025 21:14:33 -0400	[thread overview]
Message-ID: <aBqz+VfSvecJdWHw@nand.local> (raw)
In-Reply-To: <622439d78557d94da899d21444920c27768d3c67.1742829770.git.gitgitgadget@gmail.com>

On Mon, Mar 24, 2025 at 03:22:46PM +0000, Derrick Stolee via GitGitGadget wrote:
> The current implementation is not integrated with threads, but could be
> done in a future update.

I think that "in a future update" may be worth replacing with "the
following commit", as the former suggests (to me) that it may be perused
outside of this series.

> Since we do not attempt to sort objects by size until after exploring
> all trees, we can remove the previous change to t5530 due to a different
> error message appearing first.

Makes sense.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  builtin/pack-objects.c       | 82 +++++++++++++++++++++++++-----------
>  pack-objects.h               | 12 ++++++
>  t/t5300-pack-object.sh       |  8 +++-
>  t/t5530-upload-pack-error.sh |  6 ---
>  4 files changed, 75 insertions(+), 33 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0ea85754c52..d4e05ca4434 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3236,6 +3236,51 @@ static int should_attempt_deltas(struct object_entry *entry)
>  	return 1;
>  }
>
> +static void find_deltas_for_region(struct object_entry *list,
> +				   struct packing_region *region,
> +				   unsigned int *processed)
> +{
> +	struct object_entry **delta_list;
> +	uint32_t delta_list_nr = 0;
> +
> +	ALLOC_ARRAY(delta_list, region->nr);
> +	for (uint32_t i = 0; i < region->nr; i++) {

I know that these values are all unsigned, and very unlikely to get near
the 32-bit maximum, but I do think we should use size_t here (and
likewise when dealing with values that deal with how many entries in a
list we've allocated and indexes into that list).

> +		struct object_entry *entry = list + region->start + i;
> +		if (should_attempt_deltas(entry))
> +			delta_list[delta_list_nr++] = entry;
> +	}
> +
> +	QSORT(delta_list, delta_list_nr, type_size_sort);
> +	find_deltas(delta_list, &delta_list_nr, window, depth, processed);
> +	free(delta_list);
> +}

The rest of this function (modulo the inline comment removal, which I
wrote about below) appear to be a straightforward copy of the previous
home of this function.

> +static void find_deltas_by_region(struct object_entry *list,
> +				  struct packing_region *regions,
> +				  uint32_t start, uint32_t nr)

I imagine that "start" here is setup for the following commit which will
parallelize this task across multiple threads, with each thread starting
at a different position.

I wonder if (as an alternative) we could get away with passing in a
(struct packing_region *, size_t) pair and drop "start". I think this
can work if you pass in the result of adding whatever the value of
"start" would be to to_pack.regions here.

That somewhat hides the fact that this code is meant to be run across
multiple threads, but in a way that I think is worth doing. It lets the
function avoid having to do things like:

  for (size_t i = start; i < start + nr; i++)

and instead do the simpler:

  for (size_t i = 0; i < nr; i++)

since the caller already adjusted the regions pointer for us. As a
side-effect, it also means that the call below in prepare_pack() can
avoid passing a literal zero.

> +{
> +	unsigned int processed = 0;
> +	uint32_t progress_nr;

This uses a mix of uint32_t and unsigned int types. Should these all be
the same (and/or size_t's)?

> -	/*
> -	 * Find delta bases among this list of objects that all match the same
> -	 * path. This causes the delta compression to be interleaved in the
> -	 * object walk, which can lead to confusing progress indicators. This is
> -	 * also incompatible with threaded delta calculations. In the future,
> -	 * consider creating a list of regions in the full to_pack.objects array
> -	 * that could be picked up by the threaded delta computation.
> -	 */

Nice, it is very satisfying to see this comment go away ;-).

> -	if (sub_list_size && window) {
> -		QSORT(delta_list, sub_list_size, type_size_sort);
> -		find_deltas(delta_list, &sub_list_size, window, depth, processed);
> -	}
> +	*processed += oids->nr;
> +	display_progress(progress_state, *processed);
>
> -	free(delta_list);
>  	return 0;
>  }
>
> diff --git a/pack-objects.h b/pack-objects.h
> index d73e3843c92..7ba9f3448fe 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -119,11 +119,23 @@ struct object_entry {
>  	unsigned ext_base:1; /* delta_idx points outside packlist */
>  };
>
> +/**

This is an extreme nitpick, but I think that "/**" (as opposed to "/*")
is preferred, as the former is used for Doxygen-style comments. I feel
like I have seen Junio comment on this before, but searching 'f:Junio
"/**"' yields a measly 83,000+ results, so I am unlikely to find it ;-).

(Not worth rerolling on its own, but if you are rerolling anyway, which
is what I gathered from some of your earlier replies, it may be worth
picking up along the way.)

> + * A packing region is a section of the packing_data.objects array
> + * as given by a starting index and a number of elements.
> + */
> +struct packing_region {
> +	uint32_t start;
> +	uint32_t nr;
> +};

Same note here about the uint32_t versus size_t.
> +
>  struct packing_data {
>  	struct repository *repo;
>  	struct object_entry *objects;
>  	uint32_t nr_objects, nr_alloc;
>
> +	struct packing_region *regions;
> +	uint64_t nr_regions, nr_regions_alloc;

Ditto, but this one jumps to uint64_t. What differentiates these two?

Thanks,
Taylor

  reply	other threads:[~2025-05-07  1:14 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-10  1:50 [PATCH 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-03-12 21:01   ` Taylor Blau
2025-03-20 19:48     ` Derrick Stolee
2025-03-10  1:50 ` [PATCH 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-03-12 21:14   ` Taylor Blau
2025-03-20 19:46     ` Derrick Stolee
2025-03-10  1:50 ` [PATCH 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-03-12 21:14   ` Taylor Blau
2025-03-10  1:50 ` [PATCH 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-03-12 21:21   ` Taylor Blau
2025-03-20 19:57     ` Derrick Stolee
2025-03-10  1:50 ` [PATCH 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-03-10  1:50 ` [PATCH 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-03-10 17:28 ` [PATCH 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Junio C Hamano
2025-03-12 20:47   ` Taylor Blau
2025-03-20 20:18     ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2025-03-24 15:22   ` [PATCH v2 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-05-02 22:48     ` Taylor Blau
2025-03-24 15:22   ` [PATCH v2 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-02 23:21     ` Taylor Blau
2025-05-06 19:39       ` Derrick Stolee
2025-05-16 15:27         ` Derrick Stolee
2025-03-24 15:22   ` [PATCH v2 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-03-24 15:22   ` [PATCH v2 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-05-02 23:25     ` Taylor Blau
2025-03-24 15:22   ` [PATCH v2 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-05-02 23:31     ` Taylor Blau
2025-05-06 19:43       ` Derrick Stolee
2025-03-24 15:22   ` [PATCH v2 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-05-02 23:34     ` Taylor Blau
2025-05-16 15:32       ` Derrick Stolee
2025-03-24 15:22   ` [PATCH v2 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-02 23:38     ` Taylor Blau
2025-03-24 15:22   ` [PATCH v2 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-05-02 23:42     ` Taylor Blau
2025-05-06 19:46       ` Derrick Stolee
2025-05-16 15:41         ` Derrick Stolee
2025-03-24 15:22   ` [PATCH v2 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-05-07  0:58     ` Taylor Blau
2025-03-24 15:22   ` [PATCH v2 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-05-07  1:14     ` Taylor Blau [this message]
2025-05-16 16:27       ` Derrick Stolee
2025-05-29  0:17         ` Taylor Blau
2025-03-24 15:22   ` [PATCH v2 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-05-07  1:33     ` Taylor Blau
2025-03-24 15:22   ` [PATCH v2 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-03-24 15:22   ` [PATCH v2 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-05-02 21:24   ` [PATCH v2 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Junio C Hamano
2025-05-02 22:45     ` Taylor Blau
2025-05-02 23:44       ` Taylor Blau
2025-05-07  1:35         ` Taylor Blau
2025-05-16 18:11   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-05-16 18:11     ` [PATCH v3 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-05-16 18:12     ` [PATCH v3 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-05-16 18:12     ` [PATCH v3 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-05-16 18:12     ` [PATCH v3 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-05-16 18:12     ` [PATCH v3 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-05-29  0:20     ` [PATCH v3 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBqz+VfSvecJdWHw@nand.local \
    --to=me@ttaylorr.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=johncai86@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=karthik.188@gmail.com \
    --cc=kristofferhaugsbakk@fastmail.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.