From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
gitster@pobox.com, johannes.schindelin@gmx.de,
johncai86@gmail.com, jonathantanmy@google.com,
karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
newren@gmail.com, peff@peff.net, ps@pks.im,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 10/13] pack-objects: refactor path-walk delta phase
Date: Tue, 6 May 2025 21:14:33 -0400 [thread overview]
Message-ID: <aBqz+VfSvecJdWHw@nand.local> (raw)
In-Reply-To: <622439d78557d94da899d21444920c27768d3c67.1742829770.git.gitgitgadget@gmail.com>
On Mon, Mar 24, 2025 at 03:22:46PM +0000, Derrick Stolee via GitGitGadget wrote:
> The current implementation is not integrated with threads, but could be
> done in a future update.
I think that "in a future update" may be worth replacing with "the
following commit", as the former suggests (to me) that it may be perused
outside of this series.
> Since we do not attempt to sort objects by size until after exploring
> all trees, we can remove the previous change to t5530 due to a different
> error message appearing first.
Makes sense.
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> builtin/pack-objects.c | 82 +++++++++++++++++++++++++-----------
> pack-objects.h | 12 ++++++
> t/t5300-pack-object.sh | 8 +++-
> t/t5530-upload-pack-error.sh | 6 ---
> 4 files changed, 75 insertions(+), 33 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0ea85754c52..d4e05ca4434 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3236,6 +3236,51 @@ static int should_attempt_deltas(struct object_entry *entry)
> return 1;
> }
>
> +static void find_deltas_for_region(struct object_entry *list,
> + struct packing_region *region,
> + unsigned int *processed)
> +{
> + struct object_entry **delta_list;
> + uint32_t delta_list_nr = 0;
> +
> + ALLOC_ARRAY(delta_list, region->nr);
> + for (uint32_t i = 0; i < region->nr; i++) {
I know that these values are all unsigned, and very unlikely to get near
the 32-bit maximum, but I do think we should use size_t here (and
likewise when dealing with values that deal with how many entries in a
list we've allocated and indexes into that list).
> + struct object_entry *entry = list + region->start + i;
> + if (should_attempt_deltas(entry))
> + delta_list[delta_list_nr++] = entry;
> + }
> +
> + QSORT(delta_list, delta_list_nr, type_size_sort);
> + find_deltas(delta_list, &delta_list_nr, window, depth, processed);
> + free(delta_list);
> +}
The rest of this function (modulo the inline comment removal, which I
wrote about below) appear to be a straightforward copy of the previous
home of this function.
> +static void find_deltas_by_region(struct object_entry *list,
> + struct packing_region *regions,
> + uint32_t start, uint32_t nr)
I imagine that "start" here is setup for the following commit which will
parallelize this task across multiple threads, with each thread starting
at a different position.
I wonder if (as an alternative) we could get away with passing in a
(struct packing_region *, size_t) pair and drop "start". I think this
can work if you pass in the result of adding whatever the value of
"start" would be to to_pack.regions here.
That somewhat hides the fact that this code is meant to be run across
multiple threads, but in a way that I think is worth doing. It lets the
function avoid having to do things like:
for (size_t i = start; i < start + nr; i++)
and instead do the simpler:
for (size_t i = 0; i < nr; i++)
since the caller already adjusted the regions pointer for us. As a
side-effect, it also means that the call below in prepare_pack() can
avoid passing a literal zero.
> +{
> + unsigned int processed = 0;
> + uint32_t progress_nr;
This uses a mix of uint32_t and unsigned int types. Should these all be
the same (and/or size_t's)?
> - /*
> - * Find delta bases among this list of objects that all match the same
> - * path. This causes the delta compression to be interleaved in the
> - * object walk, which can lead to confusing progress indicators. This is
> - * also incompatible with threaded delta calculations. In the future,
> - * consider creating a list of regions in the full to_pack.objects array
> - * that could be picked up by the threaded delta computation.
> - */
Nice, it is very satisfying to see this comment go away ;-).
> - if (sub_list_size && window) {
> - QSORT(delta_list, sub_list_size, type_size_sort);
> - find_deltas(delta_list, &sub_list_size, window, depth, processed);
> - }
> + *processed += oids->nr;
> + display_progress(progress_state, *processed);
>
> - free(delta_list);
> return 0;
> }
>
> diff --git a/pack-objects.h b/pack-objects.h
> index d73e3843c92..7ba9f3448fe 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -119,11 +119,23 @@ struct object_entry {
> unsigned ext_base:1; /* delta_idx points outside packlist */
> };
>
> +/**
This is an extreme nitpick, but I think that "/**" (as opposed to "/*")
is preferred, as the former is used for Doxygen-style comments. I feel
like I have seen Junio comment on this before, but searching 'f:Junio
"/**"' yields a measly 83,000+ results, so I am unlikely to find it ;-).
(Not worth rerolling on its own, but if you are rerolling anyway, which
is what I gathered from some of your earlier replies, it may be worth
picking up along the way.)
> + * A packing region is a section of the packing_data.objects array
> + * as given by a starting index and a number of elements.
> + */
> +struct packing_region {
> + uint32_t start;
> + uint32_t nr;
> +};
Same note here about the uint32_t versus size_t.
> +
> struct packing_data {
> struct repository *repo;
> struct object_entry *objects;
> uint32_t nr_objects, nr_alloc;
>
> + struct packing_region *regions;
> + uint64_t nr_regions, nr_regions_alloc;
Ditto, but this one jumps to uint64_t. What differentiates these two?
Thanks,
Taylor
next prev parent reply other threads:[~2025-05-07 1:14 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-10 1:50 [PATCH 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-03-12 21:01 ` Taylor Blau
2025-03-20 19:48 ` Derrick Stolee
2025-03-10 1:50 ` [PATCH 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-03-12 21:14 ` Taylor Blau
2025-03-20 19:46 ` Derrick Stolee
2025-03-10 1:50 ` [PATCH 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-03-12 21:14 ` Taylor Blau
2025-03-10 1:50 ` [PATCH 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-03-12 21:21 ` Taylor Blau
2025-03-20 19:57 ` Derrick Stolee
2025-03-10 1:50 ` [PATCH 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-03-10 17:28 ` [PATCH 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Junio C Hamano
2025-03-12 20:47 ` Taylor Blau
2025-03-20 20:18 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2025-03-24 15:22 ` [PATCH v2 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-05-02 22:48 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-02 23:21 ` Taylor Blau
2025-05-06 19:39 ` Derrick Stolee
2025-05-16 15:27 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-03-24 15:22 ` [PATCH v2 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-05-02 23:25 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-05-02 23:31 ` Taylor Blau
2025-05-06 19:43 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-05-02 23:34 ` Taylor Blau
2025-05-16 15:32 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-02 23:38 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-05-02 23:42 ` Taylor Blau
2025-05-06 19:46 ` Derrick Stolee
2025-05-16 15:41 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-05-07 0:58 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-05-07 1:14 ` Taylor Blau [this message]
2025-05-16 16:27 ` Derrick Stolee
2025-05-29 0:17 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-05-07 1:33 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-03-24 15:22 ` [PATCH v2 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-05-02 21:24 ` [PATCH v2 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Junio C Hamano
2025-05-02 22:45 ` Taylor Blau
2025-05-02 23:44 ` Taylor Blau
2025-05-07 1:35 ` Taylor Blau
2025-05-16 18:11 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-05-29 0:20 ` [PATCH v3 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBqz+VfSvecJdWHw@nand.local \
--to=me@ttaylorr.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=johncai86@gmail.com \
--cc=jonathantanmy@google.com \
--cc=karthik.188@gmail.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).