From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
gitster@pobox.com, johannes.schindelin@gmx.de,
johncai86@gmail.com, jonathantanmy@google.com,
karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
newren@gmail.com, peff@peff.net, ps@pks.im,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 11/13] pack-objects: thread the path-based compression
Date: Tue, 6 May 2025 21:33:35 -0400 [thread overview]
Message-ID: <aBq4b2R0pAS5PiZm@nand.local> (raw)
In-Reply-To: <ae73d26319ac5db338507988eb23f1ecaac67671.1742829770.git.gitgitgadget@gmail.com>
On Mon, Mar 24, 2025 at 03:22:47PM +0000, Derrick Stolee via GitGitGadget wrote:
> Using the Git repository as a test repo, the p5313 performance test
> shows that the resulting size of the repo is the same, but the threaded
> implementation gives gains of varying degrees depending on the number of
> objects being packed. (This was tested on a 16-core machine.)
>
> Test HEAD~1 HEAD
> ---------------------------------------------------
> 5313.20: big pack 2.38 1.99 -16.4%
> 5313.21: big pack size 16.1M 16.0M -0.2%
> 5313.24: repack 107.32 45.41 -57.7%
> 5313.25: repack size 213.3M 213.2M -0.0%
Very nice indeed!
> This ~60% reduction in 'git repack --path-walk' time is typical across
> all repos I used for testing. What is interesting is to compare when the
> overall time improves enough to outperform the --name-hash-version=1
> case. These time improvements correlate with repositories with data
> shapes that significantly improve their data size as well. The
I had to read this sentence a couple of times to wrap my head around it,
but perhaps you can double-check my understanding. Are you saying that
reducing the size of the resulting packs suggests that our runtime with
the path-walk algorithm is improved enough to be competitive with (or
better than) the non-path-walk implementation with name-hash v1?
I think that's what you mean, and I admit that I don't find my own
restatement to be all that much clearer. So I think it's fine to leave
this as-is, though it is interesting to think about why the statement
is true.
I suspect (without having tested it thoroughly) that a significant
proportion of the time saved on the path-walk side is because we found
better deltas that are easier to compress, as a result of a more
well-tuned search technique. I don't think you necessarily have to get
too far into the details here, but I was curious (not for commit message
polishing, but purely for my own curiosity) if you had any thoughts or
measurements of why this might be the case.
But...
> --path-walk feature frequently takes longer than --name-hash-verison=2,
> trading some extrac computation for some additional compression. The
s/extrac/extra/
> natural place where this additional computation comes from is the two
> compression passes that --path-walk takes, though the first pass is
> naturally faster due to the path boundaries avoiding a number of delta
> compression attempts.
... I should have kept reading before commenting the above! ;-)
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> builtin/pack-objects.c | 163 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 161 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index d4e05ca4434..2a6246c1e78 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2964,6 +2964,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
> struct thread_params {
> pthread_t thread;
> struct object_entry **list;
> + struct packing_region *regions;
> unsigned list_size;
> unsigned remaining;
> int window;
> @@ -3281,6 +3282,164 @@ static void find_deltas_by_region(struct object_entry *list,
> stop_progress(&progress_state);
> }
>
> +static void *threaded_find_deltas_by_path(void *arg)
> +{
> + struct thread_params *me = arg;
> +
> + progress_lock();
> + while (me->remaining) {
> + while (me->remaining) {
I am not quite following the double-while loop here. Could you help
clarify what's going on here? Is this due to the work-stealing below?
> +static void ll_find_deltas_by_region(struct object_entry *list,
> + struct packing_region *regions,
> + uint32_t start, uint32_t nr)
> +{
> + struct thread_params *p;
> + int i, ret, active_threads = 0;
> + unsigned int processed = 0;
> + uint32_t progress_nr;
> + init_threaded_search();
> +
> + if (!nr)
> + return;
> +
> + progress_nr = regions[nr - 1].start + regions[nr - 1].nr;
> + if (delta_search_threads <= 1) {
> + find_deltas_by_region(list, regions, start, nr);
> + cleanup_threaded_search();
> + return;
> + }
> +
> + if (progress > pack_to_stdout)
> + fprintf_ln(stderr, _("Path-based delta compression using up to %d threads"),
> + delta_search_threads);
This looks a copy-and-paste from the non-path-walk code, but I wonder if
it might be worth using Q_() here instead of _() to provide better
output in the case that delta_search_threads is 1.
The rest of the implementation of ll_find_deltas_by_region() looks good
to me.
Thanks,
Taylor
next prev parent reply other threads:[~2025-05-07 1:33 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-10 1:50 [PATCH 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-03-12 21:01 ` Taylor Blau
2025-03-20 19:48 ` Derrick Stolee
2025-03-10 1:50 ` [PATCH 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-03-12 21:14 ` Taylor Blau
2025-03-20 19:46 ` Derrick Stolee
2025-03-10 1:50 ` [PATCH 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-03-12 21:14 ` Taylor Blau
2025-03-10 1:50 ` [PATCH 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-03-12 21:21 ` Taylor Blau
2025-03-20 19:57 ` Derrick Stolee
2025-03-10 1:50 ` [PATCH 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-03-10 1:50 ` [PATCH 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-03-10 17:28 ` [PATCH 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Junio C Hamano
2025-03-12 20:47 ` Taylor Blau
2025-03-20 20:18 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2025-03-24 15:22 ` [PATCH v2 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-05-02 22:48 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-02 23:21 ` Taylor Blau
2025-05-06 19:39 ` Derrick Stolee
2025-05-16 15:27 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-03-24 15:22 ` [PATCH v2 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-05-02 23:25 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-05-02 23:31 ` Taylor Blau
2025-05-06 19:43 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-05-02 23:34 ` Taylor Blau
2025-05-16 15:32 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-02 23:38 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-05-02 23:42 ` Taylor Blau
2025-05-06 19:46 ` Derrick Stolee
2025-05-16 15:41 ` Derrick Stolee
2025-03-24 15:22 ` [PATCH v2 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-05-07 0:58 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-05-07 1:14 ` Taylor Blau
2025-05-16 16:27 ` Derrick Stolee
2025-05-29 0:17 ` Taylor Blau
2025-03-24 15:22 ` [PATCH v2 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-05-07 1:33 ` Taylor Blau [this message]
2025-03-24 15:22 ` [PATCH v2 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-03-24 15:22 ` [PATCH v2 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-05-02 21:24 ` [PATCH v2 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Junio C Hamano
2025-05-02 22:45 ` Taylor Blau
2025-05-02 23:44 ` Taylor Blau
2025-05-07 1:35 ` Taylor Blau
2025-05-16 18:11 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 01/13] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 02/13] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 03/13] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 04/13] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 05/13] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 06/13] t5538: add tests to confirm deltas in shallow pushes Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 07/13] repack: add --path-walk option Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 08/13] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2025-05-16 18:11 ` [PATCH v3 09/13] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 10/13] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 11/13] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 12/13] path-walk: add new 'edge_aggressive' option Derrick Stolee via GitGitGadget
2025-05-16 18:12 ` [PATCH v3 13/13] pack-objects: allow --shallow and --path-walk Derrick Stolee via GitGitGadget
2025-05-29 0:20 ` [PATCH v3 00/13] PATH WALK II: Add --path-walk option to 'git pack-objects' Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBq4b2R0pAS5PiZm@nand.local \
--to=me@ttaylorr.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=johncai86@gmail.com \
--cc=jonathantanmy@google.com \
--cc=karthik.188@gmail.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).