Git development
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`
Date: Fri, 29 May 2026 17:28:32 -0400	[thread overview]
Message-ID: <2d68fdb2-ac05-4331-b53e-53c2e9a2b3d4@gmail.com> (raw)
In-Reply-To: <ahnx/oBWe9yyBZFg@nand.local>

On 5/29/2026 4:07 PM, Taylor Blau wrote:
> On Fri, May 29, 2026 at 01:26:33PM -0400, Derrick Stolee wrote:
>> On 5/28/26 11:28 AM, Derrick Stolee wrote:
>>> On 5/27/26 7:18 PM, Taylor Blau wrote:
>>
>>> Do you have any end-to-end performance data to demonstrate that these
>>> changes are effective at scale? Are we still producing packfiles with the
>>> pack-file compression and now with .bitmap files? How does this impact
>>> the performance of a clone or fetch when using a bitmap index at read
>>> time?
>>
>> Here's my attempt to use our existing performance tests to analyze the
>> impact of this series.
>>
>> Running p5311 against the base of this topic and this topic with
>> GIT_TEST_PACK_PATH_WALK=1, I get this output:
> 
> Yikes. That's not great, but see below for what I think is going on.
> 
> (As an aside, we can focus in on either lookup=true or lookup=false,
> since these are just controlling whether or not the bitmap lookup table
> is written. On a repository as small as git.git, this shouldn't make a
> huge difference either way. I have a separate series to make this the
> default and to clean up the t/perf suite accordingly, but haven't sent
> it to the list yet.)
> 
>> Test                                     HEAD~3    HEAD
>> -----------------------------------------------------------------
>> [...]
> 
> I think this is either the primary reason why you're not seeing an
> improvement here, or at least related to it...
> 
>> Do you have a good feeling for why the path-walk feature doesn't
>> make a huge change in these test scenarios?
> 
> I think the problem is that we're relying on the TEST_ variable to tell
> pack-objects to generate a pack using --path-walk, but treat it as a
> fallback.
> 
> I suspect that since p5311 invokes repack *without* the '--path-walk'
> option, we end up in this case within cmd_pack_objects():
> 
> 	if (path_walk < 0) {
> 		if (use_bitmap_index > 0 ||
> 		    !use_internal_rev_list)
> 			path_walk = 0;
> 		else if (the_repository->gitdir &&
> 			 the_repository->settings.pack_use_path_walk)
> 			path_walk = 1;
> 		else
> 			path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
> 	}
> 
> , where `path_walk` is *not* set, but `use_bitmap_index` is since p5311
> set the 'pack.writeBitmaps' configuration (as a separate aside, this
> should really prefer the non-deprecated 'repack.writeBitmaps' variant).
I see. When `--use-bitmap-index` is specified, then the test variable is
ignored. So my tests aren't actually measuring the intended end state.

> So in that case, we fall back to `path_walk = 0`, and don't even bother
> reading the `GIT_TEST_PACK_PATH_WALK` variable.
> 
> If I modify the perf test like so:
> 
> --- 8< ---
> diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
> index 047efb995d6..1c9c99216e3 100755
> --- a/t/perf/p5311-pack-bitmaps-fetch.sh
> +++ b/t/perf/p5311-pack-bitmaps-fetch.sh
> @@ -13,7 +13,7 @@ test_fetch_bitmaps () {
>  	test_expect_success 'create bitmapped server repo' '
>  		git config pack.writebitmaps true &&
>  		git config pack.writeBitmapLookupTable '"$1"' &&
> -		git repack -ad
> +		git repack -ad --path-walk
>  	'
> 
>  	# simulate a fetch from a repository that last fetched N days ago, for
> --- >8 ---
> , then I can get significantly improved results when running without the
> GIT_TEST_PACK_PATH_WALK variablle (here I'm truncating the
> 'lookup=false' case, which performs nearly identically):
> 
>     Test                                       HEAD~3            HEAD
>     ------------------------------------------------------------------------------------
>     5311.4: server   (1 days) (lookup=true)    2.57(2.52+0.04)   0.03(0.02+0.00) -98.8%
>     5311.5: size     (1 days)                           153.4K            153.4K +0.0%
>     5311.6: client   (1 days) (lookup=true)    0.00(0.01+0.00)   0.00(0.01+0.00) =
>     5311.8: server   (2 days) (lookup=true)    2.60(2.55+0.04)   0.02(0.02+0.00) -99.2%
>     5311.9: size     (2 days)                           153.4K            153.4K +0.0%
>     5311.10: client   (2 days) (lookup=true)   0.00(0.01+0.00)   0.00(0.01+0.00) =
>     5311.12: server   (4 days) (lookup=true)   2.60(2.54+0.05)   0.03(0.03+0.00) -98.8%
>     5311.13: size     (4 days)                          209.0K            209.0K +0.0%
>     5311.14: client   (4 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
>     5311.16: server   (8 days) (lookup=true)   2.58(2.53+0.04)   0.03(0.03+0.00) -98.8%
>     5311.17: size     (8 days)                          209.0K            209.0K +0.0%
>     5311.18: client   (8 days) (lookup=true)   0.01(0.01+0.00)   0.01(0.02+0.00) +0.0%
>     5311.20: server  (16 days) (lookup=true)   2.58(2.52+0.05)   0.03(0.03+0.00) -98.8%
>     5311.21: size    (16 days)                          209.0K            209.0K +0.0%
>     5311.22: client  (16 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
>     5311.24: server  (32 days) (lookup=true)   2.61(2.58+0.03)   0.03(0.02+0.01) -98.9%
>     5311.25: size    (32 days)                          212.9K            212.9K +0.0%
>     5311.26: client  (32 days) (lookup=true)   0.01(0.02+0.00)   0.02(0.02+0.00) +100.0%
>     5311.28: server  (64 days) (lookup=true)   2.72(2.79+0.06)   0.19(0.30+0.03) -93.0%
>     5311.29: size    (64 days)                            4.5M              4.5M -0.0%
>     5311.30: client  (64 days) (lookup=true)   0.49(0.58+0.02)   0.48(0.56+0.04) -2.0%
>     5311.32: server (128 days) (lookup=true)   2.90(3.21+0.09)   0.35(0.70+0.04) -87.9%
>     5311.33: size   (128 days)                            9.4M              9.5M +0.4%
>     5311.34: client (128 days) (lookup=true)   0.98(1.27+0.08)   0.98(1.33+0.06) +0.0%
> 
> My reading here is that we get significantly smaller packs (i.e. all
> 'test_size' tests drop from HEAD~3 to HEAD) in the same amount of time
> (i.e. that all 'test_perf' tests are roughly flat).

The sizes don't shrink, and in one case increases by a small amount. I'm
happy to count those cases as noise from multi-threaded delta calculations
being less deterministic.

The _time_ taken to compute the packfiles is what decreases, though, which
is promising.

> That lines up with my expectation here, which is that even though we're
> using bitmaps at read time, that's effectively seeding the verbatim pack
> reuse over a significantly smaller pack, producing a much smaller output
> pack as a result.

Can you double-check this reasoning with my read of the data? The size
isn't changing, but the computation time is.

> As to whether we should modify the perf suite to test this, naturally I
> think we should. Likely that looks like modifying p5313 to re-run
> `test_all_with_args` with '--use-bitmap-index' after repacking with
> --path-walk and generating bitmaps like so (untested):
> 
> --- 8< ---
> diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
> index 46a6cd32d24..663717982b1 100755
> --- a/t/perf/p5313-pack-objects.sh
> +++ b/t/perf/p5313-pack-objects.sh
> @@ -22,6 +22,21 @@ test_expect_success 'create rev input' '
>  	EOF
>  '
> 
> +test_repack_with_args () {
> +	args="$@"
> +	export args
> +
> +	test_perf "repack with $args" '
> +		git repack -adf $args
> +	'
> +
> +	test_size "repack size with $args" '
> +		gitdir=$(git rev-parse --git-dir) &&
> +		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
> +		test_file_size "$pack"
> +	'
> +}
> +
I see that these tests are extracted from test_all_... below:

>  test_all_with_args () {
>  	parameter=$1
>  	export parameter
> @@ -52,23 +67,22 @@ test_all_with_args () {
>  	test_size "shallow pack size with $parameter" '
>  		test_file_size out
>  	'
> -
> -	test_perf "repack with $parameter" '
> -		git repack -adf $parameter
> -	'
> -
> -	test_size "repack size with $parameter" '
> -		gitdir=$(git rev-parse --git-dir) &&
> -		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
> -		test_file_size "$pack"
> -	'
>  }

Because the --use-bitmap-index and --write-bitmap-index args
aren't appropriate across these different commands.

nit: the diff would be more obvious if test_repack_with_args
was defined after test_all_with_args so the hunk of existing
tests wouldn't appear in the diff.

>  for version in 1 2
>  do
> -	test_all_with_args --name-hash-version=$version
> +	arg="--name-hash-version=$version" &&
> +
> +	test_all_with_args "$arg" &&
> +	test_repack_with_args "$arg" || return 1
>  done
> 
>  test_all_with_args --path-walk
> +test_repack_with_args --path-walk
> +
> +# inverted order here: we want to test using reachability bitmaps on a
> +# pack written with --path-walk
> +test_repack_with_args --path-walk --write-bitmap-index
> +test_all_with_args --use-bitmap-index

So this allows us to test all of the different modes.

> --- >8 ---
> 
> I don't have a strong opinion on whether or not we should include that
> in this series or elsewhere.
I'm interested to see some results of your new p5313 test
to make sure that we are getting expected size changes for
the repack, since the p5311 tests were more focused on the
thin fetch pack (and didn't show a change in size).

For that, I'd be interested to see this test be included in
a patch for future reference, too.

Thanks,
-Stolee


  reply	other threads:[~2026-05-29 21:28 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27 23:18 [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk` Taylor Blau
2026-05-27 23:18 ` [PATCH 1/3] pack-objects: support reachability bitmaps " Taylor Blau
2026-05-27 23:18 ` [PATCH 2/3] pack-objects: extract `record_tree_depth()` helper Taylor Blau
2026-05-27 23:18 ` [PATCH 3/3] pack-objects: support `--delta-islands` with `--path-walk` Taylor Blau
2026-05-28 15:28 ` [PATCH 0/3] pack-objects: support bitmaps and delta-islands " Derrick Stolee
2026-05-29 17:26   ` Derrick Stolee
2026-05-29 20:07     ` Taylor Blau
2026-05-29 21:28       ` Derrick Stolee [this message]
2026-05-29 22:20         ` Taylor Blau
2026-06-02 22:21 ` [PATCH v2 0/4] " Taylor Blau
2026-06-02 22:21   ` [PATCH v2 1/4] t/perf: drop p5311's lookup-table permutation Taylor Blau
2026-06-02 22:21   ` [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk` Taylor Blau
2026-06-02 22:21   ` [PATCH v2 3/4] pack-objects: extract `record_tree_depth()` helper Taylor Blau
2026-06-02 22:21   ` [PATCH v2 4/4] pack-objects: support `--delta-islands` with `--path-walk` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2d68fdb2-ac05-4331-b53e-53c2e9a2b3d4@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox