Git development
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`
Date: Fri, 29 May 2026 16:07:26 -0400	[thread overview]
Message-ID: <ahnx/oBWe9yyBZFg@nand.local> (raw)
In-Reply-To: <22a7e32f-f645-4f00-bc5b-6b4309e483c2@gmail.com>

On Fri, May 29, 2026 at 01:26:33PM -0400, Derrick Stolee wrote:
> On 5/28/26 11:28 AM, Derrick Stolee wrote:
> > On 5/27/26 7:18 PM, Taylor Blau wrote:
>
> > Do you have any end-to-end performance data to demonstrate that these
> > changes are effective at scale? Are we still producing packfiles with the
> > pack-file compression and now with .bitmap files? How does this impact
> > the performance of a clone or fetch when using a bitmap index at read
> > time?
>
> Here's my attempt to use our existing performance tests to analyze the
> impact of this series.
>
> Running p5311 against the base of this topic and this topic with
> GIT_TEST_PACK_PATH_WALK=1, I get this output:

Yikes. That's not great, but see below for what I think is going on.

(As an aside, we can focus in on either lookup=true or lookup=false,
since these are just controlling whether or not the bitmap lookup table
is written. On a repository as small as git.git, this shouldn't make a
huge difference either way. I have a separate series to make this the
default and to clean up the t/perf suite accordingly, but haven't sent
it to the list yet.)

> Test                                     HEAD~3    HEAD
> -----------------------------------------------------------------
> [...]

I think this is either the primary reason why you're not seeing an
improvement here, or at least related to it...

> Do you have a good feeling for why the path-walk feature doesn't
> make a huge change in these test scenarios?

I think the problem is that we're relying on the TEST_ variable to tell
pack-objects to generate a pack using --path-walk, but treat it as a
fallback.

I suspect that since p5311 invokes repack *without* the '--path-walk'
option, we end up in this case within cmd_pack_objects():

	if (path_walk < 0) {
		if (use_bitmap_index > 0 ||
		    !use_internal_rev_list)
			path_walk = 0;
		else if (the_repository->gitdir &&
			 the_repository->settings.pack_use_path_walk)
			path_walk = 1;
		else
			path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
	}

, where `path_walk` is *not* set, but `use_bitmap_index` is since p5311
set the 'pack.writeBitmaps' configuration (as a separate aside, this
should really prefer the non-deprecated 'repack.writeBitmaps' variant).

So in that case, we fall back to `path_walk = 0`, and don't even bother
reading the `GIT_TEST_PACK_PATH_WALK` variable.

If I modify the perf test like so:

--- 8< ---
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 047efb995d6..1c9c99216e3 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -13,7 +13,7 @@ test_fetch_bitmaps () {
 	test_expect_success 'create bitmapped server repo' '
 		git config pack.writebitmaps true &&
 		git config pack.writeBitmapLookupTable '"$1"' &&
-		git repack -ad
+		git repack -ad --path-walk
 	'

 	# simulate a fetch from a repository that last fetched N days ago, for
--- >8 ---

, then I can get significantly improved results when running without the
GIT_TEST_PACK_PATH_WALK variablle (here I'm truncating the
'lookup=false' case, which performs nearly identically):

    Test                                       HEAD~3            HEAD
    ------------------------------------------------------------------------------------
    5311.4: server   (1 days) (lookup=true)    2.57(2.52+0.04)   0.03(0.02+0.00) -98.8%
    5311.5: size     (1 days)                           153.4K            153.4K +0.0%
    5311.6: client   (1 days) (lookup=true)    0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.8: server   (2 days) (lookup=true)    2.60(2.55+0.04)   0.02(0.02+0.00) -99.2%
    5311.9: size     (2 days)                           153.4K            153.4K +0.0%
    5311.10: client   (2 days) (lookup=true)   0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.12: server   (4 days) (lookup=true)   2.60(2.54+0.05)   0.03(0.03+0.00) -98.8%
    5311.13: size     (4 days)                          209.0K            209.0K +0.0%
    5311.14: client   (4 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
    5311.16: server   (8 days) (lookup=true)   2.58(2.53+0.04)   0.03(0.03+0.00) -98.8%
    5311.17: size     (8 days)                          209.0K            209.0K +0.0%
    5311.18: client   (8 days) (lookup=true)   0.01(0.01+0.00)   0.01(0.02+0.00) +0.0%
    5311.20: server  (16 days) (lookup=true)   2.58(2.52+0.05)   0.03(0.03+0.00) -98.8%
    5311.21: size    (16 days)                          209.0K            209.0K +0.0%
    5311.22: client  (16 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
    5311.24: server  (32 days) (lookup=true)   2.61(2.58+0.03)   0.03(0.02+0.01) -98.9%
    5311.25: size    (32 days)                          212.9K            212.9K +0.0%
    5311.26: client  (32 days) (lookup=true)   0.01(0.02+0.00)   0.02(0.02+0.00) +100.0%
    5311.28: server  (64 days) (lookup=true)   2.72(2.79+0.06)   0.19(0.30+0.03) -93.0%
    5311.29: size    (64 days)                            4.5M              4.5M -0.0%
    5311.30: client  (64 days) (lookup=true)   0.49(0.58+0.02)   0.48(0.56+0.04) -2.0%
    5311.32: server (128 days) (lookup=true)   2.90(3.21+0.09)   0.35(0.70+0.04) -87.9%
    5311.33: size   (128 days)                            9.4M              9.5M +0.4%
    5311.34: client (128 days) (lookup=true)   0.98(1.27+0.08)   0.98(1.33+0.06) +0.0%

My reading here is that we get significantly smaller packs (i.e. all
'test_size' tests drop from HEAD~3 to HEAD) in the same amount of time
(i.e. that all 'test_perf' tests are roughly flat).

That lines up with my expectation here, which is that even though we're
using bitmaps at read time, that's effectively seeding the verbatim pack
reuse over a significantly smaller pack, producing a much smaller output
pack as a result.

As to whether we should modify the perf suite to test this, naturally I
think we should. Likely that looks like modifying p5313 to re-run
`test_all_with_args` with '--use-bitmap-index' after repacking with
--path-walk and generating bitmaps like so (untested):

--- 8< ---
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 46a6cd32d24..663717982b1 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -22,6 +22,21 @@ test_expect_success 'create rev input' '
 	EOF
 '

+test_repack_with_args () {
+	args="$@"
+	export args
+
+	test_perf "repack with $args" '
+		git repack -adf $args
+	'
+
+	test_size "repack size with $args" '
+		gitdir=$(git rev-parse --git-dir) &&
+		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
+		test_file_size "$pack"
+	'
+}
+
 test_all_with_args () {
 	parameter=$1
 	export parameter
@@ -52,23 +67,22 @@ test_all_with_args () {
 	test_size "shallow pack size with $parameter" '
 		test_file_size out
 	'
-
-	test_perf "repack with $parameter" '
-		git repack -adf $parameter
-	'
-
-	test_size "repack size with $parameter" '
-		gitdir=$(git rev-parse --git-dir) &&
-		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
-		test_file_size "$pack"
-	'
 }

 for version in 1 2
 do
-	test_all_with_args --name-hash-version=$version
+	arg="--name-hash-version=$version" &&
+
+	test_all_with_args "$arg" &&
+	test_repack_with_args "$arg" || return 1
 done

 test_all_with_args --path-walk
+test_repack_with_args --path-walk
+
+# inverted order here: we want to test using reachability bitmaps on a
+# pack written with --path-walk
+test_repack_with_args --path-walk --write-bitmap-index
+test_all_with_args --use-bitmap-index

 test_done
--- >8 ---

I don't have a strong opinion on whether or not we should include that
in this series or elsewhere.

Thanks,
Taylor

  reply	other threads:[~2026-05-29 20:07 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27 23:18 [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk` Taylor Blau
2026-05-27 23:18 ` [PATCH 1/3] pack-objects: support reachability bitmaps " Taylor Blau
2026-05-27 23:18 ` [PATCH 2/3] pack-objects: extract `record_tree_depth()` helper Taylor Blau
2026-05-27 23:18 ` [PATCH 3/3] pack-objects: support `--delta-islands` with `--path-walk` Taylor Blau
2026-05-28 15:28 ` [PATCH 0/3] pack-objects: support bitmaps and delta-islands " Derrick Stolee
2026-05-29 17:26   ` Derrick Stolee
2026-05-29 20:07     ` Taylor Blau [this message]
2026-05-29 21:28       ` Derrick Stolee
2026-05-29 22:20         ` Taylor Blau
2026-06-02 22:21 ` [PATCH v2 0/4] " Taylor Blau
2026-06-02 22:21   ` [PATCH v2 1/4] t/perf: drop p5311's lookup-table permutation Taylor Blau
2026-06-02 22:21   ` [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk` Taylor Blau
2026-06-02 22:21   ` [PATCH v2 3/4] pack-objects: extract `record_tree_depth()` helper Taylor Blau
2026-06-02 22:21   ` [PATCH v2 4/4] pack-objects: support `--delta-islands` with `--path-walk` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahnx/oBWe9yyBZFg@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox