git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups
@ 2025-04-17 21:12 Taylor Blau
  2025-04-17 21:12 ` [PATCH 1/4] pack-bitmap: write lookup table extension by default Taylor Blau
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Taylor Blau @ 2025-04-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

This is a short series I extracted from a larger topic on reusing
"external"[^1] deltas during verbatim pack reuse.

As part of performance-testing that series, I realized that bitmap
lookup tables are not written by default. Since it has been a
significant period of time since their introduction, the first patch of
this series makes writing the lookup table extension the default
behavior. This is:

  * pack-bitmap: write lookup table extension by default

The next three patches clean up some t/perf scripts that were redundant
now that lookup tables are the default behavior. Those are:

  * p5312: removed duplicate performance test script
  * t/perf: avoid testing bitmaps without lookup table
  * t/perf/lib-bitmap.sh: avoid test_perf during setup

Thanks in advance for your review :-).

[^1]: The term I'm using to describe delta/base pairs which either (a)
are represented from different packs in a MIDX bitmap, or (b) the client
is known to already have the base.

Taylor Blau (4):
  pack-bitmap: write lookup table extension by default
  p5312: removed duplicate performance test script
  t/perf: avoid testing bitmaps without lookup table
  t/perf/lib-bitmap.sh: avoid test_perf during setup

 Documentation/config/pack.adoc       |   2 +-
 builtin/multi-pack-index.c           |   1 +
 builtin/pack-objects.c               |   2 +-
 t/perf/lib-bitmap.sh                 |   2 +-
 t/perf/p5310-pack-bitmaps.sh         |  47 +++++-------
 t/perf/p5311-pack-bitmaps-fetch.sh   |  76 +++++++++----------
 t/perf/p5312-pack-bitmaps-revs.sh    |  34 ---------
 t/perf/p5326-multi-pack-bitmaps.sh   | 107 ++++++++++++---------------
 t/perf/p5333-pseudo-merge-bitmaps.sh |   1 -
 9 files changed, 106 insertions(+), 166 deletions(-)
 delete mode 100755 t/perf/p5312-pack-bitmaps-revs.sh


base-commit: c152ae3ef50dc7bbbf5089571df5bba404a96e0d
-- 
2.49.0.226.g0e6cae136d

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] pack-bitmap: write lookup table extension by default
  2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
@ 2025-04-17 21:12 ` Taylor Blau
  2025-04-17 22:04   ` Junio C Hamano
  2025-04-17 21:12 ` [PATCH 2/4] p5312: removed duplicate performance test script Taylor Blau
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2025-04-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

The lookup table extension for reachability bitmaps was first introduced
via 3fe0121479 (Merge branch 'ac/bitmap-lookup-table', 2022-09-05).
For each bitmapped commit, the lookup table encodes three unsigned
integers:

  - The pack position of the commit.
  - An offset within the *.bitmap itself where that commit's bitmap
    lives.
  - An index into the same table where information on that commit's XOR
    pair can be found.

Lookup tables make bitmap operations faster because they no longer have
to read the entire *.bitmap file to discover what commits have
corresponding reachability bitmaps.

When bitmap lookup tables were first introduced, we established a
baseline level of performance in p5310 with and without lookup tables.
Here is the baseline without:

    Test                                                    this tree
    -----------------------------------------------------------------------
    5310.6: simulated clone                               14.04(5.78+1.79)
    5310.7: simulated fetch                               1.95(3.05+0.20)
    5310.8: pack to file (bitmap)                         44.73(20.55+7.45)
    5310.9: rev-list (commits)                            0.78(0.46+0.10)
    5310.10: rev-list (objects)                           4.07(3.97+0.08)
    5310.11: rev-list with tag negated via --not          0.06(0.02+0.03)
             --all (objects)
    5310.12: rev-list with negative tag (objects)         0.21(0.15+0.05)
    5310.13: rev-list count with blob:none                0.24(0.17+0.06)
    5310.14: rev-list count with blob:limit=1k            7.07(5.92+0.48)
    5310.15: rev-list count with tree:0                   0.25(0.17+0.07)
    5310.16: simulated partial clone                      5.67(3.28+0.64)
    5310.18: clone (partial bitmap)                       16.05(8.34+1.86)
    5310.19: pack to file (partial bitmap)                59.76(27.22+7.43)
    5310.20: rev-list with tree filter (partial bitmap)   0.90(0.18+0.16)

, and here is the same set of tests, this time with the lookup table
enabled:

    Test                                                    this tree
    -----------------------------------------------------------------------
    5310.26: simulated clone                              13.69(5.72+1.78)
    5310.27: simulated fetch                              1.84(3.02+0.16)
    5310.28: pack to file (bitmap)                        45.63(20.67+7.50)
    5310.29: rev-list (commits)                           0.56(0.39+0.8)
    5310.30: rev-list (objects)                           3.77(3.74+0.08)
    5310.31: rev-list with tag negated via --not          0.05(0.02+0.03)
             --all (objects)
    5310.32: rev-list with negative tag (objects)         0.21(0.15+0.05)
    5310.33: rev-list count with blob:none                0.23(0.17+0.05)
    5310.34: rev-list count with blob:limit=1k            6.65(5.72+0.40)
    5310.35: rev-list count with tree:0                   0.23(0.16+0.06)
    5310.36: simulated partial clone                      5.57(3.26+0.59)
    5310.38: clone (partial bitmap)                       15.89(8.39+1.84)
    5310.39: pack to file (partial bitmap)                58.32(27.55+7.47)
    5310.40: rev-list with tree filter (partial bitmap)   0.73(0.18+0.15)

(All numbers here come from a ~2022-era copy of the kernel, via
Abhradeep Chakraborty who implemented the lookup table extension).

In the almost three years since lookup tables were introduced, GitHub
has used them in production without issue, taking advantage of the above
performance benefits along the way.

Since this feature has had sufficient time to flush out any bugs and/or
performance regressions, let's enable it by default so that all bitmap
users can reap similar performance benefits.

[1]: https://lore.kernel.org/git/pull.1266.git.1655728395.gitgitgadget@gmail.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.adoc | 2 +-
 builtin/multi-pack-index.c     | 1 +
 builtin/pack-objects.c         | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/pack.adoc b/Documentation/config/pack.adoc
index da527377fa..ba538e3d9c 100644
--- a/Documentation/config/pack.adoc
+++ b/Documentation/config/pack.adoc
@@ -191,7 +191,7 @@ pack.writeBitmapLookupTable::
 	bitmap index (if one is written). This table is used to defer
 	loading individual bitmaps as late as possible. This can be
 	beneficial in repositories that have relatively large bitmap
-	indexes. Defaults to false.
+	indexes. Defaults to true.
 
 pack.readReverseIndex::
 	When true, git will read any .rev file(s) that may be available
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2a938466f5..6ad6f814e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	int ret;
 
 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
 
 	git_config(git_multi_pack_index_write_config, NULL);
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3973267e9e..384fefbb1d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -239,7 +239,7 @@ static enum {
 	WRITE_BITMAP_QUIET,
 	WRITE_BITMAP_TRUE,
 } write_bitmap_index;
-static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
 
 static int exclude_promisor_objects;
 static int exclude_promisor_objects_best_effort;
-- 
2.49.0.226.g0e6cae136d


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] p5312: removed duplicate performance test script
  2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
  2025-04-17 21:12 ` [PATCH 1/4] pack-bitmap: write lookup table extension by default Taylor Blau
@ 2025-04-17 21:12 ` Taylor Blau
  2025-04-17 22:08   ` Junio C Hamano
  2025-04-17 21:12 ` [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table Taylor Blau
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2025-04-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

When the reachability bitmap format learned to read and write a lookup
table containing the set of commits which received reachability bitmaps,
commit 761416ef91 (bitmap-lookup-table: add performance tests for lookup
table, 2022-08-14) added that mirrored p5310 but with reverse indexes
enabled.

Later on in a8dd7e05b1 (config: enable `pack.writeReverseIndex` by
default, 2023-04-12), we enabled reverse indexes by default, which made
these two tests indistinguishable from one another. Commit a8dd7e05b1
should have removed p5312 as a duplicate, but didn't do so.

Correct that by removing p5312 as a functional duplicate of p5310.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5312-pack-bitmaps-revs.sh | 34 -------------------------------
 1 file changed, 34 deletions(-)
 delete mode 100755 t/perf/p5312-pack-bitmaps-revs.sh

diff --git a/t/perf/p5312-pack-bitmaps-revs.sh b/t/perf/p5312-pack-bitmaps-revs.sh
deleted file mode 100755
index ceec60656b..0000000000
--- a/t/perf/p5312-pack-bitmaps-revs.sh
+++ /dev/null
@@ -1,34 +0,0 @@
-#!/bin/sh
-
-test_description='Tests pack performance using bitmaps (rev index enabled)'
-. ./perf-lib.sh
-. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
-
-test_lookup_pack_bitmap () {
-	test_expect_success 'start the test from scratch' '
-		rm -rf * .git
-	'
-
-	test_perf_large_repo
-
-	test_expect_success 'setup bitmap config' '
-		git config pack.writebitmaps true
-	'
-
-	# we need to create the tag up front such that it is covered by the repack and
-	# thus by generated bitmaps.
-	test_expect_success 'create tags' '
-		git tag --message="tag pointing to HEAD" perf-tag HEAD
-	'
-
-	test_perf "enable lookup table: $1" '
-		git config pack.writeBitmapLookupTable '"$1"'
-	'
-
-	test_pack_bitmap
-}
-
-test_lookup_pack_bitmap false
-test_lookup_pack_bitmap true
-
-test_done
-- 
2.49.0.226.g0e6cae136d


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table
  2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
  2025-04-17 21:12 ` [PATCH 1/4] pack-bitmap: write lookup table extension by default Taylor Blau
  2025-04-17 21:12 ` [PATCH 2/4] p5312: removed duplicate performance test script Taylor Blau
@ 2025-04-17 21:12 ` Taylor Blau
  2025-04-17 22:21   ` Junio C Hamano
  2025-04-17 21:12 ` [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup Taylor Blau
  2025-05-02 21:21 ` [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Junio C Hamano
  4 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2025-04-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In a previous commit, the setting which controls whether or not the
pack- and MIDX-bitmap machinery writes a lookup table,
'pack.writeBitmapLookupTable' was enabled by default.

As a result, we can clean up many of our bitmap-related performance
tests. Many of the relevant performance tests look something like:

    test_it () {
      test_expect_success 'setup pack.writeBitmapLookupTable' '
        git config pack.writeBitmapLookupTable '"$1"'
      '

      # ...
    }

    test_it true
    test_it false

, where the two invocations of 'test_it' run the tests with and without
bitmap lookup tables enabled.

But now that lookup tables are enabled by default and have proven to be
a performance win, let's avoid benchmarking what is now an uncommon and
non-default scenario.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5310-pack-bitmaps.sh         |  47 +++++-------
 t/perf/p5311-pack-bitmaps-fetch.sh   |  76 +++++++++----------
 t/perf/p5326-multi-pack-bitmaps.sh   | 107 ++++++++++++---------------
 t/perf/p5333-pseudo-merge-bitmaps.sh |   1 -
 4 files changed, 102 insertions(+), 129 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index b1399f1007..52f9fca14b 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -4,37 +4,28 @@ test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
 . "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
-test_lookup_pack_bitmap () {
-	test_expect_success 'start the test from scratch' '
-		rm -rf * .git
-	'
+test_expect_success 'start the test from scratch' '
+	rm -rf * .git
+'
 
-	test_perf_large_repo
+test_perf_large_repo
 
-	# note that we do everything through config,
-	# since we want to be able to compare bitmap-aware
-	# git versus non-bitmap git
-	#
-	# We intentionally use the deprecated pack.writebitmaps
-	# config so that we can test against older versions of git.
-	test_expect_success 'setup bitmap config' '
-		git config pack.writebitmaps true
-	'
+# note that we do everything through config,
+# since we want to be able to compare bitmap-aware
+# git versus non-bitmap git
+#
+# We intentionally use the deprecated pack.writebitmaps
+# config so that we can test against older versions of git.
+test_expect_success 'setup bitmap config' '
+	git config pack.writebitmaps true
+'
 
-	# we need to create the tag up front such that it is covered by the repack and
-	# thus by generated bitmaps.
-	test_expect_success 'create tags' '
-		git tag --message="tag pointing to HEAD" perf-tag HEAD
-	'
+# we need to create the tag up front such that it is covered by the repack and
+# thus by generated bitmaps.
+test_expect_success 'create tags' '
+	git tag --message="tag pointing to HEAD" perf-tag HEAD
+'
 
-	test_perf "enable lookup table: $1" '
-		git config pack.writeBitmapLookupTable '"$1"'
-	'
-
-	test_pack_bitmap
-}
-
-test_lookup_pack_bitmap false
-test_lookup_pack_bitmap true
+test_pack_bitmap
 
 test_done
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 047efb995d..75b05a600e 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -3,52 +3,46 @@
 test_description='performance of fetches from bitmapped packs'
 . ./perf-lib.sh
 
-test_fetch_bitmaps () {
-	test_expect_success 'setup test directory' '
-		rm -fr * .git
-	'
-
-	test_perf_default_repo
+test_expect_success 'setup test directory' '
+	rm -fr * .git
+'
 
-	test_expect_success 'create bitmapped server repo' '
-		git config pack.writebitmaps true &&
-		git config pack.writeBitmapLookupTable '"$1"' &&
-		git repack -ad
-	'
+test_perf_default_repo
 
-	# simulate a fetch from a repository that last fetched N days ago, for
-	# various values of N. We do so by following the first-parent chain,
-	# and assume the first entry in the chain that is N days older than the current
-	# HEAD is where the HEAD would have been then.
-	for days in 1 2 4 8 16 32 64 128; do
-		title=$(printf '%10s' "($days days)")
-		test_expect_success "setup revs from $days days ago" '
-			now=$(git log -1 --format=%ct HEAD) &&
-			then=$(($now - ($days * 86400))) &&
-			tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
-			{
-				echo HEAD &&
-				echo ^$tip
-			} >revs
-		'
+test_expect_success 'create bitmapped server repo' '
+	git config pack.writebitmaps true &&
+	git repack -ad
+'
 
-		test_perf "server $title (lookup=$1)" '
-			git pack-objects --stdout --revs \
-					--thin --delta-base-offset \
-					<revs >tmp.pack
-		'
+# simulate a fetch from a repository that last fetched N days ago, for
+# various values of N. We do so by following the first-parent chain,
+# and assume the first entry in the chain that is N days older than the current
+# HEAD is where the HEAD would have been then.
+for days in 1 2 4 8 16 32 64 128; do
+	title=$(printf '%10s' "($days days)")
+	test_expect_success "setup revs from $days days ago" '
+		now=$(git log -1 --format=%ct HEAD) &&
+		then=$(($now - ($days * 86400))) &&
+		tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
+		{
+			echo HEAD &&
+			echo ^$tip
+		} >revs
+	'
 
-		test_size "size   $title" '
-			test_file_size tmp.pack
-		'
+	test_perf "server $title" '
+		git pack-objects --stdout --revs \
+				--thin --delta-base-offset \
+				<revs >tmp.pack
+	'
 
-		test_perf "client $title (lookup=$1)" '
-			git index-pack --stdin --fix-thin <tmp.pack
-		'
-	done
-}
+	test_size "size   $title" '
+		test_file_size tmp.pack
+	'
 
-test_fetch_bitmaps true
-test_fetch_bitmaps false
+	test_perf "client $title" '
+		git index-pack --stdin --fix-thin <tmp.pack
+	'
+done
 
 test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index d082e6cacb..9f32582dec 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -4,64 +4,53 @@ test_description='Tests performance using midx bitmaps'
 . ./perf-lib.sh
 . "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
-test_bitmap () {
-	local enabled="$1"
-
-	test_expect_success "remove existing repo (lookup=$enabled)" '
-		rm -fr * .git
-	'
-
-	test_perf_large_repo
-
-	# we need to create the tag up front such that it is covered by the repack and
-	# thus by generated bitmaps.
-	test_expect_success 'create tags' '
-		git tag --message="tag pointing to HEAD" perf-tag HEAD
-	'
-
-	test_expect_success "use lookup table: $enabled" '
-		git config pack.writeBitmapLookupTable '"$enabled"'
-	'
-
-	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
-		git repack -adb
-	'
-
-	test_perf "setup multi-pack index (lookup=$enabled)" '
-		git multi-pack-index write --bitmap
-	'
-
-	test_expect_success "drop pack bitmap (lookup=$enabled)" '
-		rm -f .git/objects/pack/pack-*.bitmap
-	'
-
-	test_full_bitmap
-
-	test_expect_success "create partial bitmap state (lookup=$enabled)" '
-		# pick a commit to represent the repo tip in the past
-		cutoff=$(git rev-list HEAD~100 -1) &&
-		orig_tip=$(git rev-parse HEAD) &&
-
-		# now pretend we have just one tip
-		rm -rf .git/logs .git/refs/* .git/packed-refs &&
-		git update-ref HEAD $cutoff &&
-
-		# and then repack, which will leave us with a nice
-		# big bitmap pack of the "old" history, and all of
-		# the new history will be loose, as if it had been pushed
-		# up incrementally and exploded via unpack-objects
-		git repack -Ad &&
-		git multi-pack-index write --bitmap &&
-
-		# and now restore our original tip, as if the pushes
-		# had happened
-		git update-ref HEAD $orig_tip
-	'
-
-	test_partial_bitmap
-}
-
-test_bitmap false
-test_bitmap true
+test_expect_success "remove existing repo" '
+	rm -fr * .git
+'
+
+test_perf_large_repo
+
+# we need to create the tag up front such that it is covered by the repack and
+# thus by generated bitmaps.
+test_expect_success 'create tags' '
+	git tag --message="tag pointing to HEAD" perf-tag HEAD
+'
+
+test_expect_success "start with bitmapped pack" '
+	git repack -adb
+'
+
+test_perf "setup multi-pack index" '
+	git multi-pack-index write --bitmap
+'
+
+test_expect_success "drop pack bitmap" '
+	rm -f .git/objects/pack/pack-*.bitmap
+'
+
+test_full_bitmap
+
+test_expect_success "create partial bitmap state" '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
 
 test_done
diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
index 2e8b1d2635..91b1c06745 100755
--- a/t/perf/p5333-pseudo-merge-bitmaps.sh
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -11,7 +11,6 @@ test_expect_success 'setup' '
 		-c bitmapPseudoMerge.all.threshold=now \
 		-c bitmapPseudoMerge.all.stableThreshold=never \
 		-c bitmapPseudoMerge.all.maxMerges=64 \
-		-c pack.writeBitmapLookupTable=true \
 		repack -adb
 '
 
-- 
2.49.0.226.g0e6cae136d


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup
  2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
                   ` (2 preceding siblings ...)
  2025-04-17 21:12 ` [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table Taylor Blau
@ 2025-04-17 21:12 ` Taylor Blau
  2025-04-17 22:22   ` Junio C Hamano
  2025-04-18 10:17   ` Jeff King
  2025-05-02 21:21 ` [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Junio C Hamano
  4 siblings, 2 replies; 18+ messages in thread
From: Taylor Blau @ 2025-04-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano

In the test_pack_bitmap() helper function, we first repack the
repository under test for consistency and to eliminate any effects from
different distributions of objects among packs.

This step is performed with test_perf, so it is repeated
$GIT_PERF_REPEAT_COUNT number of times. But we do not care about timing
this portion of the setup phase, and repeating the process does not
change the outcome.

Use test_expect_success to avoid spending time repeating an idempotent
portion of the setup for performance tests that use test_pack_bitmap().

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
index 55a8feb1dc..fdf5f35f1b 100644
--- a/t/perf/lib-bitmap.sh
+++ b/t/perf/lib-bitmap.sh
@@ -69,7 +69,7 @@ test_partial_bitmap () {
 }
 
 test_pack_bitmap () {
-	test_perf "repack to disk" '
+	test_expect_success "repack to disk" '
 		git repack -ad
 	'
 
-- 
2.49.0.226.g0e6cae136d

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] pack-bitmap: write lookup table extension by default
  2025-04-17 21:12 ` [PATCH 1/4] pack-bitmap: write lookup table extension by default Taylor Blau
@ 2025-04-17 22:04   ` Junio C Hamano
  2025-04-18  9:33     ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-04-17 22:04 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 2a938466f5..6ad6f814e3 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
>  	int ret;
>  
>  	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
>  
>  	git_config(git_multi_pack_index_write_config, NULL);
>  
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 3973267e9e..384fefbb1d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -239,7 +239,7 @@ static enum {
>  	WRITE_BITMAP_QUIET,
>  	WRITE_BITMAP_TRUE,
>  } write_bitmap_index;
> -static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
> +static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;

Are these two hunks required to be kept in sync?

If so, I am wondering what is the right approach to make sure they
are.  The definition of MIDX_WRITE_BITMAP_* flag bits in midx.h and
BITMAP_OPT_* flag bits in write_bitmap_index enum are distinct two
sets, and we need a way to somehow convert between them back and
forth if we really wanted to ensure that these "default" values are
kept in sync automatically.

The reason I ask is mostly because I do not know offhand, and I
would imagine that it would be hard for third-parties to verify, if
these are only two places that need to be updated in order for
lookup table extensions to be written by default, when somebody
new wants to further update the default.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] p5312: removed duplicate performance test script
  2025-04-17 21:12 ` [PATCH 2/4] p5312: removed duplicate performance test script Taylor Blau
@ 2025-04-17 22:08   ` Junio C Hamano
  2025-04-18 21:57     ` Taylor Blau
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-04-17 22:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Subject: Re: [PATCH 2/4] p5312: removed duplicate performance test script

"removed" -> "remove"???

> When the reachability bitmap format learned to read and write a lookup
> table containing the set of commits which received reachability bitmaps,
> commit 761416ef91 (bitmap-lookup-table: add performance tests for lookup
> table, 2022-08-14) added that mirrored p5310 but with reverse indexes
> enabled.

"added that" -> "added a <something> that"???

> Later on in a8dd7e05b1 (config: enable `pack.writeReverseIndex` by
> default, 2023-04-12), we enabled reverse indexes by default, which made
> these two tests indistinguishable from one another. Commit a8dd7e05b1
> should have removed p5312 as a duplicate, but didn't do so.

Or to retain the same coverage, it should have explicitly disabled
reverse index in one of the tests, while allowing the other to use
the reverse index enabled by default, perhaps?

> Correct that by removing p5312 as a functional duplicate of p5310.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/perf/p5312-pack-bitmaps-revs.sh | 34 -------------------------------
>  1 file changed, 34 deletions(-)
>  delete mode 100755 t/perf/p5312-pack-bitmaps-revs.sh
>
> diff --git a/t/perf/p5312-pack-bitmaps-revs.sh b/t/perf/p5312-pack-bitmaps-revs.sh
> deleted file mode 100755
> index ceec60656b..0000000000
> --- a/t/perf/p5312-pack-bitmaps-revs.sh
> +++ /dev/null
> @@ -1,34 +0,0 @@
> -#!/bin/sh
> -
> -test_description='Tests pack performance using bitmaps (rev index enabled)'
> -. ./perf-lib.sh
> -. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
> -
> -test_lookup_pack_bitmap () {
> -	test_expect_success 'start the test from scratch' '
> -		rm -rf * .git
> -	'
> -
> -	test_perf_large_repo
> -
> -	test_expect_success 'setup bitmap config' '
> -		git config pack.writebitmaps true
> -	'
> -
> -	# we need to create the tag up front such that it is covered by the repack and
> -	# thus by generated bitmaps.
> -	test_expect_success 'create tags' '
> -		git tag --message="tag pointing to HEAD" perf-tag HEAD
> -	'
> -
> -	test_perf "enable lookup table: $1" '
> -		git config pack.writeBitmapLookupTable '"$1"'
> -	'
> -
> -	test_pack_bitmap
> -}
> -
> -test_lookup_pack_bitmap false
> -test_lookup_pack_bitmap true
> -
> -test_done

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table
  2025-04-17 21:12 ` [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table Taylor Blau
@ 2025-04-17 22:21   ` Junio C Hamano
  2025-04-18  4:24     ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-04-17 22:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> In a previous commit, the setting which controls whether or not the
> pack- and MIDX-bitmap machinery writes a lookup table,
> 'pack.writeBitmapLookupTable' was enabled by default.
>
> As a result, we can clean up many of our bitmap-related performance
> tests. Many of the relevant performance tests look something like:
>
>     test_it () {
>       test_expect_success 'setup pack.writeBitmapLookupTable' '
>         git config pack.writeBitmapLookupTable '"$1"'
>       '
>
>       # ...
>     }
>
>     test_it true
>     test_it false
>
> , where the two invocations of 'test_it' run the tests with and without
> bitmap lookup tables enabled.
>
> But now that lookup tables are enabled by default and have proven to be
> a performance win, let's avoid benchmarking what is now an uncommon and
> non-default scenario.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---

Hmph, how costly are these tests to run and maintain?

I somehow have a feeling that removal of these "performance" tests
is less worrysome than removing correctness tests, but as long as we
claim to support both configurations (i.e. with and without lookup
tables), it feels a bit premature to remove tests for one of them.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup
  2025-04-17 21:12 ` [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup Taylor Blau
@ 2025-04-17 22:22   ` Junio C Hamano
  2025-04-18 10:17   ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2025-04-17 22:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> In the test_pack_bitmap() helper function, we first repack the
> repository under test for consistency and to eliminate any effects from
> different distributions of objects among packs.
>
> This step is performed with test_perf, so it is repeated
> $GIT_PERF_REPEAT_COUNT number of times. But we do not care about timing
> this portion of the setup phase, and repeating the process does not
> change the outcome.
>
> Use test_expect_success to avoid spending time repeating an idempotent
> portion of the setup for performance tests that use test_pack_bitmap().
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/perf/lib-bitmap.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

OK.

>
> diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
> index 55a8feb1dc..fdf5f35f1b 100644
> --- a/t/perf/lib-bitmap.sh
> +++ b/t/perf/lib-bitmap.sh
> @@ -69,7 +69,7 @@ test_partial_bitmap () {
>  }
>  
>  test_pack_bitmap () {
> -	test_perf "repack to disk" '
> +	test_expect_success "repack to disk" '
>  		git repack -ad
>  	'

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table
  2025-04-17 22:21   ` Junio C Hamano
@ 2025-04-18  4:24     ` Junio C Hamano
  2025-04-18 10:02       ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-04-18  4:24 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> I somehow have a feeling that removal of these "performance" tests
> is less worrysome than removing correctness tests, but as long as we
> claim to support both configurations (i.e. with and without lookup
> tables), it feels a bit premature to remove tests for one of them.

In case the implication was missed, I was hinting that in the longer
term, once one variant proves to be better than the other variant(s)
in any and all aspects, it would be a great move to remove the other
one(s).  It is exactly what is happening on the recursive-ort front.

Once we become so confident about correctness and performance with
the configuration with lookup tables that we are willing to lose an
escape hatch to operate without them, we can obviously remove these
tests for configuration without lookup tables.  If we are not there
yet, and still rely on the "escape hatch" value of the configuration
that does not use the lookup tables, we want to make sure that the
escape hatch still functions, right ;-)?

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] pack-bitmap: write lookup table extension by default
  2025-04-17 22:04   ` Junio C Hamano
@ 2025-04-18  9:33     ` Jeff King
  2025-04-18 15:44       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2025-04-18  9:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren

On Thu, Apr 17, 2025 at 03:04:00PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> > index 2a938466f5..6ad6f814e3 100644
> > --- a/builtin/multi-pack-index.c
> > +++ b/builtin/multi-pack-index.c
> > @@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
> >  	int ret;
> >  
> >  	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> >  
> >  	git_config(git_multi_pack_index_write_config, NULL);
> >  
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 3973267e9e..384fefbb1d 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -239,7 +239,7 @@ static enum {
> >  	WRITE_BITMAP_QUIET,
> >  	WRITE_BITMAP_TRUE,
> >  } write_bitmap_index;
> > -static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
> > +static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
> 
> Are these two hunks required to be kept in sync?

They're not technically required to be in sync. It is OK for the midx
bitmaps to have different options than the ones we make for packs. And
in theory they could intentionally diverge, though in practice I don't
think we (yet) have any extensions or options that would be more
appropriate for one or the other.

So if we did want to join them, I think it would make sense to still be
able to use different flags for each situation, but initialize them from
a common definition.

> If so, I am wondering what is the right approach to make sure they
> are.  The definition of MIDX_WRITE_BITMAP_* flag bits in midx.h and
> BITMAP_OPT_* flag bits in write_bitmap_index enum are distinct two
> sets, and we need a way to somehow convert between them back and
> forth if we really wanted to ensure that these "default" values are
> kept in sync automatically.

Yeah, I think that would be much simpler if the bitmap options were held
separately from the other midx flags, and then we could use the same
values for each (and a common "defaults" definition). Which is
conceptually simple, but means we have to plumb a separate variable
through a bunch of intermediate functions. See the (untested) patch
below.

I dunno if it's worth it for something that doesn't really change all
that often. OTOH, it does remove the extra layer of translation in
write_midx_bitmap().

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2a938466f5..fb073a946a 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -10,6 +10,7 @@
 #include "object-store-ll.h"
 #include "replace-object.h"
 #include "repository.h"
+#include "pack-bitmap.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
 	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \
@@ -54,6 +55,7 @@ static struct opts_multi_pack_index {
 	char *refs_snapshot;
 	unsigned long batch_size;
 	unsigned flags;
+	uint16_t bitmap_options;
 	int stdin_packs;
 } opts;
 
@@ -89,16 +91,16 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
 {
 	if (!strcmp(var, "pack.writebitmaphashcache")) {
 		if (git_config_bool(var, value))
-			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+			opts.bitmap_options |= BITMAP_OPT_HASH_CACHE;
 		else
-			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+			opts.bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
 
 	if (!strcmp(var, "pack.writebitmaplookuptable")) {
 		if (git_config_bool(var, value))
-			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+			opts.bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
 		else
-			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+			opts.bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
 	}
 
 	/*
@@ -141,7 +143,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	};
 	int ret;
 
-	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	opts.bitmap_options |= BITMAP_OPT_HASH_CACHE;
 
 	git_config(git_multi_pack_index_write_config, NULL);
 
@@ -167,7 +169,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 
 		ret = write_midx_file_only(repo, opts.object_dir, &packs,
 					   opts.preferred_pack,
-					   opts.refs_snapshot, opts.flags);
+					   opts.refs_snapshot, opts.flags,
+					   opts.bitmap_options);
 
 		string_list_clear(&packs, 0);
 		free(opts.refs_snapshot);
@@ -177,7 +180,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	}
 
 	ret = write_midx_file(repo, opts.object_dir, opts.preferred_pack,
-			      opts.refs_snapshot, opts.flags);
+			      opts.refs_snapshot, opts.flags,
+			      opts.bitmap_options);
 
 	free(opts.refs_snapshot);
 	return ret;
@@ -236,7 +240,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv,
 
 	FREE_AND_NULL(options);
 
-	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+	return expire_midx_packs(the_repository, opts.object_dir, opts.flags,
+				 opts.bitmap_options);
 }
 
 static int cmd_multi_pack_index_repack(int argc, const char **argv,
@@ -269,7 +274,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv,
 	FREE_AND_NULL(options);
 
 	return midx_repack(the_repository, opts.object_dir,
-			   (size_t)opts.batch_size, opts.flags);
+			   (size_t)opts.batch_size, opts.flags,
+			   opts.bitmap_options);
 }
 
 int cmd_multi_pack_index(int argc,
diff --git a/builtin/repack.c b/builtin/repack.c
index f3330ade7b..69c9a8bc4c 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1562,7 +1562,7 @@ int cmd_repack(int argc,
 		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0))
 			flags |= MIDX_WRITE_INCREMENTAL;
 		write_midx_file(the_repository, repo_get_object_directory(the_repository),
-				NULL, NULL, flags);
+				NULL, NULL, flags, 0);
 	}
 
 cleanup:
diff --git a/midx-write.c b/midx-write.c
index 48a4dc5e94..5729b989cb 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -841,10 +841,10 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
 			     struct packing_data *pdata,
 			     struct commit **commits,
 			     uint32_t commits_nr,
-			     unsigned flags)
+			     unsigned flags,
+			     uint16_t options)
 {
 	int ret, i;
-	uint16_t options = 0;
 	struct bitmap_writer writer;
 	struct pack_idx_entry **index;
 	struct strbuf bitmap_name = STRBUF_INIT;
@@ -859,12 +859,6 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
 		get_midx_filename_ext(ctx->repo->hash_algo, &bitmap_name,
 				      object_dir, midx_hash, MIDX_EXT_BITMAP);
 
-	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
-		options |= BITMAP_OPT_HASH_CACHE;
-
-	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
-		options |= BITMAP_OPT_LOOKUP_TABLE;
-
 	/*
 	 * Build the MIDX-order index based on pdata.objects (which is already
 	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
@@ -1070,7 +1064,8 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       const char *refs_snapshot,
-			       unsigned flags)
+			       unsigned flags,
+			       uint16_t bitmap_options)
 {
 	struct strbuf midx_name = STRBUF_INIT;
 	unsigned char midx_hash[GIT_MAX_RAWSZ];
@@ -1433,7 +1428,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
 
 		if (write_midx_bitmap(&ctx, object_dir,
 				      midx_hash, &pdata, commits, commits_nr,
-				      flags) < 0) {
+				      flags, bitmap_options) < 0) {
 			error(_("could not write multi-pack bitmap"));
 			result = 1;
 			clear_packing_data(&pdata);
@@ -1529,23 +1524,27 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
 
 int write_midx_file(struct repository *r, const char *object_dir,
 		    const char *preferred_pack_name,
-		    const char *refs_snapshot, unsigned flags)
+		    const char *refs_snapshot, unsigned flags,
+		    uint16_t bitmap_options)
 {
 	return write_midx_internal(r, object_dir, NULL, NULL,
 				   preferred_pack_name, refs_snapshot,
-				   flags);
+				   flags, bitmap_options);
 }
 
 int write_midx_file_only(struct repository *r, const char *object_dir,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
-			 const char *refs_snapshot, unsigned flags)
+			 const char *refs_snapshot, unsigned flags,
+			 uint16_t bitmap_options)
 {
 	return write_midx_internal(r, object_dir, packs_to_include, NULL,
-				   preferred_pack_name, refs_snapshot, flags);
+				   preferred_pack_name, refs_snapshot, flags,
+				   bitmap_options);
 }
 
-int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags)
+int expire_midx_packs(struct repository *r, const char *object_dir,
+		      unsigned flags, uint16_t bitmap_options)
 {
 	uint32_t i, *count, result = 0;
 	struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
@@ -1603,7 +1602,8 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 
 	if (packs_to_drop.nr)
 		result = write_midx_internal(r, object_dir, NULL,
-					     &packs_to_drop, NULL, NULL, flags);
+					     &packs_to_drop, NULL, NULL, flags,
+					     bitmap_options);
 
 	string_list_clear(&packs_to_drop, 0);
 
@@ -1718,7 +1718,8 @@ static void fill_included_packs_batch(struct repository *r,
 	free(pack_info);
 }
 
-int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
+int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
+		unsigned flags, uint16_t bitmap_options)
 {
 	int result = 0;
 	uint32_t i, packs_to_repack = 0;
@@ -1801,7 +1802,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	}
 
 	result = write_midx_internal(r, object_dir, NULL, NULL, NULL, NULL,
-				     flags);
+				     flags, bitmap_options);
 
 cleanup:
 	free(include_pack);
diff --git a/midx.h b/midx.h
index 9d1374cbd5..5a79302a7a 100644
--- a/midx.h
+++ b/midx.h
@@ -81,8 +81,6 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
-#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
-#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 #define MIDX_WRITE_INCREMENTAL (1 << 5)
 
 #define MIDX_EXT_REV "rev"
@@ -131,15 +129,16 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
  */
 int write_midx_file(struct repository *r, const char *object_dir,
 		    const char *preferred_pack_name, const char *refs_snapshot,
-		    unsigned flags);
+		    unsigned flags, uint16_t bitmap_options);
 int write_midx_file_only(struct repository *r, const char *object_dir,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
-			 const char *refs_snapshot, unsigned flags);
+			 const char *refs_snapshot, unsigned flags,
+			 uint16_t bitmap_options);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
-int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
-int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);
+int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags, uint16_t bitmap_options);
+int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags, uint16_t bitmap_options);
 
 void close_midx(struct multi_pack_index *m);
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table
  2025-04-18  4:24     ` Junio C Hamano
@ 2025-04-18 10:02       ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2025-04-18 10:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren

On Thu, Apr 17, 2025 at 09:24:46PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > I somehow have a feeling that removal of these "performance" tests
> > is less worrysome than removing correctness tests, but as long as we
> > claim to support both configurations (i.e. with and without lookup
> > tables), it feels a bit premature to remove tests for one of them.
> 
> In case the implication was missed, I was hinting that in the longer
> term, once one variant proves to be better than the other variant(s)
> in any and all aspects, it would be a great move to remove the other
> one(s).  It is exactly what is happening on the recursive-ort front.
> 
> Once we become so confident about correctness and performance with
> the configuration with lookup tables that we are willing to lose an
> escape hatch to operate without them, we can obviously remove these
> tests for configuration without lookup tables.  If we are not there
> yet, and still rely on the "escape hatch" value of the configuration
> that does not use the lookup tables, we want to make sure that the
> escape hatch still functions, right ;-)?

I think the perf tests differ from the correctness tests in that a
single run is not all that interesting. You can see how long something
takes, but that's meaningless without a baseline.

The interesting results come from comparing two versions. So in this
case:

  - running the simplified test script comparing an old version (where
    lookup tables were not the default) with one where they are (i.e.,
    one with the first patch from this series). That will show off the
    perf improvement from the lookup tables (and in a better way than
    the original, because we'll actually compute the time difference
    between the two versions, rather than showing them as separate lines
    which the perf suite does not realize are related).

  - going forward, comparing two "new" versions will show us if the
    operations regress in performance, using config both from Git's
    defaults but also the repo.

    So most of the time, you'd be testing the default case, where we do
    generate the lookup tables (because they're now the default). But if
    you have a particular repo or config setup you care about, you can
    provide a repo with pack.writeBitmapLookupTable set as you like.

That does create a blind spot if no developers running the perf suite
ever do the "you can provide..." step. But I think that is the tip of
the iceberg in terms of repo and config blind spots in the perf suite.
There are so many possible combinations, and it's expensive to test them
all. I don't think we have any particular reason to think that the
non-lookup-table code path would significantly regress in performance.

You asked earlier how much these tests cost to run. It's basically
doubling the cost of each script, since we're running everything twice.
So p5310 using linux.git, for instance, just took ~10 minutes after this
patch. And that's with PERF_REPEAT_COUNT set to 1! The default would
have been 30 minutes (and thus prior to this patch, 60 minutes total).

That's a lot of minutes.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup
  2025-04-17 21:12 ` [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup Taylor Blau
  2025-04-17 22:22   ` Junio C Hamano
@ 2025-04-18 10:17   ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2025-04-18 10:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Apr 17, 2025 at 05:12:23PM -0400, Taylor Blau wrote:

> In the test_pack_bitmap() helper function, we first repack the
> repository under test for consistency and to eliminate any effects from
> different distributions of objects among packs.
>
> This step is performed with test_perf, so it is repeated
> $GIT_PERF_REPEAT_COUNT number of times. But we do not care about timing
> this portion of the setup phase, and repeating the process does not
> change the outcome.

Isn't this also where we actually generate the bitmaps? I.e., it is
where we would see a performance regression in the bitmap writing
process (whereas the rest of the script is about the reading side).

That said, I don't think it's even doing that very well. It is mutating
the on-disk state, so the first run will potentially be much slower than
subsequent runs (since everything is now in one big pack with bitmaps,
and we try to reuse deltas and bitmap data as much as possible). And
since we take best-of-N, we're basically just measuring those subsequent
noop repacks (unless you set the repeat count to 1!).

I think we've run into this before, e.g. in 775c71e16d (p5302: create
the repo in each index-pack test, 2019-04-22). And there the solution
was to reset the repo state before each timing, assuming it is quick
enough not to affect the test too much. Our perf suite doesn't provide
much support there (we'd want something like hyperfine's --prepare
option).

So I dunno. It is possible for timing this operation to provide some
value, but I don't think the current implementation is doing that. And
it's quite expensive to run.

> diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
> index 55a8feb1dc..fdf5f35f1b 100644
> --- a/t/perf/lib-bitmap.sh
> +++ b/t/perf/lib-bitmap.sh
> @@ -69,7 +69,7 @@ test_partial_bitmap () {
>  }
>  
>  test_pack_bitmap () {
> -	test_perf "repack to disk" '
> +	test_expect_success "repack to disk" '
>  		git repack -ad
>  	'

The same issue exists in t5326, which calls "multi-pack-index write"
with the "--bitmap" flag, I think. So if we are going to do this, we'd
probably want the same there.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] pack-bitmap: write lookup table extension by default
  2025-04-18  9:33     ` Jeff King
@ 2025-04-18 15:44       ` Junio C Hamano
  2025-04-18 21:52         ` Taylor Blau
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-04-18 15:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren

Jeff King <peff@peff.net> writes:

> They're not technically required to be in sync. It is OK for the midx
> bitmaps to have different options than the ones we make for packs. And
> in theory they could intentionally diverge, though in practice I don't
> think we (yet) have any extensions or options that would be more
> appropriate for one or the other.
>
> So if we did want to join them, I think it would make sense to still be
> able to use different flags for each situation, but initialize them from
> a common definition.

Thanks for great explanation---I guess it is not worth pursuing,
then.  It is not like it would make the system misbehave when two
are set differently.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] pack-bitmap: write lookup table extension by default
  2025-04-18 15:44       ` Junio C Hamano
@ 2025-04-18 21:52         ` Taylor Blau
  0 siblings, 0 replies; 18+ messages in thread
From: Taylor Blau @ 2025-04-18 21:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Elijah Newren

On Fri, Apr 18, 2025 at 08:44:29AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > They're not technically required to be in sync. It is OK for the midx
> > bitmaps to have different options than the ones we make for packs. And
> > in theory they could intentionally diverge, though in practice I don't
> > think we (yet) have any extensions or options that would be more
> > appropriate for one or the other.
> >
> > So if we did want to join them, I think it would make sense to still be
> > able to use different flags for each situation, but initialize them from
> > a common definition.
>
> Thanks for great explanation---I guess it is not worth pursuing,
> then.  It is not like it would make the system misbehave when two
> are set differently.

Hah, Peff beat me to it. I saw your reply last night and was going to
write you a very similar response.

I think the summary from my perspective would be that: the two could
fall out-of-sync intentionally if we want the two commands to ever have
different defaults. Tangentially we could use some common "bitmap_flags"
field whose bits are defined in pack-bitmap.h and used in both places.

The latter is a bit awkward currently because the current "flags" that
we pass into the MIDX machinery from the builtin all have MIDX-specific
meanings. So we would have to either make sure that MIDX uses separate
bit positions (which is awful and far too fragile for my comfort/taste)
or store them as a separate set of flags (I think what Peff is getting
at above).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] p5312: removed duplicate performance test script
  2025-04-17 22:08   ` Junio C Hamano
@ 2025-04-18 21:57     ` Taylor Blau
  0 siblings, 0 replies; 18+ messages in thread
From: Taylor Blau @ 2025-04-18 21:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Jeff King

On Thu, Apr 17, 2025 at 03:08:59PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Subject: Re: [PATCH 2/4] p5312: removed duplicate performance test script
>
> "removed" -> "remove"???
>
> > When the reachability bitmap format learned to read and write a lookup
> > table containing the set of commits which received reachability bitmaps,
> > commit 761416ef91 (bitmap-lookup-table: add performance tests for lookup
> > table, 2022-08-14) added that mirrored p5310 but with reverse indexes
> > enabled.
>
> "added that" -> "added a <something> that"???

I am embarrassed. These are both awful. Here's the relevant portion of
the range-diff:

2:  51c4604e16 ! 2:  a80a7b5e60 p5312: removed duplicate performance test script
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>

      ## Commit message ##
    -    p5312: removed duplicate performance test script
    +    p5312: remove duplicate performance test script

         When the reachability bitmap format learned to read and write a lookup
         table containing the set of commits which received reachability bitmaps,
         commit 761416ef91 (bitmap-lookup-table: add performance tests for lookup
    -    table, 2022-08-14) added that mirrored p5310 but with reverse indexes
    -    enabled.
    +    table, 2022-08-14) added a new performance test script mirroring p5310
    +    but with reverse indexes enabled.

         Later on in a8dd7e05b1 (config: enable `pack.writeReverseIndex` by
         default, 2023-04-12), we enabled reverse indexes by default, which made

> > Later on in a8dd7e05b1 (config: enable `pack.writeReverseIndex` by
> > default, 2023-04-12), we enabled reverse indexes by default, which made
> > these two tests indistinguishable from one another. Commit a8dd7e05b1
> > should have removed p5312 as a duplicate, but didn't do so.
>
> Or to retain the same coverage, it should have explicitly disabled
> reverse index in one of the tests, while allowing the other to use
> the reverse index enabled by default, perhaps?

I don't think we necessarily would benefit from having two variants of
this performance test. It is a little annoying to maintain, but that
isn't the main reason that I proposed removing it here.

I think that the pair of performance tests were useful in proving out
the lookup table extension as useful to bitmaps' performance
characteristics by comparison to the non-lookup table version. In that
sense, I think the pair of performance tests were useful as a contrast
to one another. Since we have evidence of their usefulness, the contrast
is less important IMHO.

I think we still want to keep the "lookup table enabled" version to
prevent regressions, though.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups
  2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
                   ` (3 preceding siblings ...)
  2025-04-17 21:12 ` [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup Taylor Blau
@ 2025-05-02 21:21 ` Junio C Hamano
  2025-05-05  7:11   ` Patrick Steinhardt
  4 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2025-05-02 21:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> This is a short series I extracted from a larger topic on reusing
> "external"[^1] deltas during verbatim pack reuse.
>
> As part of performance-testing that series, I realized that bitmap
> lookup tables are not written by default. Since it has been a
> significant period of time since their introduction, the first patch of
> this series makes writing the lookup table extension the default
> behavior. This is:
>
>   * pack-bitmap: write lookup table extension by default
>
> The next three patches clean up some t/perf scripts that were redundant
> now that lookup tables are the default behavior. Those are:
>
>   * p5312: removed duplicate performance test script
>   * t/perf: avoid testing bitmaps without lookup table
>   * t/perf/lib-bitmap.sh: avoid test_perf during setup
>
> Thanks in advance for your review :-).
>
> [^1]: The term I'm using to describe delta/base pairs which either (a)
> are represented from different packs in a MIDX bitmap, or (b) the client
> is known to already have the base.
>
> Taylor Blau (4):
>   pack-bitmap: write lookup table extension by default
>   p5312: removed duplicate performance test script
>   t/perf: avoid testing bitmaps without lookup table
>   t/perf/lib-bitmap.sh: avoid test_perf during setup

Peff and I were the only two people who read these patches?
Is this topic still viable, or has it been backburnered?

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups
  2025-05-02 21:21 ` [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Junio C Hamano
@ 2025-05-05  7:11   ` Patrick Steinhardt
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2025-05-05  7:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren, Jeff King

On Fri, May 02, 2025 at 02:21:47PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
> > This is a short series I extracted from a larger topic on reusing
> > "external"[^1] deltas during verbatim pack reuse.
> >
> > As part of performance-testing that series, I realized that bitmap
> > lookup tables are not written by default. Since it has been a
> > significant period of time since their introduction, the first patch of
> > this series makes writing the lookup table extension the default
> > behavior. This is:
> >
> >   * pack-bitmap: write lookup table extension by default
> >
> > The next three patches clean up some t/perf scripts that were redundant
> > now that lookup tables are the default behavior. Those are:
> >
> >   * p5312: removed duplicate performance test script
> >   * t/perf: avoid testing bitmaps without lookup table
> >   * t/perf/lib-bitmap.sh: avoid test_perf during setup
> >
> > Thanks in advance for your review :-).
> >
> > [^1]: The term I'm using to describe delta/base pairs which either (a)
> > are represented from different packs in a MIDX bitmap, or (b) the client
> > is known to already have the base.
> >
> > Taylor Blau (4):
> >   pack-bitmap: write lookup table extension by default
> >   p5312: removed duplicate performance test script
> >   t/perf: avoid testing bitmaps without lookup table
> >   t/perf/lib-bitmap.sh: avoid test_perf during setup
> 
> Peff and I were the only two people who read these patches?
> Is this topic still viable, or has it been backburnered?

I read through the patch series, but didn't have anything to add over
what has already been discussed. Overall I think that it makes sense to
enable lookup tables -- we've had them enabled since February 2023 for
all users of GitLab.

Patrick

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-05-05  7:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
2025-04-17 21:12 ` [PATCH 1/4] pack-bitmap: write lookup table extension by default Taylor Blau
2025-04-17 22:04   ` Junio C Hamano
2025-04-18  9:33     ` Jeff King
2025-04-18 15:44       ` Junio C Hamano
2025-04-18 21:52         ` Taylor Blau
2025-04-17 21:12 ` [PATCH 2/4] p5312: removed duplicate performance test script Taylor Blau
2025-04-17 22:08   ` Junio C Hamano
2025-04-18 21:57     ` Taylor Blau
2025-04-17 21:12 ` [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table Taylor Blau
2025-04-17 22:21   ` Junio C Hamano
2025-04-18  4:24     ` Junio C Hamano
2025-04-18 10:02       ` Jeff King
2025-04-17 21:12 ` [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup Taylor Blau
2025-04-17 22:22   ` Junio C Hamano
2025-04-18 10:17   ` Jeff King
2025-05-02 21:21 ` [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Junio C Hamano
2025-05-05  7:11   ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).