git.vger.kernel.org archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 1/4] pack-bitmap: write lookup table extension by default
Date: Thu, 17 Apr 2025 17:12:14 -0400
Message-ID: <b7cfb1267fdd7f50f414c9f79377cb338a0c1ab0.1744924321.git.me@ttaylorr.com>
In-Reply-To: <cover.1744924321.git.me@ttaylorr.com>

The lookup table extension for reachability bitmaps was first introduced
via 3fe0121479 (Merge branch 'ac/bitmap-lookup-table', 2022-09-05).
For each bitmapped commit, the lookup table encodes three unsigned
integers:

  - The pack position of the commit.
  - An offset within the *.bitmap itself where that commit's bitmap
    lives.
  - An index into the same table where information on that commit's XOR
    pair can be found.

Lookup tables make bitmap operations faster because Git no longer has
to read the entire *.bitmap file to discover which commits have
corresponding reachability bitmaps.
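A minimal C sketch of how a reader might use such a table: decode one
row, binary-search the rows for a commit, and count how many bitmaps
its XOR chain requires. The exact field widths (4-byte position, 8-byte
offset, 4-byte XOR row, all big-endian), the 0xffffffff "no XOR pair"
sentinel, and the rows being sorted by commit position are assumptions
based on Git's bitmap-format documentation rather than on this message;
every name below is hypothetical, not Git's internal API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed row layout: 4-byte commit position, 8-byte bitmap offset,
 * 4-byte XOR row, all big-endian. Names are hypothetical.
 */
#define LOOKUP_ROW_SIZE 16
#define LOOKUP_NO_XOR_ROW 0xffffffffu

struct lookup_row {
	uint32_t commit_pos; /* pack position of the bitmapped commit */
	uint64_t offset;     /* where that commit's bitmap starts in the *.bitmap */
	uint32_t xor_row;    /* row of its XOR pair, or LOOKUP_NO_XOR_ROW */
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode row number 'row' of the table without touching any bitmap. */
static struct lookup_row lookup_row_at(const unsigned char *table, uint32_t row)
{
	const unsigned char *p = table + (size_t)row * LOOKUP_ROW_SIZE;
	struct lookup_row r;

	r.commit_pos = read_be32(p);
	r.offset = read_be64(p + 4);
	r.xor_row = read_be32(p + 12);
	return r;
}

/*
 * Binary-search a table of 'nr' rows, assumed sorted by commit_pos.
 * Returns the matching row, or -1 if the commit has no bitmap.
 */
static int64_t lookup_find(const unsigned char *table, uint32_t nr,
			   uint32_t commit_pos)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		uint32_t pos = lookup_row_at(table, mid).commit_pos;

		if (pos == commit_pos)
			return mid;
		if (pos < commit_pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/*
 * Number of bitmaps that must be read to reconstruct the bitmap at
 * 'row': the row itself plus each XOR pair reachable from it. The
 * 'len <= nr' guard stops a corrupt, cyclic table from looping forever.
 */
static uint32_t lookup_xor_chain_len(const unsigned char *table, uint32_t nr,
				     uint32_t row)
{
	uint32_t len = 0;

	while (row != LOOKUP_NO_XOR_ROW && row < nr && len <= nr) {
		len++;
		row = lookup_row_at(table, row).xor_row;
	}
	return len;
}
```

Nothing here reads a bitmap: the search touches only table rows, and a
caller would seek to 'offset' (and to each row on the XOR chain) only
for the commit it actually needs.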

When bitmap lookup tables were first introduced, we established a
baseline level of performance in p5310 with and without lookup tables.
Here is the baseline without:

    Test                                                    this tree
    -----------------------------------------------------------------------
    5310.6: simulated clone                               14.04(5.78+1.79)
    5310.7: simulated fetch                               1.95(3.05+0.20)
    5310.8: pack to file (bitmap)                         44.73(20.55+7.45)
    5310.9: rev-list (commits)                            0.78(0.46+0.10)
    5310.10: rev-list (objects)                           4.07(3.97+0.08)
    5310.11: rev-list with tag negated via --not          0.06(0.02+0.03)
             --all (objects)
    5310.12: rev-list with negative tag (objects)         0.21(0.15+0.05)
    5310.13: rev-list count with blob:none                0.24(0.17+0.06)
    5310.14: rev-list count with blob:limit=1k            7.07(5.92+0.48)
    5310.15: rev-list count with tree:0                   0.25(0.17+0.07)
    5310.16: simulated partial clone                      5.67(3.28+0.64)
    5310.18: clone (partial bitmap)                       16.05(8.34+1.86)
    5310.19: pack to file (partial bitmap)                59.76(27.22+7.43)
    5310.20: rev-list with tree filter (partial bitmap)   0.90(0.18+0.16)

Here is the same set of tests, this time with the lookup table
enabled:

    Test                                                    this tree
    -----------------------------------------------------------------------
    5310.26: simulated clone                              13.69(5.72+1.78)
    5310.27: simulated fetch                              1.84(3.02+0.16)
    5310.28: pack to file (bitmap)                        45.63(20.67+7.50)
    5310.29: rev-list (commits)                           0.56(0.39+0.8)
    5310.30: rev-list (objects)                           3.77(3.74+0.08)
    5310.31: rev-list with tag negated via --not          0.05(0.02+0.03)
             --all (objects)
    5310.32: rev-list with negative tag (objects)         0.21(0.15+0.05)
    5310.33: rev-list count with blob:none                0.23(0.17+0.05)
    5310.34: rev-list count with blob:limit=1k            6.65(5.72+0.40)
    5310.35: rev-list count with tree:0                   0.23(0.16+0.06)
    5310.36: simulated partial clone                      5.57(3.26+0.59)
    5310.38: clone (partial bitmap)                       15.89(8.39+1.84)
    5310.39: pack to file (partial bitmap)                58.32(27.55+7.47)
    5310.40: rev-list with tree filter (partial bitmap)   0.73(0.18+0.15)

(All numbers here come from a ~2022-era copy of the kernel, courtesy of
Abhradeep Chakraborty, who implemented the lookup table extension [1].)
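To compare the two tables row by row, the short C program below
computes the share of wall-clock time saved for a few of the tests,
using only the numbers quoted above (user and system times are
ignored). The struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* Wall-clock seconds copied from the two p5310 tables above. */
struct timing {
	const char *test;
	double without_table;
	double with_table;
};

static const struct timing timings[] = {
	{ "simulated clone",                            14.04, 13.69 },
	{ "simulated fetch",                             1.95,  1.84 },
	{ "pack to file (bitmap)",                      44.73, 45.63 },
	{ "rev-list (commits)",                          0.78,  0.56 },
	{ "rev-list (objects)",                          4.07,  3.77 },
	{ "rev-list with tree filter (partial bitmap)",  0.90,  0.73 },
};

/* Percentage of wall-clock time saved; negative means slower. */
static double percent_saved(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print one line per test with its signed percentage. */
static void print_summary(void)
{
	size_t i;

	for (i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
		printf("%-45s %+6.1f%%\n", timings[i].test,
		       percent_saved(timings[i].without_table,
				     timings[i].with_table));
}
```

For example, rev-list (commits) goes from 0.78s to 0.56s, roughly 28%
less wall-clock time, while pack to file (bitmap) comes out about 2%
slower in these particular runs.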

In the almost three years since lookup tables were introduced, GitHub
has used them in production without issue, taking advantage of the above
performance benefits along the way.

Since this feature has had sufficient time to flush out any bugs and/or
performance regressions, let's enable it by default so that all bitmap
users can reap similar performance benefits.
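As a sketch of why changing the static default is sufficient, the C
fragment below models the write options as a bitmask that starts with
the lookup-table bit set and is then adjusted by a boolean setting. It
assumes that reading pack.writeBitmapLookupTable clears the bit when
the value is false; the constant values and function name are
hypothetical stand-ins for BITMAP_OPT_HASH_CACHE and
BITMAP_OPT_LOOKUP_TABLE.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration; Git's real values may differ. */
#define OPT_HASH_CACHE   0x4
#define OPT_LOOKUP_TABLE 0x10

/* New default: both extensions on unless configuration says otherwise. */
static const uint16_t default_bitmap_options = OPT_HASH_CACHE | OPT_LOOKUP_TABLE;

/*
 * Apply a boolean setting modelled on pack.writeBitmapLookupTable
 * (assumed behavior): "true" sets the bit and "false" explicitly clears
 * it, so flipping the default does not remove the ability to opt out.
 */
static uint16_t apply_lookup_table_setting(uint16_t opts, int enabled)
{
	if (enabled)
		opts |= OPT_LOOKUP_TABLE;
	else
		opts &= (uint16_t)~OPT_LOOKUP_TABLE;
	return opts;
}
```

Under that assumption, repositories that explicitly set
pack.writeBitmapLookupTable=false keep their current behavior, and only
the unset case changes.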

[1]: https://lore.kernel.org/git/pull.1266.git.1655728395.gitgitgadget@gmail.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.adoc | 2 +-
 builtin/multi-pack-index.c     | 1 +
 builtin/pack-objects.c         | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/pack.adoc b/Documentation/config/pack.adoc
index da527377fa..ba538e3d9c 100644
--- a/Documentation/config/pack.adoc
+++ b/Documentation/config/pack.adoc
@@ -191,7 +191,7 @@ pack.writeBitmapLookupTable::
 	bitmap index (if one is written). This table is used to defer
 	loading individual bitmaps as late as possible. This can be
 	beneficial in repositories that have relatively large bitmap
-	indexes. Defaults to false.
+	indexes. Defaults to true.
 
 pack.readReverseIndex::
 	When true, git will read any .rev file(s) that may be available
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2a938466f5..6ad6f814e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	int ret;
 
 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
 
 	git_config(git_multi_pack_index_write_config, NULL);
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3973267e9e..384fefbb1d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -239,7 +239,7 @@ static enum {
 	WRITE_BITMAP_QUIET,
 	WRITE_BITMAP_TRUE,
 } write_bitmap_index;
-static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
 
 static int exclude_promisor_objects;
 static int exclude_promisor_objects_best_effort;
-- 
2.49.0.226.g0e6cae136d



Thread overview: 18+ messages
2025-04-17 21:12 [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Taylor Blau
2025-04-17 21:12 ` Taylor Blau [this message]
2025-04-17 22:04   ` [PATCH 1/4] pack-bitmap: write lookup table extension by default Junio C Hamano
2025-04-18  9:33     ` Jeff King
2025-04-18 15:44       ` Junio C Hamano
2025-04-18 21:52         ` Taylor Blau
2025-04-17 21:12 ` [PATCH 2/4] p5312: removed duplicate performance test script Taylor Blau
2025-04-17 22:08   ` Junio C Hamano
2025-04-18 21:57     ` Taylor Blau
2025-04-17 21:12 ` [PATCH 3/4] t/perf: avoid testing bitmaps without lookup table Taylor Blau
2025-04-17 22:21   ` Junio C Hamano
2025-04-18  4:24     ` Junio C Hamano
2025-04-18 10:02       ` Jeff King
2025-04-17 21:12 ` [PATCH 4/4] t/perf/lib-bitmap.sh: avoid test_perf during setup Taylor Blau
2025-04-17 22:22   ` Junio C Hamano
2025-04-18 10:17   ` Jeff King
2025-05-02 21:21 ` [PATCH 0/4] pack-bitmap: enable lookup tables by default, misc. cleanups Junio C Hamano
2025-05-05  7:11   ` Patrick Steinhardt
