Git development
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH 0/8] pack-bitmap-write: speed up bitmap generation
Date: Tue, 19 May 2026 12:12:33 -0400	[thread overview]
Message-ID: <cover.1779207127.git.me@ttaylorr.com> (raw)

Note to the maintainer:

 * This series is based on 'tb/pseudo-merge-bugfixes', with
   'ps/clang-w-glibc-2.43-and-_Generic' merged in. I suggest queueing it
   as 'tb/bitmap-build-performance'.

   The latter merge is only to avoid the current Clang/glibc 2.43 CI
   breakage, and is unrelated to the bitmap changes themselves.

This series improves the performance of reachability bitmap generation,
focusing on very large repositories and the penalty to generate
pseudo-merge reachability bitmaps.

The first few patches address hot paths in the ordinary bitmap build:

 - pass object positions into `fill_bitmap_tree()` so callers can avoid
   redundant lookups,

 - check subtree bits before recursing, which avoids many no-op
   `fill_bitmap_tree()` calls,

 - reuse already-stored selected bitmaps when `fill_bitmap_commit()`
   reaches a selected ancestor, and

 - add a small direct-mapped cache from object IDs to bitmap positions
   to avoid repeated pack/MIDX lookups while filling bitmaps.

On the large repository that I have been using to benchmark these
changes (~4.8M commits and ~57M total objects), the no-pseudo-merge
bitmap generation case drops **from ~612.5 seconds to ~294.1 seconds**.

The next patch sorts selected bitmaps before choosing XOR offsets. This
does not change bitmap selection/coverage, but in the same repository it
shrinks the generated bitmap file **from ~635.5 MiB to ~176.4 MiB** by
putting related ancestor/descendant bitmaps close enough together for
the XOR search window to find them.

The final two patches focus on pseudo-merge bitmaps. The existing code
feeds pseudo-merges into the same maximal-commit selection machinery as
ordinary selected commits. That machinery works well for real history,
but not pseudo-merges.

Instead, this series builds ordinary selected bitmaps first, then builds
pseudo-merge bitmaps afterwards. The later pseudo-merge fill can still
reuse stored selected ancestor bitmaps, and can also reuse an existing
on-disk pseudo-merge bitmap when the parent set matches.

With the coarse pseudo-merge configuration used for testing:

    [bitmapPseudoMerge "all"]
        pattern=refs/
        threshold=now
        stableSize=10000000
        maxMerges=8

, the optimized no-pseudo-merge case takes ~294.1 seconds, while the
**pseudo-merge case takes ~328.4 seconds**. Before the final change, the
same pseudo-merge configuration took ~575.0 seconds.

On our testing repository, it is faster at the end of this series to
generate bitmaps with pseudo-merges (~328 seconds as above) than it is
to generate bitmaps without pseudo-merges at the start of this series
(~612 seconds).

Thanks in advance for your review!

Taylor Blau (8):
  pack-bitmap: pass object position to `fill_bitmap_tree()`
  pack-bitmap: check subtree bits before recursing
  pack-bitmap: reuse stored selected bitmaps
  pack-bitmap: consolidate `find_object_pos()` success path
  pack-bitmap: cache object positions during fill
  pack-bitmap: sort bitmaps before XORing
  pack-bitmap: remember pseudo-merge parents
  pack-bitmap: build pseudo-merge bitmaps after regular bitmaps

 pack-bitmap-write.c | 431 +++++++++++++++++++++++++++++++++++++-------
 pack-bitmap.h       |   7 +
 2 files changed, 377 insertions(+), 61 deletions(-)


base-commit: c3d7ca7d982efc3a848fd85f34e867cfc0a99479
-- 
2.54.0.rc1.84.g30ce254312c

             reply	other threads:[~2026-05-19 16:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19 16:12 Taylor Blau [this message]
2026-05-19 16:12 ` [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-19 16:12 ` [PATCH 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-19 16:12 ` [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-19 16:12 ` [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-19 16:12 ` [PATCH 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-19 16:12 ` [PATCH 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-19 16:12 ` [PATCH 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-19 16:12 ` [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1779207127.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox