From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
Patrick Steinhardt <ps@pks.im>,
Junio C Hamano <gitster@pobox.com>
Subject: [RFC PATCH 00/14] repack: incremental MIDX/bitmap-based repacking
Date: Tue, 24 Feb 2026 19:20:41 -0500
Message-ID: <cover.1771978829.git.me@ttaylorr.com>
[NOTE: this series has not yet been tested thoroughly and is based on
v3 of 'tb/incremental-midx-part-3.2', which is d54da84bd9d (midx:
enable reachability bitmaps during MIDX compaction, 2026-02-24).
I suspect that there will be non-trivial changes to the approach in this
series based on feedback, hence the RFC.]
Note to the maintainer:
* This series is based on 'tb/incremental-midx-part-3.2'; I suggest
queueing it as 'tb/incremental-midx-part-3.3'.
OVERVIEW
========
This series implements the third and final major component of an
incremental MIDX/bitmap-based repacking strategy. As a refresher, the
three components are:
1. Refactoring code out of builtin/repack.c into individual compilation
units outside of the builtin directory.
2. Implementing MIDX layer compaction, i.e., the ability to combine a
contiguous sequence of MIDX layers in an existing chain and replace
them with a single layer representing the union of objects/packs
among the compacted layers.
3. (this series) An incremental repacking strategy that, unlike our
current '--geometric' approach, does not rely on periodic
all-into-one repacks.
BACKGROUND
==========
Today, a '--geometric' repack with '--write-midx' rewrites the entire
MIDX (and, when enabled, its reachability bitmap) on every invocation.
For most repositories this is acceptable. In large monorepos, however,
these costs add up quickly, especially when those repositories are
repacked frequently.
The incremental MIDX support introduced in the earlier parts of this
effort allows us to append new layers to a MIDX chain without rewriting
existing layers. Combined with the MIDX compaction support from part 3.2,
we now have all of the building blocks needed to maintain an
incrementally growing and shrinking MIDX chain as part of the repack
cycle.
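As a rough, illustrative cost model (an assumption, not a measurement from this series): if every repack adds the same number of new objects, rewriting the whole MIDX writes an entry for every object on each invocation, whereas appending a layer writes entries only for the new objects. The sketch ignores compaction, bitmaps, and per-pack overhead, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical cost model: start with `initial` objects and add
 * `per_repack` new objects on each of `repacks` invocations. Both
 * functions return the total number of object entries written to
 * MIDX files across all invocations.
 */

/* Full rewrite: every invocation writes a MIDX covering all objects. */
static uint64_t full_rewrite_cost(uint64_t initial, uint64_t per_repack,
				  unsigned repacks)
{
	uint64_t objects = initial, total = 0;

	for (unsigned i = 0; i < repacks; i++) {
		objects += per_repack;
		total += objects;
	}
	return total;
}

/* Append-only: every invocation writes a layer with only new objects. */
static uint64_t append_only_cost(uint64_t per_repack, unsigned repacks)
{
	/* Guard against overflow in the multiplication below. */
	assert(repacks == 0 || per_repack <= UINT64_MAX / repacks);
	return per_repack * repacks;
}
```

For example, with one million existing objects and 1,000 new objects per repack, ten full rewrites write 10,055,000 object entries in total, while ten appended layers write 10,000.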
STRATEGY
========
The incremental repacking strategy implemented in this series works
(roughly) as follows:
1. Repack non-MIDX'd packs using an ordinary '--geometric' repack,
optionally including packs from the tip MIDX layer if and only if it
contains more than 'repack.midxNewLayerThreshold' packs.
2. Prepare to write a new MIDX layer containing the freshly written
pack(s) (and any surviving packs from the tip layer, if it was
rewritten).
3. Perform a compaction pass over adjacent MIDX layers to restore a
geometric progression on object count among layers in the chain
(determined by 'repack.midxSplitFactor').
4. Write the new MIDX chain, link it into place, and remove redundant
layers.
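Steps 1 and 3 can be sketched in C under a few assumptions that the description above does not spell out: layers are ordered from oldest (index 0) to newest, layers do not share objects, the chain already satisfied the progression before the new layer was appended, and 'repack.midxSplitFactor' means each layer should hold at least that many times as many objects as the next-newer layer (by analogy with '--geometric=<factor>'). Under the third assumption, merging only at the tip suffices, since earlier pairs of layers are unaffected. `struct layer`, `should_rewrite_tip()` and `compact_layers()` are hypothetical names, not the functions added by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory summary of one MIDX layer. */
struct layer {
	uint64_t objects;
	size_t packs;
};

/*
 * Step 1 (assumed reading): fold the tip layer's packs into the
 * geometric repack only if it holds more than `threshold` packs
 * (cf. 'repack.midxNewLayerThreshold').
 */
static int should_rewrite_tip(const struct layer *tip, size_t threshold)
{
	return tip && tip->packs > threshold;
}

/*
 * Step 3 (assumed reading): layers[0] is the oldest layer and
 * layers[*nr - 1] the newest. Merge the two newest layers while the
 * older of the pair holds fewer than `factor` times as many objects as
 * the newer one (cf. 'repack.midxSplitFactor'). Returns the number of
 * merges performed and updates *nr to the new chain length.
 */
static size_t compact_layers(struct layer *layers, size_t *nr,
			     uint64_t factor)
{
	size_t merges = 0;

	assert(factor >= 2);
	while (*nr >= 2) {
		struct layer *older = &layers[*nr - 2];
		struct layer *newer = &layers[*nr - 1];

		if (older->objects >= factor * newer->objects)
			break; /* progression holds at the tip */

		/* Replace the pair with one layer holding their union. */
		older->objects += newer->objects;
		older->packs += newer->packs;
		(*nr)--;
		merges++;
	}
	return merges;
}
```

Under these assumptions, appending a 60-object layer to a chain of {1000, 100} objects with a factor of 2 merges it into the 100-object layer, yielding {1000, 160}, and then stops because 1000 >= 2 * 160.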
In effect, this approach encourages MIDX chains where:
* older layers contain fewer, larger packs, and
* newer layers contain many smaller packs.
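As a result of the compaction pass, the chain itself never grows too
long: because layer sizes follow a geometric progression, the number of
layers stays roughly logarithmic in the total number of objects.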
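To make the bound on chain length concrete, assume the compaction pass leaves every layer with at least 'repack.midxSplitFactor' times as many objects as the next-newer one (an assumed reading of that setting) and that the newest layer is non-empty. Then the k-th layer from the tip holds at least factor^k objects, so a chain of L layers needs at least 1 + factor + ... + factor^(L-1) objects in total. The hypothetical helper below returns the largest L that fits a given object count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest chain length compatible with `total_objects` when each layer
 * holds at least `factor` times as many objects as the next-newer one
 * and the newest layer holds at least one object.
 */
static unsigned max_chain_length(uint64_t total_objects, uint64_t factor)
{
	unsigned layers = 0;
	uint64_t needed = 0; /* minimum objects used by `layers` layers */
	uint64_t next = 1;   /* minimum size of the next-older layer */

	assert(factor >= 2);
	/* Invariant: needed <= total_objects, so the subtraction is safe. */
	while (next <= total_objects - needed) {
		needed += next;
		layers++;
		if (next > UINT64_MAX / factor)
			break; /* next layer would exceed any uint64_t count */
		next *= factor;
	}
	return layers;
}
```

For example, with a factor of 2, one million objects permit at most 19 layers; with a factor of 10, one billion objects permit at most 9.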
This roughly matches the description of this algorithm I gave at Git
Merge last year, which is covered on slides 80-131 of this presentation:
https://ttaylorr.com/presentations/git-merge-2025.pdf
THIS SERIES
===========
This series is organized roughly as follows:
* The first three patches perform small-ish quality-of-life refactors
within the MIDX machinery.
* The next two patches introduce `--checksum-only` and `--base`
options for `git multi-pack-index write` and `compact`, which are
needed by the incremental repacking machinery to assemble MIDX
chains without prematurely updating the chain file.
* The next six patches prepare the repack infrastructure and pack
geometry code for the new repacking strategy.
* The final three patches, in order, introduce the new repacking
machinery, expose it to users, and extend it to work with non-geometric
repacks. The first two of these are the substantive patches of this
series.
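As listed in the patch summary, `--write-midx` becomes an `OPT_CALLBACK` so that it can take a value such as `incremental`. A stand-alone sketch of the parsing such a callback might do, assuming three modes (disabled, the existing single-MIDX behavior, and `incremental`), follows; the enum and function names are hypothetical, and the real callback in builtin/repack.c may differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the modes a callback might record. */
enum write_midx_mode {
	WRITE_MIDX_NONE,        /* --no-write-midx, or option absent */
	WRITE_MIDX_DEFAULT,     /* bare --write-midx: one non-incremental MIDX */
	WRITE_MIDX_INCREMENTAL, /* --write-midx=incremental */
};

/*
 * Shaped like a parse-options callback: `arg` is NULL for a bare
 * `--write-midx`, and `unset` is nonzero for `--no-write-midx`.
 * Returns 0 on success and -1 (leaving *mode untouched) for an
 * unrecognized value.
 */
static int parse_write_midx(enum write_midx_mode *mode, const char *arg,
			    int unset)
{
	if (unset)
		*mode = WRITE_MIDX_NONE;
	else if (!arg)
		*mode = WRITE_MIDX_DEFAULT;
	else if (!strcmp(arg, "incremental"))
		*mode = WRITE_MIDX_INCREMENTAL;
	else
		return -1;
	return 0;
}
```

Keeping the bare form mapped to the existing behavior means current `--write-midx` users would be unaffected under this sketch.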
WHERE WE'RE AT
==============
This series delivers the final substantive component of the overall
effort, enabling the repacking strategy described above. A couple of
(comparatively much smaller) follow-on items would also be useful,
including:
* "Reachability-infused" geometric repacking that emits small cruft
packs, providing a way to update the set of cruft objects other than a
whole-repository traversal.
* Richer bitmap configuration to determine which bitmap(s) to carry
forward between adjacent MIDX layers when doing an incremental
repack, to prevent an endless accumulation of reachability bitmaps.
As I mentioned earlier, I think that both of those are significantly
smaller challenges than this and the previous series. As a general note,
I am going on vacation for two weeks starting this Friday (2026-02-27)
and will be away from the list.
I plan to pick up any review of this series, as well as the two
follow-on topics listed above, as soon as I get back.
Thanks in advance for your review!
Taylor Blau (14):
midx: use `string_list` for retained MIDX files
strvec: introduce `strvec_init_alloc()`
midx: use `strvec` for `keep_hashes`
midx: introduce `--checksum-only` for incremental MIDX writes
midx: support custom `--base` for incremental MIDX writes
repack: track the ODB source via existing_packs
midx: expose `midx_layer_contains_pack()`
repack-midx: factor out `repack_prepare_midx_command()`
repack-midx: extract `repack_fill_midx_stdin_packs()`
repack-geometry: prepare for incremental MIDX repacking
builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
repack: implement incremental MIDX repacking
repack: introduce `--write-midx=incremental`
repack: allow `--write-midx=incremental` without `--geometric`
Documentation/config/repack.adoc | 18 +
Documentation/git-multi-pack-index.adoc | 19 +-
Documentation/git-repack.adoc | 44 +-
builtin/multi-pack-index.c | 48 +-
builtin/repack.c | 90 ++-
midx-write.c | 119 ++--
midx.c | 104 ++--
midx.h | 11 +-
repack-geometry.c | 50 +-
repack-midx.c | 704 +++++++++++++++++++++++-
repack.c | 23 +-
repack.h | 25 +-
strvec.c | 7 +
strvec.h | 5 +
t/meson.build | 1 +
t/t5334-incremental-multi-pack-index.sh | 47 ++
t/t5335-compact-multi-pack-index.sh | 105 ++++
t/t7705-repack-incremental-midx.sh | 461 ++++++++++++++++
18 files changed, 1730 insertions(+), 151 deletions(-)
create mode 100755 t/t7705-repack-incremental-midx.sh
base-commit: ac6221686db11ba802ced8ea6b8b2fa8388b21b2
--
2.53.0.185.g29bc4dff628