From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction
Date: Sat, 6 Dec 2025 15:30:57 -0500 [thread overview]
Message-ID: <cover.1765053054.git.me@ttaylorr.com> (raw)
Note to the maintainer:
* This series is the spiritual successor to my earlier series queued
as 'tb/incremental-midx-part-3.1', but is based on 'master'
(currently f0ef5b6d9bc (The fifth batch, 2025-11-30) at the time of
writing). I suggest queueing as 'tb/incremental-midx-part-3.2'.
OVERVIEW
========
This series implements the second of three major components for an
incremental MIDX/bitmap-based repacking strategy. Those components
are:
1. Refactoring code out of builtin/repack.c into individual
compilation units outside of the builtin directory.
2. (this series) Implementing MIDX layer compaction, i.e. the ability
to take an contiguous sequence of MIDX layers in an existing MIDX
chain and replace them with a single layer representing the union
of objects/packs among the compacted layers.
3. An incremental repacking strategy that, unlike our current
'--geometric' approach, does not require periodic all-into-one
repacks.
BACKGROUND
==========
I'll briefly discuss that repacking strategy here in order to provide
some context on why MIDX layer compaction is important. When doing a
--geometric repack today, any update to the MIDX and/or its .bitmap
requires a full rewrite. For most repositories this is not such a big
deal, but in large monorepos these costs can add up significantly over
time.
To deal with this, incremental MIDXs (and corresponding support for
reachability bitmaps) were introduced as a way to append updates to
the end of a MIDX chain without having to rewrite anything.
Of course, growing a MIDX chain infinitely without ever shrinking it
would be pointless. So any repacking strategy which leverages
incremental MIDXs must take some action which encourages the MIDX
chain to avoid growing too large.
The strategy that I will share in the forthcoming topic related to
this effort does (roughly) the following:
1. Repack all non-MIDX'd packs geometrically, optionally including
any packs in the most recent MIDX layer iff it has more than some
threshold of packs.
2. Write out a new MIDX layer which is the result of the previous
step, adding it onto the end of the chain (or replacing the
previous end, depending on whether the pack threshold condition
was met).
3. Perform a compaction pass, combining adjacent MIDX layers to
restore a geometric progression on object count among MIDX layers
in the chain.
(As an aside, this description is not the most efficient way to
implement this behavior, and so the actual implementation varies
slightly, though achieves the same effect.)
This strategy encourages MIDX chains where older layers have fewer,
larger packs, and newer layers have a greater number of smaller packs.
The compaction pass prevents the MIDX chain from ever growing too
large. (In my simulation[1], even a large number of simulated pushes
yields a MIDX chain which is consistently fewer than ~10 or so layers
long.)
COMPACTION
==========
MIDX compaction works by producing a new MIDX layer which is the
logical union of any contiguous sequence within the existing MIDX
chain. Importantly, the packs/objects from those layers within the
compaction range have their order preserved. That means that the
pseudo-pack order of the compacted layer is guaranteed to be the same
as the concatenated pseudo-pack order across all layers in the
compaction range.
As a result, we can reuse any selected bitmaps within the compaction
range verbatim. More importantly, any bitmaps belonging to layers
which are descendants of the compacted one do not need to permute any
existing bitmaps in order for them to be valid post-compaction.
This allows us to cheaply combine parts of the MIDX chain without
needing to do any work outside of the combine layers, which is
essential in the above-described repacking strategy.
THIS SERIES
===========
This series implements MIDX compaction, and is organized roughly as
follows:
* The first two patches are stray code clean-up and minor refactoring
changes that I made while writing this series.
* The next three patches clean up the SYNOPSIS for in the manual page
for git-multi-pack-index(1), and removes the builtin from the list
of known failures in t0450.
* The next patch is a typofix, and the one following that is a
preemptive bugfix.
* The eighth patch is a code refactoring patch that makes the
argument list for midx-write.c::write_midx_internal() easier to
reason about.
* The ninth patch relaxes a MIDX constraint that forces all packs
within a layer to be sorted lexicographically.
* The remaining patches prepare for and introduce MIDX compaction.
The test coverage in the new t5335-compact-multi-pack-index.sh is
fairly sparse in my opinion, though most of the interesting behavior
is already exercised via t5319 and t5334. If others have ideas on how
to improve test coverage here, please let me know!
This series is somewhat lengthy, though the individual changes are
relatively small for the most part, so I am hoping that they aren't
too bad to review.
Thanks in advance for your review!
[1]: https://gist.github.com/ttaylorr/7ac0305f130197744f3583096c8198a1
Taylor Blau (17):
midx: mark `get_midx_checksum()` arguments as const
midx: split `get_midx_checksum()` by adding `get_midx_hash()`
builtin/multi-pack-index.c: make '--progress' a common option
git-multi-pack-index(1): remove non-existent incompatibility
git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
midx-write.c: introduce `struct write_midx_opts`
midx: do not require packs to be sorted in lexicographic order
git-compat-util.h: introduce `u32_add()`
midx-write.c: introduce `midx_pack_perm()` helper
midx-write.c: extract `fill_pack_from_midx()`
midx-write.c: enumerate `pack_int_id` values directly
midx-write.c: factor fanout layering from `compute_sorted_entries()`
t/helper/test-read-midx.c: plug memory leak when selecting layer
midx: implement MIDX compaction
midx: enable reachability bitmaps during MIDX compaction
Documentation/git-multi-pack-index.adoc | 24 +-
builtin/multi-pack-index.c | 84 ++++-
git-compat-util.h | 8 +
midx-write.c | 463 ++++++++++++++++++------
midx.c | 36 +-
midx.h | 9 +-
pack-bitmap.c | 9 +-
pack-revindex.c | 4 +-
t/helper/test-read-midx.c | 21 +-
t/meson.build | 1 +
t/t0450/adoc-help-mismatches | 1 -
t/t5319-multi-pack-index.sh | 7 +-
t/t5335-compact-multi-pack-index.sh | 218 +++++++++++
13 files changed, 737 insertions(+), 148 deletions(-)
create mode 100755 t/t5335-compact-multi-pack-index.sh
base-commit: f0ef5b6d9bcc258e4cbef93839d1b7465d5212b9
--
2.52.0.171.gd6a4e6b6955
next reply other threads:[~2025-12-06 20:31 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 20:30 Taylor Blau [this message]
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:41 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
2025-12-08 18:25 ` Patrick Steinhardt
2025-12-09 1:42 ` Taylor Blau
2025-12-09 1:50 ` Taylor Blau
2025-12-09 6:27 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:59 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:04 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:07 ` Taylor Blau
2025-12-09 2:11 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:13 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:14 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:16 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1765053054.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).