Git development
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>
Subject: [PATCH 00/16] repack: incremental MIDX/bitmap-based repacking
Date: Sun, 29 Mar 2026 17:40:50 -0400
Message-ID: <cover.1774820449.git.me@ttaylorr.com>

Note to the maintainer:

 * Now that tb/incremental-midx-part-3.2 is merged, this series is
   rebased on current 'master', which is 5361983c075 (The 22nd batch,
   2026-03-27) at the time of writing.

 * I suggest queueing as 'tb/incremental-midx-part-3.3'.

What follows is nearly identical to the cover letter I sent for the
RFC version[1] of this series. The only changes between the RFC version
and this one are:

 * A new commit to correctly handle certain cases when optimizing out
   MIDX writes.

 * `strvec_init()` is now defined as a special case of
   `strvec_init_alloc()`, based on feedback from Junio.

 * Minor ODB source-related fallout as a consequence of rebasing.

 * New trace2 instrumentation to help follow what repack is doing in
   response to different packing configurations.

 * Fixes for various memory leaks.

The original cover letter is below:

------------------------------------------------------------------------

OVERVIEW
========

This series implements the third and final major component for an
incremental MIDX/bitmap-based repacking strategy. As a refresher, those
are:

 1. Refactoring code out of builtin/repack.c into individual compilation
    units outside of the builtin directory.

 2. Implementing MIDX layer compaction, i.e., the ability to combine a
    contiguous sequence of MIDX layers in an existing chain and replace
    them with a single layer representing the union of objects/packs
    among the compacted layers.

 3. (this series) An incremental repacking strategy that, unlike our
    current '--geometric' approach, does not rely on periodic
    all-into-one repacks.
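
As a rough illustration of the compaction described in item 2 (a toy
model, not code from this series), a chain can be modeled as a list of
layers ordered oldest to newest, each holding a set of pack names. The
function name `compact_layers` and the pack names are hypothetical:

```python
from typing import List, Set


def compact_layers(chain: List[Set[str]], start: int, end: int) -> List[Set[str]]:
    """Replace the contiguous layers chain[start..end] (inclusive) with a
    single layer holding the union of their packs.

    Toy model only: a real MIDX layer also indexes objects, which this
    sketch ignores.
    """
    if not (0 <= start <= end < len(chain)):
        raise ValueError("range must lie within the chain")
    merged = set().union(*chain[start:end + 1])
    # Layers outside the range are carried over untouched.
    return chain[:start] + [merged] + chain[end + 1:]


chain = [{"pack-a"}, {"pack-b"}, {"pack-c", "pack-d"}, {"pack-e"}]
compacted = compact_layers(chain, 1, 2)
```

Here the two middle layers collapse into one, so the chain shrinks from
four layers to three while still covering the same five packs.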

BACKGROUND
==========

Today, a '--geometric' repack with '--write-midx' rewrites the entire
MIDX (and, when enabled, its reachability bitmap) on every invocation.
For most repositories this is acceptable. In large monorepos, however,
the cost of these rewrites can add up significantly, especially when
those repositories are repacked frequently.

The incremental MIDX support introduced in the earlier parts of this
effort allows us to append new layers to a MIDX chain without rewriting
anything. Combined with the support for MIDX compaction in part 3.2, we
now have all of the building blocks needed to maintain an incrementally
growing and shrinking MIDX chain as part of the repack cycle.

STRATEGY
========

The incremental repacking strategy implemented in this series works
(roughly) as follows:

 1. Repack non-MIDX'd packs using an ordinary '--geometric' repack,
    optionally including packs from the tip MIDX layer if and only if it
    contains more than 'repack.midxNewLayerThreshold' packs.

 2. Prepare to write a new MIDX layer containing the freshly written
    pack(s) (and any surviving packs from the tip layer, if it was
    rewritten).

 3. Perform a compaction pass over adjacent MIDX layers to restore a
    geometric progression in object count among layers in the chain
    (determined by 'repack.midxSplitFactor').

 4. Write the new MIDX chain, link it into place, and remove redundant
    layers.

In effect, this approach encourages MIDX chains where:

 * older layers contain fewer, larger packs, and

 * newer layers contain many smaller packs.

As a result of the compaction pass, we prevent the chain itself from
ever growing too long.
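
The steps above can be sketched in a few lines of Python. This is a
simplified model under stated assumptions, not the implementation in
this series: each layer is a list of per-pack object counts, the
'--geometric' repack in step 1 is approximated by rolling all candidate
packs into a single new pack, objects are assumed not to be duplicated
across packs, and the invariant restored in step 3 is assumed to be
"each layer has at least split_factor times as many objects as the
next-newer layer". The names `Layer` and `incremental_repack` and the
default values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    packs: List[int]  # object count of each pack in this layer

    @property
    def objects(self) -> int:
        return sum(self.packs)


def incremental_repack(chain, new_packs, new_layer_threshold=8, split_factor=2):
    """Toy model of one repack cycle; `chain` is ordered oldest to newest."""
    chain = [Layer(list(layer.packs)) for layer in chain]  # work on a copy
    candidates = list(new_packs)

    # Step 1: pull in the tip layer's packs only if it holds too many packs.
    if chain and len(chain[-1].packs) > new_layer_threshold:
        candidates += chain.pop().packs
    if not candidates:
        return chain  # nothing to repack

    # Step 2: new tip layer. Assumption: the geometric repack rolled every
    # candidate into one pack, so no packs from the old tip survive.
    chain.append(Layer([sum(candidates)]))

    # Step 3: compact adjacent layers until the assumed invariant holds.
    while len(chain) >= 2 and chain[-2].objects < split_factor * chain[-1].objects:
        newer = chain.pop()
        older = chain.pop()
        chain.append(Layer(older.packs + newer.packs))

    # Step 4 (writing and linking the chain file) is outside this model.
    return chain


result = incremental_repack([Layer([1000]), Layer([20])], [15])
```

With a chain of [1000], [20] and a new 15-object pack, the new layer
violates the assumed invariant (20 < 2 * 15), so the two newest layers
are compacted into one layer holding both packs, while the oldest layer
is left alone.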

This roughly matches the description of this algorithm I gave at Git
Merge last year, which is covered on slides 80-131 of this presentation:

    https://ttaylorr.com/presentations/git-merge-2025.pdf

THIS SERIES
===========

This series is organized roughly as follows:

 * The first three patches perform small-ish quality-of-life refactors
   within the MIDX machinery.

 * The next two patches introduce `--checksum-only` and `--base`
   options for `git multi-pack-index write` and `compact`, which are
   needed by the incremental repacking machinery to assemble MIDX
   chains without prematurely updating the chain file.

 * The next six patches prepare the repack infrastructure and pack
   geometry code for the new repacking strategy.

 * The final three patches introduce the new repacking machinery, expose
   it to users, and then extend it to work with non-geometric repacks,
   in that order. The first two are the substantive patches of this
   series.

WHERE WE'RE AT
==============

This series delivers the final substantive component of this overall
effort to enable the new repacking strategy implemented here. There are
a couple of (comparatively much smaller) items that will be useful as
follow-ons, including:

 * "Reachability-infused" geometric repacking that emits small cruft
   packs, giving us ways to update the set of cruft objects other than
   a whole-repository traversal.

 * Richer bitmap configuration to determine which bitmap(s) to carry
   forward between adjacent MIDX layers when doing an incremental
   repack, to prevent an endless accumulation of reachability bitmaps.

Like I mentioned earlier, I think that both of those are significantly
smaller challenges than this and the previous series.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1771978829.git.me@ttaylorr.com/

Taylor Blau (16):
  midx-write: handle noop writes when converting incremental chains
  midx: use `string_list` for retained MIDX files
  strvec: introduce `strvec_init_alloc()`
  midx: use `strvec` for `keep_hashes`
  midx: introduce `--checksum-only` for incremental MIDX writes
  midx: support custom `--base` for incremental MIDX writes
  repack: track the ODB source via existing_packs
  midx: expose `midx_layer_contains_pack()`
  repack-midx: factor out `repack_prepare_midx_command()`
  repack-midx: extract `repack_fill_midx_stdin_packs()`
  repack-geometry: prepare for incremental MIDX repacking
  builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
  packfile: ensure `close_pack_revindex()` frees in-memory revindex
  repack: implement incremental MIDX repacking
  repack: introduce `--write-midx=incremental`
  repack: allow `--write-midx=incremental` without `--geometric`

 Documentation/config/repack.adoc        |  18 +
 Documentation/git-multi-pack-index.adoc |  19 +-
 Documentation/git-repack.adoc           |  44 +-
 builtin/multi-pack-index.c              |  48 +-
 builtin/repack.c                        |  90 ++-
 midx-write.c                            | 137 +++--
 midx.c                                  | 106 ++--
 midx.h                                  |  11 +-
 packfile.c                              |   2 +
 repack-geometry.c                       |  50 +-
 repack-midx.c                           | 717 +++++++++++++++++++++++-
 repack.c                                |  23 +-
 repack.h                                |  25 +-
 strvec.c                                |  15 +-
 strvec.h                                |   5 +
 t/meson.build                           |   1 +
 t/t5334-incremental-multi-pack-index.sh |  63 +++
 t/t5335-compact-multi-pack-index.sh     | 113 ++++
 t/t7705-repack-incremental-midx.sh      | 461 +++++++++++++++
 19 files changed, 1792 insertions(+), 156 deletions(-)
 create mode 100755 t/t7705-repack-incremental-midx.sh


base-commit: 5361983c075154725be47b65cca9a2421789e410
-- 
2.53.0.729.g817728289e1.dirty


Thread overview: 92+ messages
2026-03-29 21:40 Taylor Blau [this message]
2026-03-29 21:40 ` [PATCH 01/16] midx-write: handle noop writes when converting incremental chains Taylor Blau
2026-03-30 22:33   ` Jeff King
2026-03-31 21:43     ` Taylor Blau
2026-03-29 21:40 ` [PATCH 02/16] midx: use `string_list` for retained MIDX files Taylor Blau
2026-03-30 22:38   ` Jeff King
2026-03-31 21:49     ` Taylor Blau
2026-03-29 21:40 ` [PATCH 03/16] strvec: introduce `strvec_init_alloc()` Taylor Blau
2026-03-30 22:46   ` Jeff King
2026-03-29 21:41 ` [PATCH 04/16] midx: use `strvec` for `keep_hashes` Taylor Blau
2026-03-30 23:01   ` Jeff King
2026-03-31 22:26     ` Taylor Blau
2026-03-31 22:50       ` Taylor Blau
2026-03-31 23:17         ` Jeff King
2026-04-01 15:41           ` Taylor Blau
2026-04-01 19:25             ` Jeff King
2026-03-29 21:41 ` [PATCH 05/16] midx: introduce `--checksum-only` for incremental MIDX writes Taylor Blau
2026-03-30 23:15   ` Jeff King
2026-04-02 22:51     ` Taylor Blau
2026-03-29 21:41 ` [PATCH 06/16] midx: support custom `--base` " Taylor Blau
2026-04-07  5:57   ` Jeff King
2026-04-14 22:09     ` Taylor Blau
2026-03-29 21:41 ` [PATCH 07/16] repack: track the ODB source via existing_packs Taylor Blau
2026-04-07  6:04   ` Jeff King
2026-04-14 22:24     ` Taylor Blau
2026-03-29 21:41 ` [PATCH 08/16] midx: expose `midx_layer_contains_pack()` Taylor Blau
2026-04-07  6:05   ` Jeff King
2026-03-29 21:41 ` [PATCH 09/16] repack-midx: factor out `repack_prepare_midx_command()` Taylor Blau
2026-03-29 21:41 ` [PATCH 10/16] repack-midx: extract `repack_fill_midx_stdin_packs()` Taylor Blau
2026-04-07  6:08   ` Jeff King
2026-03-29 21:41 ` [PATCH 11/16] repack-geometry: prepare for incremental MIDX repacking Taylor Blau
2026-04-07  6:10   ` Jeff King
2026-04-16 22:51   ` Elijah Newren
2026-04-21 19:34     ` Taylor Blau
2026-03-29 21:41 ` [PATCH 12/16] builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK` Taylor Blau
2026-04-07  6:18   ` Jeff King
2026-03-29 21:41 ` [PATCH 13/16] packfile: ensure `close_pack_revindex()` frees in-memory revindex Taylor Blau
2026-04-07  6:29   ` Jeff King
2026-03-29 21:41 ` [PATCH 14/16] repack: implement incremental MIDX repacking Taylor Blau
2026-04-16 22:53   ` Elijah Newren
2026-04-21 19:40     ` Taylor Blau
2026-03-29 21:41 ` [PATCH 15/16] repack: introduce `--write-midx=incremental` Taylor Blau
2026-04-16 22:53   ` Elijah Newren
2026-04-21 19:52     ` Taylor Blau
2026-03-29 21:41 ` [PATCH 16/16] repack: allow `--write-midx=incremental` without `--geometric` Taylor Blau
2026-04-14 22:38 ` [PATCH 00/16] repack: incremental MIDX/bitmap-based repacking Taylor Blau
2026-04-21 20:37 ` [PATCH v2 " Taylor Blau
2026-04-21 20:37   ` [PATCH v2 01/16] midx-write: handle noop writes when converting incremental chains Taylor Blau
2026-04-21 20:37   ` [PATCH v2 02/16] midx: use `strset` for retained MIDX files Taylor Blau
2026-04-21 20:37   ` [PATCH v2 03/16] midx: build `keep_hashes` array in order Taylor Blau
2026-04-21 20:37   ` [PATCH v2 04/16] midx: use `strvec` for `keep_hashes` Taylor Blau
2026-04-21 20:37   ` [PATCH v2 05/16] midx: introduce `--no-write-chain-file` for incremental MIDX writes Taylor Blau
2026-04-21 20:37   ` [PATCH v2 06/16] midx: support custom `--base` " Taylor Blau
2026-04-21 20:37   ` [PATCH v2 07/16] repack: track the ODB source via existing_packs Taylor Blau
2026-04-21 20:37   ` [PATCH v2 08/16] midx: expose `midx_layer_contains_pack()` Taylor Blau
2026-04-21 20:37   ` [PATCH v2 09/16] repack-midx: factor out `repack_prepare_midx_command()` Taylor Blau
2026-04-21 20:37   ` [PATCH v2 10/16] repack-midx: extract `repack_fill_midx_stdin_packs()` Taylor Blau
2026-04-29  8:08     ` Jeff King
2026-04-29 22:40       ` Taylor Blau
2026-04-21 20:37   ` [PATCH v2 11/16] repack-geometry: prepare for incremental MIDX repacking Taylor Blau
2026-04-21 20:37   ` [PATCH v2 12/16] builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK` Taylor Blau
2026-04-21 20:37   ` [PATCH v2 13/16] packfile: ensure `close_pack_revindex()` frees in-memory revindex Taylor Blau
2026-04-21 20:37   ` [PATCH v2 14/16] repack: implement incremental MIDX repacking Taylor Blau
2026-04-29  7:51     ` Jeff King
2026-04-29 23:36       ` Taylor Blau
2026-04-29  8:10     ` Jeff King
2026-04-29 23:39       ` Taylor Blau
2026-04-21 20:37   ` [PATCH v2 15/16] repack: introduce `--write-midx=incremental` Taylor Blau
2026-04-21 21:02     ` Taylor Blau
2026-04-21 20:38   ` [PATCH v2 16/16] repack: allow `--write-midx=incremental` without `--geometric` Taylor Blau
2026-04-22 14:45   ` [PATCH v2 00/16] repack: incremental MIDX/bitmap-based repacking Elijah Newren
2026-04-29  8:10   ` Jeff King
2026-04-30  0:13 ` [PATCH v3 " Taylor Blau
2026-04-30  0:13   ` [PATCH v3 01/16] midx-write: handle noop writes when converting incremental chains Taylor Blau
2026-04-30  0:13   ` [PATCH v3 02/16] midx: use `strset` for retained MIDX files Taylor Blau
2026-04-30  0:13   ` [PATCH v3 03/16] midx: build `keep_hashes` array in order Taylor Blau
2026-04-30  0:13   ` [PATCH v3 04/16] midx: use `strvec` for `keep_hashes` Taylor Blau
2026-04-30  0:13   ` [PATCH v3 05/16] midx: introduce `--no-write-chain-file` for incremental MIDX writes Taylor Blau
2026-04-30  0:13   ` [PATCH v3 06/16] midx: support custom `--base` " Taylor Blau
2026-04-30  0:13   ` [PATCH v3 07/16] repack: track the ODB source via existing_packs Taylor Blau
2026-04-30  0:13   ` [PATCH v3 08/16] midx: expose `midx_layer_contains_pack()` Taylor Blau
2026-04-30  0:13   ` [PATCH v3 09/16] repack-midx: factor out `repack_prepare_midx_command()` Taylor Blau
2026-05-13 21:45     ` SZEDER Gábor
2026-04-30  0:13   ` [PATCH v3 10/16] repack-midx: extract `repack_fill_midx_stdin_packs()` Taylor Blau
2026-04-30  0:13   ` [PATCH v3 11/16] repack-geometry: prepare for incremental MIDX repacking Taylor Blau
2026-04-30  0:13   ` [PATCH v3 12/16] builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK` Taylor Blau
2026-04-30  0:13   ` [PATCH v3 13/16] packfile: ensure `close_pack_revindex()` frees in-memory revindex Taylor Blau
2026-04-30  0:13   ` [PATCH v3 14/16] repack: implement incremental MIDX repacking Taylor Blau
2026-04-30  0:13   ` [PATCH v3 15/16] repack: introduce `--write-midx=incremental` Taylor Blau
2026-05-13 23:08     ` Jeff King
2026-04-30  0:13   ` [PATCH v3 16/16] repack: allow `--write-midx=incremental` without `--geometric` Taylor Blau
2026-05-01  6:46   ` [PATCH v3 00/16] repack: incremental MIDX/bitmap-based repacking Jeff King
