From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
	Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: [RFC PATCH 00/14] repack: incremental MIDX/bitmap-based repacking
Date: Tue, 24 Feb 2026 19:20:41 -0500
Message-ID: <cover.1771978829.git.me@ttaylorr.com>

[NOTE: this series has not yet been tested thoroughly and is based on
v3 of 'tb/incremental-midx-part-3.2', which is d54da84bd9d (midx:
enable reachability bitmaps during MIDX compaction, 2026-02-24).

I suspect that there will be non-trivial changes to the approach in this
series based on feedback, hence the RFC.]

Note to the maintainer:

 * This series is based on 'tb/incremental-midx-part-3.2'; I suggest
   queueing it as 'tb/incremental-midx-part-3.3'.

OVERVIEW
========

This series implements the third and final major component for an
incremental MIDX/bitmap-based repacking strategy. As a refresher, those
are:

 1. Refactoring code out of builtin/repack.c into individual compilation
    units outside of the builtin directory.

 2. Implementing MIDX layer compaction, i.e., the ability to combine a
    contiguous sequence of MIDX layers in an existing chain and replace
    them with a single layer representing the union of objects/packs
    among the compacted layers.

 3. (this series) An incremental repacking strategy that, unlike our
    current '--geometric' approach, does not rely on periodic
    all-into-one repacks.
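
To make item 2 concrete, here is a minimal Python sketch of what
compacting a contiguous run of layers means at the level of the chain.
The `Layer` class and `compact()` helper are hypothetical names used
only for illustration; they model a layer as a set of pack names and
are not the data structures used in midx.c.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Toy stand-in for one MIDX layer: just the packs it covers."""
    packs: frozenset


def compact(chain, start, end):
    """Replace chain[start:end] with a single layer covering the union
    of the packs in that contiguous range. Layers outside the range are
    kept as-is, in their original order (oldest first)."""
    if not 0 <= start < end <= len(chain):
        raise ValueError("range must be non-empty and within the chain")
    merged = Layer(frozenset().union(*(l.packs for l in chain[start:end])))
    return chain[:start] + [merged] + chain[end:]


chain = [
    Layer(frozenset({"pack-A"})),
    Layer(frozenset({"pack-B", "pack-C"})),
    Layer(frozenset({"pack-D"})),
    Layer(frozenset({"pack-E"})),
]
# Compact the two middle layers; the oldest and newest are untouched.
compacted = compact(chain, 1, 3)
```

Only the layers inside the range are replaced, which is what lets the
rest of the chain stay untouched.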

BACKGROUND
==========

Today, a '--geometric' repack with '--write-midx' rewrites the entire
MIDX (and, when enabled, its reachability bitmap) on every invocation.
For most repositories this is acceptable. In large monorepos, these
costs can add up significantly, especially when those repositories are
repacked frequently.

The incremental MIDX support introduced in the earlier parts of this
effort allows us to append new layers to a MIDX chain without rewriting
anything. Combined with the support for MIDX compaction in 3.2, we now
have all of the building blocks needed to maintain an incrementally
growing and shrinking MIDX chain as part of the repack cycle.

STRATEGY
========

The incremental repacking strategy implemented in this series works
(roughly) as follows:

 1. Repack non-MIDX'd packs using an ordinary '--geometric' repack,
    optionally including packs from the tip MIDX layer if and only if it
    contains more than 'repack.midxNewLayerThreshold' packs.

 2. Prepare to write a new MIDX layer containing the freshly written
    pack(s) (and any surviving packs from the tip layer, if it was
    rewritten).

 3. Perform a compaction pass over adjacent MIDX layers to restore a
    geometric progression in object count among layers in the chain
    (determined by 'repack.midxSplitFactor').

 4. Write the new MIDX chain, link it into place, and remove redundant
    layers.
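
The "if and only if" condition in step 1 can be sketched as follows.
This is illustrative Python, not code from the series:
`packs_to_consider()` and its arguments are hypothetical, and
`threshold` stands in for the value of 'repack.midxNewLayerThreshold'.

```python
def packs_to_consider(non_midx_packs, tip_layer_packs, threshold):
    """Return (candidate packs for the geometric repack, rewrite_tip).

    The tip layer's packs join the candidates only when the tip layer
    holds strictly more than `threshold` packs (assumed semantics of
    'repack.midxNewLayerThreshold', following step 1 above).
    """
    rewrite_tip = len(tip_layer_packs) > threshold
    candidates = list(non_midx_packs)
    if rewrite_tip:
        candidates += list(tip_layer_packs)
    return candidates, rewrite_tip


# Tip layer is small: leave it alone and only repack the non-MIDX'd packs.
small = packs_to_consider(["new-1", "new-2"], ["tip-1"], threshold=2)
# Tip layer has grown past the threshold: fold its packs in as well.
large = packs_to_consider(["new-1"], ["tip-1", "tip-2", "tip-3"], threshold=2)
```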

In effect, this approach encourages MIDX chains where:

 * older layers contain fewer, larger packs, and

 * newer layers contain many smaller packs.

As a result of the compaction pass, we prevent the chain itself from
ever growing too long.
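
One way to picture the compaction pass is the Python sketch below. It
is a simplification under stated assumptions, not the algorithm from
the patches: the rule (merge the tip into its predecessor until the
predecessor has at least `split_factor` times as many objects) is my
own stand-in, `restore_progression()` is a hypothetical name, and
object counts are simply summed, which ignores objects duplicated
across packs.

```python
def restore_progression(object_counts, split_factor):
    """Toy compaction pass over a chain of layers.

    `object_counts` lists per-layer object counts, oldest layer first.
    Assuming every layer except the newest already respects the
    progression, only the tip can violate it, so merge the tip into
    its predecessor until the predecessor is at least `split_factor`
    times as large (assumed rule, for illustration only).
    """
    counts = list(object_counts)
    while len(counts) >= 2 and counts[-2] < split_factor * counts[-1]:
        newest = counts.pop()
        counts[-1] += newest  # approximate: treats objects as disjoint
    return counts


# The two newest layers are too close in size, so they are merged.
after = restore_progression([1000, 400, 90, 60], split_factor=2)
# Already a geometric progression: nothing to compact.
unchanged = restore_progression([1000, 400, 100], split_factor=2)
```

Because each merge can only make the tip larger, a single pass from
the tip toward the base is enough in this simplified model, and the
number of layers stays logarithmic in the total object count.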

This roughly matches the description of this algorithm I gave at Git
Merge last year, which is covered on slides 80-131 of this presentation:

    https://ttaylorr.com/presentations/git-merge-2025.pdf

THIS SERIES
===========

This series is organized roughly as follows:

 * The first three patches perform small-ish quality-of-life refactors
   within the MIDX machinery.

 * The next two patches introduce `--checksum-only` and `--base`
   options for `git multi-pack-index write` and `compact`, which are
   needed by the incremental repacking machinery to assemble MIDX
   chains without prematurely updating the chain file.

 * The next six patches prepare the repack infrastructure and pack
   geometry code for the new repacking strategy.

 * The final three patches introduce the new repacking machinery, expose
   it to users, and then extend it to work with non-geometric repacks,
   in that order. The first two are the substantive patches of this
   series.
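
The idea of assembling a chain "without prematurely updating the chain
file" can be illustrated with a generic write-then-rename pattern. The
Python sketch below is not how the series implements it:
`publish_chain()` is a hypothetical helper, and the
one-checksum-per-line layout of the chain file is an assumption made
for the example.

```python
import os
import tempfile


def publish_chain(chain_path, layer_checksums):
    """Atomically replace the chain file with the given layer list.

    The layers themselves are assumed to already exist on disk, so
    readers see either the old chain or the complete new one, never a
    partially updated file.
    """
    directory = os.path.dirname(chain_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="chain-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for checksum in layer_checksums:
                tmp.write(checksum + "\n")
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, chain_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

In this model, every new or compacted layer is written first, and the
chain file is touched exactly once, at the very end.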

WHERE WE'RE AT
==============

This series delivers the final substantive component of the overall
effort to enable the new repacking strategy implemented here. A couple
of (comparatively much smaller) follow-on items remain that will be
useful, including:

 * "Reachability-infused" geometric repacking that emits small cruft
   packs, introducing other ways to update the set of cruft objects
   besides a whole-repository traversal.

 * Richer bitmap configuration to determine which bitmap(s) to carry
   forward between adjacent MIDX layers when doing an incremental
   repack, to prevent an endless accumulation of reachability bitmaps.

As I mentioned above, I think that both of those are significantly
smaller challenges than this and the previous series. As a general note,
I am going on vacation for two weeks starting this Friday (2026-02-27)
and will be away from the list.

I'll plan on picking up any review from this series as well as the two
other topics listed above as soon as I get back.

Thanks in advance for your review!

Taylor Blau (14):
  midx: use `string_list` for retained MIDX files
  strvec: introduce `strvec_init_alloc()`
  midx: use `strvec` for `keep_hashes`
  midx: introduce `--checksum-only` for incremental MIDX writes
  midx: support custom `--base` for incremental MIDX writes
  repack: track the ODB source via existing_packs
  midx: expose `midx_layer_contains_pack()`
  repack-midx: factor out `repack_prepare_midx_command()`
  repack-midx: extract `repack_fill_midx_stdin_packs()`
  repack-geometry: prepare for incremental MIDX repacking
  builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
  repack: implement incremental MIDX repacking
  repack: introduce `--write-midx=incremental`
  repack: allow `--write-midx=incremental` without `--geometric`

 Documentation/config/repack.adoc        |  18 +
 Documentation/git-multi-pack-index.adoc |  19 +-
 Documentation/git-repack.adoc           |  44 +-
 builtin/multi-pack-index.c              |  48 +-
 builtin/repack.c                        |  90 ++-
 midx-write.c                            | 119 ++--
 midx.c                                  | 104 ++--
 midx.h                                  |  11 +-
 repack-geometry.c                       |  50 +-
 repack-midx.c                           | 704 +++++++++++++++++++++++-
 repack.c                                |  23 +-
 repack.h                                |  25 +-
 strvec.c                                |   7 +
 strvec.h                                |   5 +
 t/meson.build                           |   1 +
 t/t5334-incremental-multi-pack-index.sh |  47 ++
 t/t5335-compact-multi-pack-index.sh     | 105 ++++
 t/t7705-repack-incremental-midx.sh      | 461 ++++++++++++++++
 18 files changed, 1730 insertions(+), 151 deletions(-)
 create mode 100755 t/t7705-repack-incremental-midx.sh


base-commit: ac6221686db11ba802ced8ea6b8b2fa8388b21b2
-- 
2.53.0.185.g29bc4dff628

