public inbox for git@vger.kernel.org
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
	Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v3 00/17] midx: incremental MIDX/bitmap layer compaction
Date: Tue, 24 Feb 2026 13:59:30 -0500
Message-ID: <cover.1771959555.git.me@ttaylorr.com>
In-Reply-To: <cover.1765053054.git.me@ttaylorr.com>

[Note to the maintainer: this is based on 'master', which is 7c02d39fc2e
(The 6th batch, 2026-02-20) at the time of writing.]

This is a tiny reroll of my series to implement MIDX layer compaction,
adjusted in response to reviewer feedback.

As usual, a range-diff is included below for convenience. The main
changes since last time are as follows:

 - Some typo and documentation fixes around the new "midx.version"
   setting.

 - A bugfix when adding objects one fanout layer at a time from the
   MIDXs in the compaction range, where we would recurse past the lower
   end of the compaction range.

 - Drop u32_add() in favor of an explicit unsigned_add_overflows()
   check.

I think that this should be fairly close to done, at which point I can
send the next series (tb/incremental-midx-part-3.3) which introduces the
new compaction-based repacking strategy on top of the tools introduced
in this series.

As Peff wrote in his review of the last round, the proof here will be
in the pudding, and I am sure that as we continue to test and use these
patches we'll find and fix interesting bugs. But these changes are all
fairly well isolated to the compaction logic itself, so the risk of
regressing any existing use-cases should be low.

The original cover letter may be found here[1].

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1765053054.git.me@ttaylorr.com/

Taylor Blau (17):
  midx: mark `get_midx_checksum()` arguments as const
  midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  midx: introduce `midx_get_checksum_hex()`
  builtin/multi-pack-index.c: make '--progress' a common option
  git-multi-pack-index(1): remove non-existent incompatibility
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  midx-write.c: introduce `struct write_midx_opts`
  midx: do not require packs to be sorted in lexicographic order
  midx-write.c: introduce `midx_pack_perm()` helper
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx: implement MIDX compaction
  midx: enable reachability bitmaps during MIDX compaction

 Documentation/git-multi-pack-index.adoc |  27 +-
 Documentation/gitformat-pack.adoc       |   8 +-
 builtin/multi-pack-index.c              |  91 +++-
 midx-write.c                            | 525 ++++++++++++++++++------
 midx.c                                  |  39 +-
 midx.h                                  |  12 +-
 pack-bitmap.c                           |   9 +-
 pack-revindex.c                         |   4 +-
 t/helper/test-read-midx.c               |  21 +-
 t/meson.build                           |   1 +
 t/t0450/adoc-help-mismatches            |   1 -
 t/t5319-multi-pack-index.sh             |  18 +-
 t/t5335-compact-multi-pack-index.sh     | 293 +++++++++++++
 13 files changed, 891 insertions(+), 158 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh

Range-diff against v2:
 1:  2e549ea6443 =  1:  61045c604c0 midx: mark `get_midx_checksum()` arguments as const
 2:  7255adafe70 =  2:  61dd8e65d1a midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
 3:  25b628fda97 =  3:  cc5c7783062 midx: introduce `midx_get_checksum_hex()`
 4:  2aedd72db8c =  4:  016420264d2 builtin/multi-pack-index.c: make '--progress' a common option
 5:  a00598a36a3 =  5:  feb9ca5538e git-multi-pack-index(1): remove non-existent incompatibility
 6:  92e6d868a45 =  6:  1e86068046d git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
 7:  ff599c11f68 =  7:  06a96e16c37 t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
 8:  315a0ea2985 =  8:  84d9d1ef7ba midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
 9:  af174e22e1e =  9:  132e823e758 midx-write.c: introduce `struct write_midx_opts`
10:  72bcd4ed6c7 ! 10:  3682bfd0e08 midx: do not require packs to be sorted in lexicographic order
    @@ Commit message
         lazily instantiate a `pack_names_sorted` array on the MIDX, which will
         be used to implement the binary search over pack names.
     
    -    Because this change produces MIDXs which may not be correctly read with
    -    external tools or older versions of Git. Though older versions of Git
    -    know how to gracefully degrade and ignore any MIDX(s) they consider
    -    corrupt, external tools may not be as robust. To avoid unintentionally
    -    breaking any such tools, guard this change behind a version bump in the
    -    MIDX's on-disk format.
    +    This change produces MIDXs which may not be correctly read with external
    +    tools or older versions of Git. Though older versions of Git know how to
    +    gracefully degrade and ignore any MIDX(s) they consider corrupt,
    +    external tools may not be as robust. To avoid unintentionally breaking
    +    any such tools, guard this change behind a version bump in the MIDX's
    +    on-disk format.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ Documentation/gitformat-pack.adoc: HEADER:
      
      	1-byte version number:
     -	    Git only writes or recognizes version 1.
    -+	    Git only writes version 2, but recognizes versions 1 and 2.
    ++	    Git writes the version specified by the "midx.version"
    ++	    configuration option, which defaults to 2. It recognizes
    ++	    both versions 1 and 2.
      
      	1-byte Object Id Version
      	    We infer the length of object IDs (OIDs) from this value:
11:  c0c1769464b <  -:  ----------- git-compat-util.h: introduce `u32_add()`
12:  c11214a51f0 = 11:  7f99d3d728a midx-write.c: introduce `midx_pack_perm()` helper
13:  b9244a04297 = 12:  f3952f7db36 midx-write.c: extract `fill_pack_from_midx()`
14:  c6f8d323477 = 13:  238bd203eaa midx-write.c: enumerate `pack_int_id` values directly
15:  e71aa575463 = 14:  3a139575b15 midx-write.c: factor fanout layering from `compute_sorted_entries()`
16:  dbbcb494563 = 15:  505c8d72aa1 t/helper/test-read-midx.c: plug memory leak when selecting layer
17:  13336e864f4 ! 16:  19c983138c7 midx: implement MIDX compaction
    @@ Commit message
            `pack_int_id`'s (as opposed to the index at which each pack appears
            in `ctx.info`).
     
    +       Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
    +       here, as it unconditionally recurs through the `->base_midx`. Factor
    +       out a `_1()` variant that operates on a single layer, reimplement
    +       the existing function in terms of it, and use the new variant from
    +       `midx_fanout_add_compact()`.
    +
    +       Since we are sorting the list of objects ourselves, the order we add
    +       them in does not matter.
    +
          - When writing out the new 'multi-pack-index-chain' file, discard any
            layers in the compaction range, replacing them with the newly written
            layer, instead of keeping them and placing the new layer at the end
    @@ midx-write.c: struct write_midx_context {
      	return ctx->pack_perm[orig_pack_int_id];
      }
      
    +@@ midx-write.c: static void midx_fanout_sort(struct midx_fanout *fanout)
    + 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
    + }
    + 
    +-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    +-					struct multi_pack_index *m,
    +-					uint32_t cur_fanout,
    +-					uint32_t preferred_pack)
    ++static void midx_fanout_add_midx_fanout_1(struct midx_fanout *fanout,
    ++					  struct multi_pack_index *m,
    ++					  uint32_t cur_fanout,
    ++					  uint32_t preferred_pack)
    + {
    + 	uint32_t start = m->num_objects_in_base, end;
    + 	uint32_t cur_object;
    + 
    +-	if (m->base_midx)
    +-		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    +-					    preferred_pack);
    +-
    + 	if (cur_fanout)
    + 		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
    + 	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
    +@@ midx-write.c: static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    + 	}
    + }
    + 
    ++static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    ++					struct multi_pack_index *m,
    ++					uint32_t cur_fanout,
    ++					uint32_t preferred_pack)
    ++{
    ++	if (m->base_midx)
    ++		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    ++					    preferred_pack);
    ++	midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout, preferred_pack);
    ++}
    ++
    + static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
    + 					struct pack_info *info,
    + 					uint32_t cur_pack,
     @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
      					    cur_fanout);
      }
    @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
     +	ASSERT(ctx->compact);
     +
     +	while (m && m != ctx->compact_from->base_midx) {
    -+		midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
    -+					    NO_PREFERRED_PACK);
    ++		midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout,
    ++					      NO_PREFERRED_PACK);
     +		m = m->base_midx;
     +	}
     +}
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx)
     +
     +	ASSERT(from && to);
     +
    -+	nr = u32_add(to->num_packs, to->num_packs_in_base);
    ++	if (unsigned_add_overflows(to->num_packs, to->num_packs_in_base))
    ++		die(_("too many packs, unable to compact"));
    ++
    ++	nr = to->num_packs + to->num_packs_in_base;
     +	if (nr < from->num_packs_in_base)
     +		BUG("unexpected number of packs in base during compaction: "
     +		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
18:  b599f1ad4b0 = 17:  ac6221686db midx: enable reachability bitmaps during MIDX compaction

base-commit: 7c02d39fc2ed2702223c7674f73150d9a7e61ba4
-- 
2.53.0.171.gde83996e422


Thread overview: 99+ messages
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  1:41     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
2025-12-08 18:25   ` Patrick Steinhardt
2025-12-09  1:42     ` Taylor Blau
2025-12-09  1:50       ` Taylor Blau
2025-12-09  6:27         ` Patrick Steinhardt
2026-01-13 22:46           ` Taylor Blau
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  1:59     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  2:04     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  2:07     ` Taylor Blau
2025-12-09  2:11       ` Taylor Blau
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:13     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:14     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:16     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-09  7:21   ` Patrick Steinhardt
2026-01-13 23:32     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
2025-12-09  7:21   ` Patrick Steinhardt
2026-01-13 23:47     ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2026-01-14 19:54   ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-01-14 19:54   ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-01-14 19:54   ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-01-14 19:54   ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-01-14 19:54   ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-01-14 19:54   ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-01-14 21:13     ` Junio C Hamano
2026-01-14 21:40       ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-01-14 21:28     ` Junio C Hamano
2026-01-14 21:44       ` Taylor Blau
2026-01-27  7:34     ` Patrick Steinhardt
2026-02-24 18:47       ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
2026-01-14 21:49     ` Junio C Hamano
2026-01-14 22:03       ` Taylor Blau
2026-01-15  0:11         ` Taylor Blau
2026-01-21  8:51           ` Patrick Steinhardt
2026-01-21 23:55             ` Taylor Blau
2026-01-22  2:26               ` rsbecker
2026-01-22 17:07                 ` Junio C Hamano
2026-02-23 13:49               ` Jeff King
2026-02-24 18:53                 ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-01-14 19:54   ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-01-14 19:55   ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-01-14 19:55   ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-01-14 19:55   ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
2026-01-27  7:35     ` Patrick Steinhardt
2026-01-27 22:13       ` Taylor Blau
2026-01-14 19:55   ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
2026-02-20 22:24   ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
2026-02-23 14:08     ` Jeff King
2026-02-24  5:25       ` Taylor Blau
2026-02-24 18:59 ` Taylor Blau [this message]
2026-02-24 18:59   ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-02-24 18:59   ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-02-24 18:59   ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-02-24 18:59   ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-02-24 18:59   ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-02-24 18:59   ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-02-24 19:00   ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-02-24 19:00   ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-02-24 19:00   ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-02-24 19:00   ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-02-24 19:00   ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-02-24 19:00   ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
2026-02-24 19:00   ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau
