public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction
@ 2025-12-06 20:30 Taylor Blau
  2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
                   ` (18 more replies)
  0 siblings, 19 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:30 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Note to the maintainer:

 * This series is the spiritual successor to my earlier series queued
   as 'tb/incremental-midx-part-3.1', but is based on 'master'
   (currently f0ef5b6d9bc (The fifth batch, 2025-11-30) at the time of
   writing). I suggest queueing as 'tb/incremental-midx-part-3.2'.

OVERVIEW
========

This series implements the second of three major components for an
incremental MIDX/bitmap-based repacking strategy. Those components
are:

 1. Refactoring code out of builtin/repack.c into individual
    compilation units outside of the builtin directory.

 2. (this series) Implementing MIDX layer compaction, i.e. the ability
    to take an contiguous sequence of MIDX layers in an existing MIDX
    chain and replace them with a single layer representing the union
    of objects/packs among the compacted layers.

 3. An incremental repacking strategy that, unlike our current
    '--geometric' approach, does not require periodic all-into-one
    repacks.

BACKGROUND
==========

I'll briefly discuss that repacking strategy here in order to provide
some context on why MIDX layer compaction is important. When doing a
--geometric repack today, any update to the MIDX and/or its .bitmap
requires a full rewrite. For most repositories this is not such a big
deal, but in large monorepos these costs can add up significantly over
time.

To deal with this, incremental MIDXs (and corresponding support for
reachability bitmaps) were introduced as a way to append updates to
the end of a MIDX chain without having to rewrite anything.

Of course, growing a MIDX chain infinitely without ever shrinking it
would be pointless. So any repacking strategy which leverages
incremental MIDXs must take some action which encourages the MIDX
chain to avoid growing too large.

The strategy that I will share in the forthcoming topic related to
this effort does (roughly) the following:

 1. Repack all non-MIDX'd packs geometrically, optionally including
    any packs in the most recent MIDX layer iff it has more than some
    threshold of packs.

 2. Write out a new MIDX layer which is the result of the previous
    step, adding it onto the end of the chain (or replacing the
    previous end, depending on whether the pack threshold condition
    was met).

 3. Perform a compaction pass, combining adjacent MIDX layers to
    restore a geometric progression on object count among MIDX layers
    in the chain.

(As an aside, this description is not the most efficient way to
implement this behavior, and so the actual implementation varies
slightly, though achieves the same effect.)

This strategy encourages MIDX chains where older layers have fewer,
larger packs, and newer layers have a greater number of smaller packs.
The compaction pass prevents the MIDX chain from ever growing too
large. (In my simulation[1], even a large number of simulated pushes
yields a MIDX chain which is consistently fewer than ~10 or so layers
long.)

COMPACTION
==========

MIDX compaction works by producing a new MIDX layer which is the
logical union of any contiguous sequence within the existing MIDX
chain. Importantly, the packs/objects from those layers within the
compaction range have their order preserved. That means that the
pseudo-pack order of the compacted layer is guaranteed to be the same
as the concatenated pseudo-pack order across all layers in the
compaction range.

As a result, we can reuse any selected bitmaps within the compaction
range verbatim. More importantly, any bitmaps belonging to layers
which are descendants of the compacted one do not need to permute any
existing bitmaps in order for them to be valid post-compaction.

This allows us to cheaply combine parts of the MIDX chain without
needing to do any work outside of the combine layers, which is
essential in the above-described repacking strategy.

THIS SERIES
===========

This series implements MIDX compaction, and is organized roughly as
follows:

 * The first two patches are stray code clean-up and minor refactoring
   changes that I made while writing this series.

 * The next three patches clean up the SYNOPSIS for in the manual page
   for git-multi-pack-index(1), and removes the builtin from the list
   of known failures in t0450.

 * The next patch is a typofix, and the one following that is a
   preemptive bugfix.

 * The eighth patch is a code refactoring patch that makes the
   argument list for midx-write.c::write_midx_internal() easier to
   reason about.

 * The ninth patch relaxes a MIDX constraint that forces all packs
   within a layer to be sorted lexicographically.

 * The remaining patches prepare for and introduce MIDX compaction.

The test coverage in the new t5335-compact-multi-pack-index.sh is
fairly sparse in my opinion, though most of the interesting behavior
is already exercised via t5319 and t5334. If others have ideas on how
to improve test coverage here, please let me know!

This series is somewhat lengthy, though the individual changes are
relatively small for the most part, so I am hoping that they aren't
too bad to review.

Thanks in advance for your review!

[1]: https://gist.github.com/ttaylorr/7ac0305f130197744f3583096c8198a1

Taylor Blau (17):
  midx: mark `get_midx_checksum()` arguments as const
  midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  builtin/multi-pack-index.c: make '--progress' a common option
  git-multi-pack-index(1): remove non-existent incompatibility
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  midx-write.c: introduce `struct write_midx_opts`
  midx: do not require packs to be sorted in lexicographic order
  git-compat-util.h: introduce `u32_add()`
  midx-write.c: introduce `midx_pack_perm()` helper
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx: implement MIDX compaction
  midx: enable reachability bitmaps during MIDX compaction

 Documentation/git-multi-pack-index.adoc |  24 +-
 builtin/multi-pack-index.c              |  84 ++++-
 git-compat-util.h                       |   8 +
 midx-write.c                            | 463 ++++++++++++++++++------
 midx.c                                  |  36 +-
 midx.h                                  |   9 +-
 pack-bitmap.c                           |   9 +-
 pack-revindex.c                         |   4 +-
 t/helper/test-read-midx.c               |  21 +-
 t/meson.build                           |   1 +
 t/t0450/adoc-help-mismatches            |   1 -
 t/t5319-multi-pack-index.sh             |   7 +-
 t/t5335-compact-multi-pack-index.sh     | 218 +++++++++++
 13 files changed, 737 insertions(+), 148 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh


base-commit: f0ef5b6d9bcc258e4cbef93839d1b7465d5212b9
-- 
2.52.0.171.gd6a4e6b6955

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

To make clear that the fucntion `get_midx_checksum()` does not do
anything to modify its argument, mark the MIDX pointer as const.

The following commit will rename this function altogether to make clear
that it returns the raw bytes of the checksum, not a hex-encoded copy of
it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 +-
 midx.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 24e1e721754..6c01f0fa522 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,7 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
-const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
 }
diff --git a/midx.h b/midx.h
index 6e54d73503d..7c7e0b59121 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
-const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
 			   const unsigned char *hash, const char *ext);
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
  2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:25   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

When trying to print out, say, the hexadecimal representation of a
MIDX's hash, our code will do something like:

    hash_to_hex_algop(get_midx_checksum(m),
                      m->source->odb->repo->hash_algo);

, which is both cumbersome and repetitive. In fact, all but a handful of
callers to `get_midx_checksum()` do exactly the above. Reduce the
repetitive nature of calling `get_midx_checksum()` by having it return a
pointer into a static buffer containing the above result.

For the handful of callers that do need to compare the raw bytes and
don't want to deal with an encoded copy (e.g., because they are passing
it to hasheq() or similar), introduce `get_midx_hash()` which returns
the raw bytes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c              | 8 +++-----
 midx.c                    | 8 +++++++-
 midx.h                    | 3 ++-
 pack-bitmap.c             | 9 ++++-----
 pack-revindex.c           | 4 ++--
 t/helper/test-read-midx.c | 4 ++--
 6 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 23e61cb0001..73d24fabbc6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -955,7 +955,7 @@ static int link_midx_to_chain(struct multi_pack_index *m)
 	}
 
 	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
-		const unsigned char *hash = get_midx_checksum(m);
+		const unsigned char *hash = get_midx_hash(m);
 
 		get_midx_filename_ext(m->source, &from,
 				      hash, midx_exts[i].non_split);
@@ -1086,8 +1086,7 @@ static int write_midx_internal(struct odb_source *source,
 		while (m) {
 			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
-				      hash_to_hex_algop(get_midx_checksum(m),
-							m->source->odb->repo->hash_algo));
+				      get_midx_checksum(m));
 				goto cleanup;
 			}
 			ctx.num_multi_pack_indexes_before++;
@@ -1445,8 +1444,7 @@ static int write_midx_internal(struct odb_source *source,
 		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
 			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
 
-			keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
-								   r->hash_algo));
+			keep_hashes[j] = xstrdup(get_midx_checksum(m));
 			m = m->base_midx;
 		}
 
diff --git a/midx.c b/midx.c
index 6c01f0fa522..f9b11de9ca9 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,13 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
+const char *get_midx_checksum(const struct multi_pack_index *m)
+{
+	return hash_to_hex_algop(get_midx_hash(m),
+				 m->source->odb->repo->hash_algo);
+}
+
+const unsigned char *get_midx_hash(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
 }
diff --git a/midx.h b/midx.h
index 7c7e0b59121..e188ffeb578 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,8 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
+const char *get_midx_checksum(const struct multi_pack_index *m) /* static buffer */;
+const unsigned char *get_midx_hash(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
 			   const unsigned char *hash, const char *ext);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1d..f466ed2ddcb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -441,11 +441,11 @@ char *midx_bitmap_filename(struct multi_pack_index *midx)
 	struct strbuf buf = STRBUF_INIT;
 	if (midx->has_chain)
 		get_split_midx_filename_ext(midx->source, &buf,
-					    get_midx_checksum(midx),
+					    get_midx_hash(midx),
 					    MIDX_EXT_BITMAP);
 	else
 		get_midx_filename_ext(midx->source, &buf,
-				      get_midx_checksum(midx),
+				      get_midx_hash(midx),
 				      MIDX_EXT_BITMAP);
 
 	return strbuf_detach(&buf, NULL);
@@ -502,7 +502,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	if (load_bitmap_header(bitmap_git) < 0)
 		goto cleanup;
 
-	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum,
+	if (!hasheq(get_midx_hash(bitmap_git->midx), bitmap_git->checksum,
 		    bitmap_repo(bitmap_git)->hash_algo)) {
 		error(_("checksum doesn't match in MIDX and bitmap"));
 		goto cleanup;
@@ -2820,8 +2820,7 @@ void test_bitmap_walk(struct rev_info *revs)
 
 		if (bitmap_is_midx(found))
 			fprintf_ln(stderr, "Located via MIDX '%s'.",
-				   hash_to_hex_algop(get_midx_checksum(found->midx),
-						     revs->repo->hash_algo));
+				   get_midx_checksum(found->midx));
 		else
 			fprintf_ln(stderr, "Located via pack '%s'.",
 				   hash_to_hex_algop(found->pack->hash,
diff --git a/pack-revindex.c b/pack-revindex.c
index d0791cc4938..016195ceb93 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -390,11 +390,11 @@ int load_midx_revindex(struct multi_pack_index *m)
 
 	if (m->has_chain)
 		get_split_midx_filename_ext(m->source, &revindex_name,
-					    get_midx_checksum(m),
+					    get_midx_hash(m),
 					    MIDX_EXT_REV);
 	else
 		get_midx_filename_ext(m->source, &revindex_name,
-				      get_midx_checksum(m),
+				      get_midx_hash(m),
 				      MIDX_EXT_REV);
 
 	ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 6de5d1665af..dee603b3cd0 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -34,7 +34,7 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		return 1;
 
 	if (checksum) {
-		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+		while (m && strcmp(get_midx_checksum(m), checksum))
 			m = m->base_midx;
 		if (!m)
 			return 1;
@@ -94,7 +94,7 @@ static int read_midx_checksum(const char *object_dir)
 	m = setup_midx(object_dir);
 	if (!m)
 		return 1;
-	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	printf("%s\n", get_midx_checksum(m));
 
 	close_midx(m);
 	return 0;
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
  2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
  2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

All multi-pack-index sub-commands (write, verify, repack, and expire)
support a '--progress' command-line option, despite not listing it as
one of the common options in `common_opts`.

As a result each sub-command declares its own `OPT_BIT()` for a
"--progress" command-line option. Centralize this within the
`common_opts` to avoid re-declaring it in each sub-command.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |  2 ++
 builtin/multi-pack-index.c              | 10 ++--------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 2f642697e9e..a4550e28bed 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -18,6 +18,8 @@ Write or verify a multi-pack-index (MIDX) file.
 OPTIONS
 -------
 
+The following command-line options are applicable to all sub-commands:
+
 --object-dir=<dir>::
 	Use given directory for the location of Git objects. We check
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5f364aa816b..ca98d4c3ba3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -84,6 +84,8 @@ static struct option common_opts[] = {
 	  N_("directory"),
 	  N_("object directory containing set of packfile and pack-index pairs"),
 	  parse_object_dir),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"),
+		MIDX_PROGRESS),
 	OPT_END(),
 };
 
@@ -138,8 +140,6 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			   N_("pack for reuse when computing a multi-pack bitmap")),
 		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_BIT(0, "incremental", &opts.flags,
 			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
@@ -200,8 +200,6 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv,
 {
 	struct option *options;
 	static struct option builtin_multi_pack_index_verify_options[] = {
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
@@ -231,8 +229,6 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv,
 {
 	struct option *options;
 	static struct option builtin_multi_pack_index_expire_options[] = {
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
@@ -264,8 +260,6 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_UNSIGNED(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
-		OPT_BIT(0, "progress", &opts.flags,
-		  N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (2 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Since fcb2205b774 (midx: implement support for writing incremental MIDX
chains, 2024-08-06), the command-line options '--incremental' and
'--bitmap' were declared to be incompatible with one another when
running 'git multi-pack-index write'.

However, since 27afc272c49 (midx: implement writing incremental MIDX
bitmaps, 2025-03-20), that incompatibility no longer exists, despite the
documentation saying so. Correct this by removing the stale reference to
their incompatibility.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a4550e28bed..a502819fc38 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -75,7 +75,7 @@ marker).
 		Write an incremental MIDX file containing only objects
 		and packs not present in an existing MIDX layer.
 		Migrates non-incremental MIDXs to incremental ones when
-		necessary. Incompatible with `--bitmap`.
+		necessary.
 --
 
 verify::
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (3 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Since c39fffc1c90 (tests: start asserting that *.txt SYNOPSIS matches -h
output, 2022-10-13), the manual page for 'git multi-pack-index' has a
SYNOPSIS section which differs from 'git multi-pack-index -h'.

Correct this while also documenting additional options accepted by the
'write' sub-command.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc | 7 ++++++-
 builtin/multi-pack-index.c              | 5 +++--
 t/t0450/adoc-help-mismatches            | 1 -
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a502819fc38..164cf1f2291 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -9,7 +9,12 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]bitmap] <sub-command>
+'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
+		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
+		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] verify
+'git multi-pack-index' [<options>] expire
+'git multi-pack-index' [<options>] repack [--batch-size=<size>]
 
 DESCRIPTION
 -----------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index ca98d4c3ba3..c0c6c1760c0 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,8 +13,9 @@
 #include "repository.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \
-	   "[--refs-snapshot=<path>]")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]\n" \
+	   "  [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
+	   "  [--refs-snapshot=<path>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
diff --git a/t/t0450/adoc-help-mismatches b/t/t0450/adoc-help-mismatches
index 8ee2d3f7c81..e8d6c13ccd0 100644
--- a/t/t0450/adoc-help-mismatches
+++ b/t/t0450/adoc-help-mismatches
@@ -33,7 +33,6 @@ merge
 merge-file
 merge-index
 merge-one-file
-multi-pack-index
 name-rev
 notes
 push
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (4 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Commit d4bf1d88b90 (multi-pack-index: verify missing pack, 2018-09-13)
adds a new test to the MIDX test script to test how we handle missing
packs.

While the commit itself describes the test as "verify missing pack[s]",
the test itself is actually called "verify packnames out of order",
despite that not being what it tests.

Likely this was a copy-and-paste of the test immediately above it of the
same name. Correct this by renaming the test to match the commit
message.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 93f319a4b29..ca020091dda 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -455,7 +455,7 @@ test_expect_success 'verify packnames out of order' '
 		"pack names out of order"
 '
 
-test_expect_success 'verify packnames out of order' '
+test_expect_success 'verify missing pack' '
 	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
 		"failed to load pack"
 '
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (5 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

In midx_pack_order(), we compute for each bitampped pack the first bit
to correspond to an object in that pack, along with how many bits were
assigned to object(s) in that pack.

Initially, each bitmap_nr value is set to zero, and each bitmap_pos
value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
that there are no packs who have an unknown bit position but a somehow
non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
midx-write.c).

Once the pack order is fully determined, midx_pack_order() sets the
bitmap_pos field for any bitmapped packs to zero if they are still
listed as BITMAP_POS_UNKNOWN.

However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
This is fine for existing cases, since the only time the
`ctx->pack_perm` array holds a value outside of the addressable range of
`ctx->info` is when there are expired packs, which only occurs via 'git
multi-pack-index expire', which does not support writing MIDX bitmaps.
As a result, the range of ctx->pack_perm covers all values in [0,
`ctx->nr`), so enumerating in this order isn't an issue.

A future change necessary for compaction will complicate this further by
introducing a wrapper around the `ctx->pack_perm` array, which turns the
given `pack_int_id` into one that is relative to the lower end of the
compaction range. As a result, indexing into `ctx->pack_perm` through
this helper, say, with "0" will produce a crash when the lower end of
the compaction range has >0 pack(s) in its base layer, since the
subtraction will wrap around the 32-bit unsigned range, resulting in an
uninitialized read.

But the process is completely unnecessary in the first place: we are
enumerating all values of `ctx->info`, and there is no reason to process
them in a different order than they appear in memory. Index `ctx->info`
directly to reflect that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx-write.c b/midx-write.c
index 73d24fabbc6..c30f6a70d37 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 		pack_order[i] = data[i].nr;
 	}
 	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+		struct pack_info *pack = &ctx->info[i];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
 			pack->bitmap_pos = 0;
 	}
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts`
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (6 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

In the MIDX writing code, there are four functions which perform some
sort of MIDX write operation. They are:

 - write_midx_file()
 - write_midx_file_only()
 - expire_midx_packs()
 - midx_repack()

All of these functions are thin wrappers over `write_midx_internal()`,
which implements the bulk of these routines. As a result, the
`write_midx_internal()` function takes six arguments.

Future commits in this series will want to add additional arguments, and
in general this function's signature will be the union of parameters
among *all* possible ways to write a MIDX.

Instead of adding yet more arguments to this function to support MIDX
compaction, introduce a `struct write_midx_opts`, which has the same
struct members as `write_midx_internal()`'s arguments.

Adding additional fields to the `write_midx_opts` struct is preferable
to adding additional arguments to `write_midx_internal()`. This is
because the callers below all zero-initialize the struct, so each time
we add a new piece of information, we do not have to pass the zero value
for it in all other call-sites that do not care about it.

For now, no functional changes are included in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 129 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 77 insertions(+), 52 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index c30f6a70d37..b262631ae45 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1014,14 +1014,20 @@ static void clear_midx_files(struct odb_source *source,
 	strbuf_release(&buf);
 }
 
-static int write_midx_internal(struct odb_source *source,
-			       struct string_list *packs_to_include,
-			       struct string_list *packs_to_drop,
-			       const char *preferred_pack_name,
-			       const char *refs_snapshot,
-			       unsigned flags)
+struct write_midx_opts {
+	struct odb_source *source;
+
+	struct string_list *packs_to_include;
+	struct string_list *packs_to_drop;
+
+	const char *preferred_pack_name;
+	const char *refs_snapshot;
+	unsigned flags;
+};
+
+static int write_midx_internal(struct write_midx_opts *opts)
 {
-	struct repository *r = source->odb->repo;
+	struct repository *r = opts->source->odb->repo;
 	struct strbuf midx_name = STRBUF_INIT;
 	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t start_pack;
@@ -1041,22 +1047,22 @@ static int write_midx_internal(struct odb_source *source,
 	trace2_region_enter("midx", "write_midx_internal", r);
 
 	ctx.repo = r;
-	ctx.source = source;
+	ctx.source = opts->source;
 
-	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
 			    "%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
-			    source->path);
+			    opts->source->path);
 	else
-		get_midx_filename(source, &midx_name);
+		get_midx_filename(opts->source, &midx_name);
 	if (safe_create_leading_directories(r, midx_name.buf))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name.buf);
 
-	if (!packs_to_include || ctx.incremental) {
-		struct multi_pack_index *m = get_multi_pack_index(source);
+	if (!opts->packs_to_include || ctx.incremental) {
+		struct multi_pack_index *m = get_multi_pack_index(opts->source);
 		if (m && !midx_checksum_valid(m)) {
 			warning(_("ignoring existing multi-pack-index; checksum mismatch"));
 			m = NULL;
@@ -1071,7 +1077,7 @@ static int write_midx_internal(struct odb_source *source,
 			 */
 			if (ctx.incremental)
 				ctx.base_midx = m;
-			else if (!packs_to_include)
+			else if (!opts->packs_to_include)
 				ctx.m = m;
 		}
 	}
@@ -1084,7 +1090,7 @@ static int write_midx_internal(struct odb_source *source,
 	if (ctx.incremental) {
 		struct multi_pack_index *m = ctx.base_midx;
 		while (m) {
-			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+			if (opts->flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
 				      get_midx_checksum(m));
 				goto cleanup;
@@ -1099,23 +1105,23 @@ static int write_midx_internal(struct odb_source *source,
 	start_pack = ctx.nr;
 
 	ctx.pack_paths_checked = 0;
-	if (flags & MIDX_PROGRESS)
+	if (opts->flags & MIDX_PROGRESS)
 		ctx.progress = start_delayed_progress(r,
 						      _("Adding packfiles to multi-pack-index"), 0);
 	else
 		ctx.progress = NULL;
 
-	ctx.to_include = packs_to_include;
+	ctx.to_include = opts->packs_to_include;
 
-	for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
+	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
 	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
 	    !ctx.incremental &&
-	    !(packs_to_include || packs_to_drop)) {
+	    !(opts->packs_to_include || opts->packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
 		int bitmap_exists;
-		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+		int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
 
 		bitmap_git = prepare_midx_bitmap_git(ctx.m);
 		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
@@ -1127,7 +1133,8 @@ static int write_midx_internal(struct odb_source *source,
 			 * corresponding bitmap (or one wasn't requested).
 			 */
 			if (!want_bitmap)
-				clear_midx_files_ext(source, "bitmap", NULL);
+				clear_midx_files_ext(opts->source, "bitmap",
+						     NULL);
 			result = 0;
 			goto cleanup;
 		}
@@ -1138,11 +1145,11 @@ static int write_midx_internal(struct odb_source *source,
 		goto cleanup; /* nothing to do */
 	}
 
-	if (preferred_pack_name) {
+	if (opts->preferred_pack_name) {
 		ctx.preferred_pack_idx = NO_PREFERRED_PACK;
 
 		for (size_t i = 0; i < ctx.nr; i++) {
-			if (!cmp_idx_or_pack_name(preferred_pack_name,
+			if (!cmp_idx_or_pack_name(opts->preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
 				break;
@@ -1151,9 +1158,9 @@ static int write_midx_internal(struct odb_source *source,
 
 		if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
 			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
+				opts->preferred_pack_name);
 	} else if (ctx.nr &&
-		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
+		   (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		struct packed_git *oldest = ctx.info[0].p;
 		ctx.preferred_pack_idx = 0;
 
@@ -1164,7 +1171,7 @@ static int write_midx_internal(struct odb_source *source,
 		 */
 		open_pack_index(oldest);
 
-		if (packs_to_drop && packs_to_drop->nr)
+		if (opts->packs_to_drop && opts->packs_to_drop->nr)
 			BUG("cannot write a MIDX bitmap during expiration");
 
 		/*
@@ -1226,20 +1233,21 @@ static int write_midx_internal(struct odb_source *source,
 
 	QSORT(ctx.info, ctx.nr, pack_info_compare);
 
-	if (packs_to_drop && packs_to_drop->nr) {
+	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
 		size_t drop_index = 0;
 		int missing_drops = 0;
 
-		for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+		for (size_t i = 0;
+		     i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
 			int cmp = strcmp(ctx.info[i].pack_name,
-					 packs_to_drop->items[drop_index].string);
+					 opts->packs_to_drop->items[drop_index].string);
 
 			if (!cmp) {
 				drop_index++;
 				ctx.info[i].expired = 1;
 			} else if (cmp > 0) {
 				error(_("did not see pack-file %s to drop"),
-				      packs_to_drop->items[drop_index].string);
+				      opts->packs_to_drop->items[drop_index].string);
 				drop_index++;
 				missing_drops++;
 				i--;
@@ -1276,8 +1284,8 @@ static int write_midx_internal(struct odb_source *source,
 	}
 
 	/* Check that the preferred pack wasn't expired (if given). */
-	if (preferred_pack_name) {
-		struct pack_info *preferred = bsearch(preferred_pack_name,
+	if (opts->preferred_pack_name) {
+		struct pack_info *preferred = bsearch(opts->preferred_pack_name,
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
@@ -1285,7 +1293,7 @@ static int write_midx_internal(struct odb_source *source,
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-					preferred_pack_name);
+					opts->preferred_pack_name);
 		}
 	}
 
@@ -1299,15 +1307,15 @@ static int write_midx_internal(struct odb_source *source,
 	}
 
 	if (!ctx.entries_nr) {
-		if (flags & MIDX_WRITE_BITMAP)
+		if (opts->flags & MIDX_WRITE_BITMAP)
 			warning(_("refusing to write multi-pack .bitmap without any objects"));
-		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
+		opts->flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
 	}
 
 	if (ctx.incremental) {
 		struct strbuf lock_name = STRBUF_INIT;
 
-		get_midx_chain_filename(source, &lock_name);
+		get_midx_chain_filename(opts->source, &lock_name);
 		hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
 		strbuf_release(&lock_name);
 
@@ -1350,7 +1358,7 @@ static int write_midx_internal(struct odb_source *source,
 				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
 			write_midx_large_offsets);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+	if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
 		ctx.pack_order = midx_pack_order(&ctx);
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
@@ -1368,11 +1376,11 @@ static int write_midx_internal(struct odb_source *source,
 			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX &&
+	if (opts->flags & MIDX_WRITE_REV_INDEX &&
 	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(&ctx, midx_hash);
 
-	if (flags & MIDX_WRITE_BITMAP) {
+	if (opts->flags & MIDX_WRITE_BITMAP) {
 		struct packing_data pdata;
 		struct commit **commits;
 		uint32_t commits_nr;
@@ -1382,7 +1390,7 @@ static int write_midx_internal(struct odb_source *source,
 
 		prepare_midx_packing_data(&pdata, &ctx);
 
-		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
+		commits = find_commits_for_midx_bitmap(&commits_nr, opts->refs_snapshot, &ctx);
 
 		/*
 		 * The previous steps translated the information from
@@ -1395,7 +1403,7 @@ static int write_midx_internal(struct odb_source *source,
 
 		if (write_midx_bitmap(&ctx,
 				      midx_hash, &pdata, commits, commits_nr,
-				      flags) < 0) {
+				      opts->flags) < 0) {
 			error(_("could not write multi-pack bitmap"));
 			clear_packing_data(&pdata);
 			free(commits);
@@ -1428,7 +1436,7 @@ static int write_midx_internal(struct odb_source *source,
 		if (link_midx_to_chain(ctx.base_midx) < 0)
 			goto cleanup;
 
-		get_split_midx_filename_ext(source, &final_midx_name,
+		get_split_midx_filename_ext(opts->source, &final_midx_name,
 					    midx_hash, MIDX_EXT_MIDX);
 
 		if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
@@ -1461,7 +1469,7 @@ static int write_midx_internal(struct odb_source *source,
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files(source, keep_hashes,
+	clear_midx_files(opts->source, keep_hashes,
 			 ctx.num_multi_pack_indexes_before + 1,
 			 ctx.incremental);
 	result = 0;
@@ -1495,9 +1503,14 @@ int write_midx_file(struct odb_source *source,
 		    const char *preferred_pack_name,
 		    const char *refs_snapshot, unsigned flags)
 {
-	return write_midx_internal(source, NULL, NULL,
-				   preferred_pack_name, refs_snapshot,
-				   flags);
+	struct write_midx_opts opts = {
+		.source = source,
+		.preferred_pack_name = preferred_pack_name,
+		.refs_snapshot = refs_snapshot,
+		.flags = flags,
+	};
+
+	return write_midx_internal(&opts);
 }
 
 int write_midx_file_only(struct odb_source *source,
@@ -1505,8 +1518,15 @@ int write_midx_file_only(struct odb_source *source,
 			 const char *preferred_pack_name,
 			 const char *refs_snapshot, unsigned flags)
 {
-	return write_midx_internal(source, packs_to_include, NULL,
-				   preferred_pack_name, refs_snapshot, flags);
+	struct write_midx_opts opts = {
+		.source = source,
+		.packs_to_include = packs_to_include,
+		.preferred_pack_name = preferred_pack_name,
+		.refs_snapshot = refs_snapshot,
+		.flags = flags,
+	};
+
+	return write_midx_internal(&opts);
 }
 
 int expire_midx_packs(struct odb_source *source, unsigned flags)
@@ -1566,8 +1586,11 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(source, NULL,
-					     &packs_to_drop, NULL, NULL, flags);
+		result = write_midx_internal(&(struct write_midx_opts) {
+					     .source = source,
+					     .packs_to_drop = &packs_to_drop,
+					     .flags = flags & MIDX_PROGRESS,
+					     });
 
 	string_list_clear(&packs_to_drop, 0);
 
@@ -1774,8 +1797,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
 		goto cleanup;
 	}
 
-	result = write_midx_internal(source, NULL, NULL, NULL, NULL,
-				     flags);
+	result = write_midx_internal(&(struct write_midx_opts) {
+				     .source = source,
+				     .flags = flags,
+				     });
 
 cleanup:
 	free(include_pack);
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (7 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").

As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).

To see why, consider the following example:

          | packs       | preferred pack
  --------+-------------+---------------
  MIDX #0 | { X, Y, Z } | Y
  MIDX #1 | { A, B, C } | B
  MIDX #2 | { D, E, F } | D

, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.

But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.

Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.

One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.

However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.

Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.

Note that this produces MIDXs which may be incompatible with earlier
versions of Git that have stricter requirements on the layout of packs
within a MIDX. This patch does *not* modify the version number of the
MIDX format, since existing versions of Git already know to gracefully
ignore a MIDX with packs that appear out-of-order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c                |  5 -----
 midx.c                      | 28 ++++++++++++++++++++++------
 midx.h                      |  1 +
 t/t5319-multi-pack-index.sh |  5 -----
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index b262631ae45..55342fcb6dd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -410,11 +410,6 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
 		if (ctx->info[i].expired)
 			continue;
 
-		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
-			BUG("incorrect pack-file order: %s before %s",
-			    ctx->info[i - 1].pack_name,
-			    ctx->info[i].pack_name);
-
 		writelen = strlen(ctx->info[i].pack_name) + 1;
 		hashwrite(f, ctx->info[i].pack_name, writelen);
 		written += writelen;
diff --git a/midx.c b/midx.c
index f9b11de9ca9..4d5fe880649 100644
--- a/midx.c
+++ b/midx.c
@@ -209,11 +209,6 @@ static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
 		if (!end)
 			die(_("multi-pack-index pack-name chunk is too short"));
 		cur_pack_name = end + 1;
-
-		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
-			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
-			      m->pack_names[i - 1],
-			      m->pack_names[i]);
 	}
 
 	trace2_data_intmax("midx", r, "load/num_packs", m->num_packs);
@@ -411,6 +406,7 @@ void close_midx(struct multi_pack_index *m)
 	}
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
+	FREE_AND_NULL(m->pack_names_sorted);
 	free(m);
 }
 
@@ -656,17 +652,37 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
+
+static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
+{
+	struct multi_pack_index *m = m_;
+	return strcmp(m->pack_names[*(const size_t *)a],
+		      m->pack_names[*(const size_t *)b]);
+}
+
 static int midx_contains_pack_1(struct multi_pack_index *m,
 				const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
+	if (!m->pack_names_sorted) {
+		uint32_t i;
+
+		ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
+
+		for (i = 0; i < m->num_packs; i++)
+			m->pack_names_sorted[i] = i;
+
+		QSORT_S(m->pack_names_sorted, m->num_packs, midx_pack_names_cmp,
+			m);
+	}
+
 	while (first < last) {
 		uint32_t mid = first + (last - first) / 2;
 		const char *current;
 		int cmp;
 
-		current = m->pack_names[mid];
+		current = m->pack_names[m->pack_names_sorted[mid]];
 		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
 		if (!cmp)
 			return 1;
diff --git a/midx.h b/midx.h
index e188ffeb578..39bf04b18e5 100644
--- a/midx.h
+++ b/midx.h
@@ -71,6 +71,7 @@ struct multi_pack_index {
 	uint32_t num_packs_in_base;
 
 	const char **pack_names;
+	size_t *pack_names_sorted;
 	struct packed_git **packs;
 };
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ca020091dda..03676d37b98 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -450,11 +450,6 @@ test_expect_success 'verify invalid chunk offset' '
 		"improper chunk offset(s)"
 '
 
-test_expect_success 'verify packnames out of order' '
-	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
-		"pack names out of order"
-'
-
 test_expect_success 'verify missing pack' '
 	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
 		"failed to load pack"
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 10/17] git-compat-util.h: introduce `u32_add()`
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (8 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:27   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

A future commit will want to add two 32-bit unsigned values together
while checking for overflow. Introduce a variant of the u64_add()
function for operating on 32-bit inputs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 git-compat-util.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 398e0fac4fa..a7aa5f05fc9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -670,6 +670,14 @@ static inline int cast_size_t_to_int(size_t a)
 	return (int)a;
 }
 
+static inline uint32_t u32_add(uint32_t a, uint32_t b)
+{
+	if (unsigned_add_overflows(a, b))
+		die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
+		    (uintmax_t)a, (uintmax_t)b);
+	return a + b;
+}
+
 static inline uint64_t u64_mult(uint64_t a, uint64_t b)
 {
 	if (unsigned_mult_overflows(a, b))
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (9 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

The `ctx->pack_perm` array can be considered as a permutation between
the original `pack_int_id` of some given pack to its position in the
`ctx->info` array containing all packs.

Today we can always index into this array with any known `pack_int_id`,
since there is never a `pack_int_id` which is greater than or equal to
the value `ctx->nr`.

That is not necessarily the case with MIDX compaction. For example,
suppose we have a MIDX chain with three layers, each containing three
packs. The base of the MIDX chain will have packs with IDs 0, 1, and 2,
the next layer 3, 4, and 5, and so on. If we are compacting the topmost
two layers, we'll have input `pack_int_id` values between [3, 8], but
`ctx->nr` will only be 6.

In that example, if we want to know where the pack whose original
`pack_int_id` value was, say, 7, we would compute `ctx->pack_perm[7]`,
leading to an uninitialized read, since there are only 6 entries
allocated in that array.

To address this, there are a couple of options:

 - We could allocate enough entries in `ctx->pack_perm` to accommodate
   the largest `orig_pack_int_id` value.

 - Or, we could internally shift the input values by the number of packs
   in the base layer of the lower end of the MIDX compaction range.

This patch prepare us to take the latter approach, since it does not
allocate more memory than strictly necessary. (In our above example, the
base of the lower end of the compaction range is the first MIDX layer
(having three packs), so we would end up indexing `ctx->pack_perm[7-3]`,
which is a valid read.)

Note that this patch does not actually implement that approach yet, but
merely performs a behavior-preserving refactoring which will make the
change easier to carry out in the future.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 55342fcb6dd..4a1a16431a6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -114,6 +114,12 @@ struct write_midx_context {
 	struct odb_source *source;
 };
 
+static uint32_t midx_pack_perm(struct write_midx_context *ctx,
+			       uint32_t orig_pack_int_id)
+{
+	return ctx->pack_perm[orig_pack_int_id];
+}
+
 static int should_include_pack(const struct write_midx_context *ctx,
 			       const char *file_name)
 {
@@ -509,12 +515,12 @@ static int write_midx_object_offsets(struct hashfile *f,
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
 
-		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
+		if (midx_pack_perm(ctx, obj->pack_int_id) == PACK_EXPIRED)
 			BUG("object %s is in an expired pack with int-id %d",
 			    oid_to_hex(&obj->oid),
 			    obj->pack_int_id);
 
-		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
+		hashwrite_be32(f, midx_pack_perm(ctx, obj->pack_int_id));
 
 		if (ctx->large_offsets_needed && obj->offset >> 31)
 			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
@@ -615,7 +621,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
-		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
 		if (!e->preferred)
 			data[i].pack |= (1U << 31);
 		data[i].offset = e->offset;
@@ -625,7 +631,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+		struct pack_info *pack = &ctx->info[midx_pack_perm(ctx, e->pack_int_id)];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
 			pack->bitmap_pos = i + base_objects;
 		pack->bitmap_nr++;
@@ -686,7 +692,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
 		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+			       ctx->info[midx_pack_perm(ctx, from->pack_int_id)].p);
 	}
 
 	trace2_region_leave("midx", "prepare_midx_packing_data", ctx->repo);
@@ -1285,7 +1291,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
 		if (preferred) {
-			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+			uint32_t perm = midx_pack_perm(&ctx, preferred->orig_pack_int_id);
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
 					opts->preferred_pack_name);
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()`
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (10 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

When filling packs from an existing MIDX, `fill_packs_from_midx()`
handles preparing a MIDX'd pack, and reading out its pack name from the
existing MIDX.

MIDX compaction will want to perform an identical operation, though the
caller will look quite different than `fill_packs_from_midx()`. To
reduce any future code duplication, extract `fill_pack_from_midx()`
from `fill_packs_from_midx()` to prepare to call our new helper function
in a future change.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 4a1a16431a6..5927691f6a0 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -910,6 +910,21 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
 	return ret;
 }
 
+static int fill_pack_from_midx(struct pack_info *info,
+			       struct multi_pack_index *m,
+			       uint32_t pack_int_id)
+{
+	if (prepare_midx_pack(m, pack_int_id))
+		return error(_("could not load pack %d"), pack_int_id);
+
+	fill_pack_info(info,
+		       m->packs[pack_int_id - m->num_packs_in_base],
+		       m->pack_names[pack_int_id - m->num_packs_in_base],
+		       pack_int_id);
+
+	return 0;
+}
+
 static int fill_packs_from_midx(struct write_midx_context *ctx)
 {
 	struct multi_pack_index *m;
@@ -918,13 +933,13 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 		uint32_t i;
 
 		for (i = 0; i < m->num_packs; i++) {
-			if (prepare_midx_pack(m, m->num_packs_in_base + i))
-				return error(_("could not load pack"));
-
 			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
-				       m->pack_names[i],
-				       m->num_packs_in_base + i);
+
+			if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
+						m->num_packs_in_base + i) < 0)
+				return -1;
+
+			ctx->nr++;
 		}
 	}
 	return 0;
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (11 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:27   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
the range [0, m->num_packs), and then shifts its index variable up by
`m->num_packs_in_base` to produce a valid `pack_int_id`.

Instead, directly enumerate the range:

    [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)

, which are the original pack_int_ids themselves as opposed to the
indexes of those packs relative to the MIDX layer they are contained
within.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 5927691f6a0..d3644276aad 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -932,11 +932,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 	for (m = ctx->m; m; m = m->base_midx) {
 		uint32_t i;
 
-		for (i = 0; i < m->num_packs; i++) {
+		for (i = m->num_packs_in_base;
+		     i < m->num_packs_in_base + m->num_packs; i++) {
 			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-			if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
-						m->num_packs_in_base + i) < 0)
+			if (fill_pack_from_midx(&ctx->info[ctx->nr], m, i) < 0)
 				return -1;
 
 			ctx->nr++;
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()`
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (12 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

When computing the set of objects to appear in a MIDX, we use
compute_sorted_entries(), which handles objects from various existing
sources one fanout layer at a time.

The process for computing this set is slightly different during MIDX
compaction, so factor out the existing functionality into its own
routine to prevent `compute_sorted_entries()` from becoming too
difficult to read.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index d3644276aad..7854561359d 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -323,6 +323,30 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
 	}
 }
 
+static void midx_fanout_add(struct midx_fanout *fanout,
+			    struct write_midx_context *ctx,
+			    uint32_t start_pack,
+			    uint32_t cur_fanout)
+{
+	uint32_t cur_pack;
+
+	if (ctx->m && !ctx->incremental)
+		midx_fanout_add_midx_fanout(fanout, ctx->m, cur_fanout,
+					    ctx->preferred_pack_idx);
+
+	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
+		int preferred = cur_pack == ctx->preferred_pack_idx;
+		midx_fanout_add_pack_fanout(fanout, ctx->info, cur_pack,
+					    preferred, cur_fanout);
+	}
+
+	if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+	    ctx->preferred_pack_idx < start_pack)
+		midx_fanout_add_pack_fanout(fanout, ctx->info,
+					    ctx->preferred_pack_idx, 1,
+					    cur_fanout);
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -359,23 +383,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (ctx->m && !ctx->incremental)
-			midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
-						    ctx->preferred_pack_idx);
-
-		for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
-			int preferred = cur_pack == ctx->preferred_pack_idx;
-			midx_fanout_add_pack_fanout(&fanout,
-						    ctx->info, cur_pack,
-						    preferred, cur_fanout);
-		}
-
-		if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
-		    ctx->preferred_pack_idx < start_pack)
-			midx_fanout_add_pack_fanout(&fanout, ctx->info,
-						    ctx->preferred_pack_idx, 1,
-						    cur_fanout);
-
+		midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
 		midx_fanout_sort(&fanout);
 
 		/*
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (13 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-08 18:27   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Though our 'read-midx' test tool is capable of printing information
about a single MIDX layer identified by its checksum, no caller in our
test suite exercises this path.

Unfortunately, there is a memory leak lurking in this (currently) unused
path that would otherwise be exposed by the following commit.

This occurs when providing a MIDX layer checksum other than the tip. As
we walk over the MIDX chain trying to find the matching layer, we drop
our reference to the top-most MIDX layer. Thus, our call to
'close_midx()' later on leaks memory between the top-most MIDX layer and
the MIDX layer immediately following the specified one.

Plug this leak by holding a reference to the tip of the MIDX chain, and
ensure that we call `close_midx()` before terminating the test tool.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index dee603b3cd0..6e03aabca79 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -26,9 +26,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 			  int show_objects)
 {
 	uint32_t i;
-	struct multi_pack_index *m;
+	struct multi_pack_index *m, *tip;
+	int ret = 0;
 
-	m = setup_midx(object_dir);
+	m = tip = setup_midx(object_dir);
 
 	if (!m)
 		return 1;
@@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 	if (checksum) {
 		while (m && strcmp(get_midx_checksum(m), checksum))
 			m = m->base_midx;
-		if (!m)
-			return 1;
+		if (!m) {
+			ret = error(_("could not find MIDX with checksum %s"),
+				    checksum);
+			goto out;
+		}
 	}
 
 	printf("header: %08x %d %d %d %d\n",
@@ -82,9 +86,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		}
 	}
 
-	close_midx(m);
+out:
+	close_midx(tip);
 
-	return 0;
+	return ret;
 }
 
 static int read_midx_checksum(const char *object_dir)
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 16/17] midx: implement MIDX compaction
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (14 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-09  7:21   ` Patrick Steinhardt
  2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

When managing a MIDX chain with many layers, it is convenient to combine
a sequence of adjacent layers into a single layer to prevent the chain
from growing too long.

While it is conceptually possible to "compact" a sequence of MIDX layers
together by running "git multi-pack-index write --stdin-packs", there
are a few drawbacks that make this less than desirable:

 - Preserving the MIDX chain is impossible, since there is no way to
   write a MIDX layer that contains objects or packs found in an earlier
   MIDX layer already part of the chain. So callers would have to write
   an entirely new (non-incremental) MIDX containing only the compacted
   layers, discarding all other objects/packs from the MIDX.

 - There is (currently) no way to write a MIDX layer outside of the MIDX
   chain to work around the above, such that the MIDX chain could be
   reassembled substituting the compacted layers with the MIDX that was
   written.

 - The `--stdin-packs` command-line option does not allow us to specify
   the order of packs as they appear in the MIDX. Therefore, even if
   there were workarounds for the previous two challenges, any bitmaps
   belonging to layers which come after the compacted layer(s) would no
   longer be valid.

This commit introduces a way to compact a sequence of adjacent MIDX
layers into a single layer while preserving the MIDX chain, as well as
any bitmap(s) in layers which are newer than the compacted ones.

Implementing MIDX compaction does not require a significant number of
changes to how MIDX layers are written. The main changes are as follows:

 - Instead of calling `fill_packs_from_midx()`, we call a new function
   `fill_packs_from_midx_range()`, which walks backwards along the
   portion of the MIDX chain which we are compacting, and adds packs one
   layer a time.

   In order to preserve the pseudo-pack order, the concatenated pack
   order is preserved, with the exception of preferred packs which are
   always added first.

 - After adding entries from the set of packs in the compaction range,
   `compute_sorted_entries()` must adjust the `pack_int_id`'s for all
   objects added in each fanout layer to match their original
   `pack_int_id`'s (as opposed to the index at which each pack appears
   in `ctx.info`).

 - When writing out the new 'multi-pack-index-chain' file, discard any
   layers in the compaction range, replacing them with the newly written
   layer, instead of keeping them and placing the new layer at the end
   of the chain.

This ends up being sufficient to implement MIDX compaction in such a way
that preserves bitmaps corresponding to more recent layers in the MIDX
chain.

The tests for MIDX compaction are so far fairly spartan, since the main
interesting behavior here is ensuring that the right packs/objects are
selected from each layer, and that the pack order is preserved despite
whether or not they are sorted in lexicographic order in the original
MIDX chain.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |  13 ++
 builtin/multi-pack-index.c              |  67 +++++++
 midx-write.c                            | 242 ++++++++++++++++++++++--
 midx.h                                  |   5 +
 t/meson.build                           |   1 +
 t/t5335-compact-multi-pack-index.sh     | 102 ++++++++++
 6 files changed, 411 insertions(+), 19 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 164cf1f2291..a9664e77411 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -12,6 +12,8 @@ SYNOPSIS
 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] compact [--[no-]incremental]
+		         <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -83,6 +85,17 @@ marker).
 		necessary.
 --
 
+compact::
+	Write a new MIDX layer containing only objects and packs present
+	in the range `<from>` to `<to>`, where both arguments are
+	checksums of existing layers in the MIDX chain.
++
+--
+	--incremental::
+		Write the result to a MIDX chain instead of writing a
+		stand-alone MIDX. Incompatible with `--bitmap`.
+--
+
 verify::
 	Verify the contents of the MIDX file.
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c0c6c1760c0..9b0c2082cb3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -17,6 +17,10 @@
 	   "  [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
 	   "  [--refs-snapshot=<path>]")
 
+#define BUILTIN_MIDX_COMPACT_USAGE \
+	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
+	   "  <from> <to>")
+
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
 
@@ -30,6 +34,10 @@ static char const * const builtin_multi_pack_index_write_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
 	NULL
 };
+static char const * const builtin_multi_pack_index_compact_usage[] = {
+	BUILTIN_MIDX_COMPACT_USAGE,
+	NULL
+};
 static char const * const builtin_multi_pack_index_verify_usage[] = {
 	BUILTIN_MIDX_VERIFY_USAGE,
 	NULL
@@ -44,6 +52,7 @@ static char const * const builtin_multi_pack_index_repack_usage[] = {
 };
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_COMPACT_USAGE,
 	BUILTIN_MIDX_VERIFY_USAGE,
 	BUILTIN_MIDX_EXPIRE_USAGE,
 	BUILTIN_MIDX_REPACK_USAGE,
@@ -195,6 +204,63 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	return ret;
 }
 
+static int cmd_multi_pack_index_compact(int argc, const char **argv,
+					const char *prefix,
+					struct repository *repo)
+{
+	struct multi_pack_index *m, *cur;
+	struct multi_pack_index *from_midx = NULL;
+	struct multi_pack_index *to_midx = NULL;
+	struct odb_source *source;
+	int ret;
+
+	struct option *options;
+	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+		OPT_END(),
+	};
+
+	repo_config(repo, git_multi_pack_index_write_config, NULL);
+
+	options = add_common_options(builtin_multi_pack_index_compact_options);
+
+	trace2_cmd_mode(argv[0]);
+
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
+	argc = parse_options(argc, argv, prefix,
+			     options, builtin_multi_pack_index_compact_usage,
+			     0);
+
+	if (argc != 2)
+		usage_with_options(builtin_multi_pack_index_compact_usage,
+				   options);
+	source = handle_object_dir_option(the_repository);
+
+	FREE_AND_NULL(options);
+
+	m = get_multi_pack_index(source);
+
+	for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
+		const char *midx_csum = get_midx_checksum(cur);
+
+		if (!from_midx && !strcmp(midx_csum, argv[0]))
+			from_midx = cur;
+		if (!to_midx && !strcmp(midx_csum, argv[1]))
+			to_midx = cur;
+	}
+
+	if (!from_midx)
+		die(_("could not find MIDX 'from': %s"), argv[0]);
+	if (!to_midx)
+		die(_("could not find MIDX 'to': %s"), argv[1]);
+
+	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
+
+	return ret;
+}
+
 static int cmd_multi_pack_index_verify(int argc, const char **argv,
 				       const char *prefix,
 				       struct repository *repo UNUSED)
@@ -295,6 +361,7 @@ int cmd_multi_pack_index(int argc,
 	struct option builtin_multi_pack_index_options[] = {
 		OPT_SUBCOMMAND("repack", &fn, cmd_multi_pack_index_repack),
 		OPT_SUBCOMMAND("write", &fn, cmd_multi_pack_index_write),
+		OPT_SUBCOMMAND("compact", &fn, cmd_multi_pack_index_compact),
 		OPT_SUBCOMMAND("verify", &fn, cmd_multi_pack_index_verify),
 		OPT_SUBCOMMAND("expire", &fn, cmd_multi_pack_index_expire),
 		OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index 7854561359d..fcbfedcd913 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -108,6 +108,10 @@ struct write_midx_context {
 	int incremental;
 	uint32_t num_multi_pack_indexes_before;
 
+	struct multi_pack_index *compact_from;
+	struct multi_pack_index *compact_to;
+	int compact;
+
 	struct string_list *to_include;
 
 	struct repository *repo;
@@ -117,6 +121,8 @@ struct write_midx_context {
 static uint32_t midx_pack_perm(struct write_midx_context *ctx,
 			       uint32_t orig_pack_int_id)
 {
+	if (ctx->compact)
+		orig_pack_int_id -= ctx->compact_from->num_packs_in_base;
 	return ctx->pack_perm[orig_pack_int_id];
 }
 
@@ -347,6 +353,21 @@ static void midx_fanout_add(struct midx_fanout *fanout,
 					    cur_fanout);
 }
 
+static void midx_fanout_add_compact(struct midx_fanout *fanout,
+				    struct write_midx_context *ctx,
+				    uint32_t cur_fanout)
+{
+	struct multi_pack_index *m = ctx->compact_to;
+
+	ASSERT(ctx->compact);
+
+	while (m && m != ctx->compact_from->base_midx) {
+		midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
+					    NO_PREFERRED_PACK);
+		m = m->base_midx;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -365,6 +386,9 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	size_t alloc_objects, total_objects = 0;
 	struct midx_fanout fanout = { 0 };
 
+	if (ctx->compact)
+		ASSERT(!start_pack);
+
 	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
 		total_objects = st_add(total_objects,
 				       ctx->info[cur_pack].p->num_objects);
@@ -383,7 +407,10 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
+		if (ctx->compact)
+			midx_fanout_add_compact(&fanout, ctx, cur_fanout);
+		else
+			midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
 		midx_fanout_sort(&fanout);
 
 		/*
@@ -953,6 +980,72 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 	return 0;
 }
 
+static uint32_t compactible_packs_between(const struct multi_pack_index *from,
+					  const struct multi_pack_index *to)
+{
+	uint32_t nr;
+
+	ASSERT(from && to);
+
+	nr = u32_add(to->num_packs, to->num_packs_in_base);
+	if (nr < from->num_packs_in_base)
+		BUG("unexpected number of packs in base during compaction: "
+		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
+
+	return nr - from->num_packs_in_base;
+}
+
+static int fill_packs_from_midx_range(struct write_midx_context *ctx,
+				      int bitmap_order)
+{
+	struct multi_pack_index *m = ctx->compact_to;
+	uint32_t packs_nr;
+
+	ASSERT(ctx->compact && !ctx->nr);
+	ASSERT(ctx->compact_from);
+	ASSERT(ctx->compact_to);
+
+	packs_nr = compactible_packs_between(ctx->compact_from,
+					     ctx->compact_to);
+
+	ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
+
+	while (m != ctx->compact_from->base_midx) {
+		uint32_t pack_int_id, preferred_pack_id;
+		uint32_t i;
+
+		if (bitmap_order) {
+			if (midx_preferred_pack(m, &preferred_pack_id) < 0)
+				die(_("could not determine preferred pack"));
+		} else {
+			preferred_pack_id = m->num_packs_in_base;
+		}
+
+		pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
+
+		if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+					preferred_pack_id) < 0)
+			return -1;
+
+		for (i = m->num_packs_in_base;
+		     i < m->num_packs_in_base + m->num_packs; i++) {
+			if (preferred_pack_id == i)
+				continue;
+
+			if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+						i) < 0)
+				return -1;
+		}
+
+		ctx->nr += m->num_packs;
+		m = m->base_midx;
+	}
+
+	ASSERT(ctx->nr == packs_nr);
+
+	return 0;
+}
+
 static struct {
 	const char *non_split;
 	const char *split;
@@ -1038,12 +1131,22 @@ static void clear_midx_files(struct odb_source *source,
 	strbuf_release(&buf);
 }
 
+static int midx_hashcmp(const struct multi_pack_index *a,
+			const struct multi_pack_index *b,
+			const struct git_hash_algo *algop)
+{
+	return hashcmp(get_midx_hash(a), get_midx_hash(b), algop);
+}
+
 struct write_midx_opts {
 	struct odb_source *source;
 
 	struct string_list *packs_to_include;
 	struct string_list *packs_to_drop;
 
+	struct multi_pack_index *compact_from;
+	struct multi_pack_index *compact_to;
+
 	const char *preferred_pack_name;
 	const char *refs_snapshot;
 	unsigned flags;
@@ -1066,6 +1169,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	int dropped_packs = 0;
 	int result = -1;
 	const char **keep_hashes = NULL;
+	size_t keep_hashes_nr = 0;
 	struct chunkfile *cf;
 
 	trace2_region_enter("midx", "write_midx_internal", r);
@@ -1074,6 +1178,17 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	ctx.source = opts->source;
 
 	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
+	ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
+
+	if (ctx.compact) {
+		if (!opts->compact_from)
+			BUG("expected non-NULL 'from' MIDX during compaction");
+		if (!opts->compact_to)
+			BUG("expected non-NULL 'to' MIDX during compaction");
+
+		ctx.compact_from = opts->compact_from;
+		ctx.compact_to = opts->compact_to;
+	}
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
@@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			 */
 			if (ctx.incremental)
 				ctx.base_midx = m;
-			else if (!opts->packs_to_include)
+			if (!opts->packs_to_include)
 				ctx.m = m;
 		}
 	}
 
+	/*
+	 * If compacting MIDX layer(s) in the range [from, to], then the
+	 * compacted MIDX will share the same base MIDX as 'from'.
+	 */
+	if (ctx.compact)
+		ctx.base_midx = ctx.compact_from->base_midx;
+
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
 	ctx.info = NULL;
@@ -1122,7 +1244,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			ctx.num_multi_pack_indexes_before++;
 			m = m->base_midx;
 		}
-	} else if (ctx.m && fill_packs_from_midx(&ctx)) {
+	} else if (ctx.m && !ctx.compact && fill_packs_from_midx(&ctx)) {
 		goto cleanup;
 	}
 
@@ -1135,13 +1257,23 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	else
 		ctx.progress = NULL;
 
-	ctx.to_include = opts->packs_to_include;
+	if (ctx.compact) {
+		int bitmap_order = 0;
+		if (opts->preferred_pack_name)
+			bitmap_order |= 1;
+		else if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
+			bitmap_order |= 1;
 
-	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+		fill_packs_from_midx_range(&ctx, bitmap_order);
+	} else {
+		ctx.to_include = opts->packs_to_include;
+		for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+	}
 	stop_progress(&ctx.progress);
 
 	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
 	    !ctx.incremental &&
+	    !ctx.compact &&
 	    !(opts->packs_to_include || opts->packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
 		int bitmap_exists;
@@ -1255,12 +1387,15 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			ctx.large_offsets_needed = 1;
 	}
 
-	QSORT(ctx.info, ctx.nr, pack_info_compare);
+	if (!ctx.compact)
+		QSORT(ctx.info, ctx.nr, pack_info_compare);
 
 	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
 		size_t drop_index = 0;
 		int missing_drops = 0;
 
+		ASSERT(!ctx.compact);
+
 		for (size_t i = 0;
 		     i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
 			int cmp = strcmp(ctx.info[i].pack_name,
@@ -1292,12 +1427,20 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	 */
 	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
 	for (size_t i = 0; i < ctx.nr; i++) {
+		uint32_t from = ctx.info[i].orig_pack_int_id;
+		uint32_t to;
+
 		if (ctx.info[i].expired) {
+			to = PACK_EXPIRED;
 			dropped_packs++;
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
 		} else {
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+			to = i - dropped_packs;
 		}
+
+		if (ctx.compact)
+			from -= ctx.compact_from->num_packs_in_base;
+
+		ctx.pack_perm[from] = to;
 	}
 
 	for (size_t i = 0; i < ctx.nr; i++) {
@@ -1445,7 +1588,24 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
 		die(_("too many multi-pack-indexes"));
 
-	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+	if (ctx.compact) {
+		struct multi_pack_index *m;
+
+		/*
+		 * Keep all MIDX layers excluding those in the range [from, to].
+		 */
+		for (m = ctx.base_midx; m; m = m->base_midx)
+			keep_hashes_nr++;
+		for (m = ctx.m;
+		     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+		     m = m->base_midx)
+			keep_hashes_nr++;
+
+		keep_hashes_nr++; /* include the compacted layer */
+	} else {
+		keep_hashes_nr = ctx.num_multi_pack_indexes_before + 1;
+	}
+	CALLOC_ARRAY(keep_hashes, keep_hashes_nr);
 
 	if (ctx.incremental) {
 		FILE *chainf = fdopen_lock_file(&lk, "w");
@@ -1470,17 +1630,47 @@ static int write_midx_internal(struct write_midx_opts *opts)
 
 		strbuf_release(&final_midx_name);
 
-		keep_hashes[ctx.num_multi_pack_indexes_before] =
-			xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
+		if (ctx.compact) {
+			struct multi_pack_index *m;
+			uint32_t num_layers_before_from = 0;
+			uint32_t i;
 
-		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
-			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+			for (m = ctx.base_midx; m; m = m->base_midx)
+				num_layers_before_from++;
 
-			keep_hashes[j] = xstrdup(get_midx_checksum(m));
-			m = m->base_midx;
+			m = ctx.base_midx;
+			for (i = 0; i < num_layers_before_from; i++) {
+				uint32_t j = num_layers_before_from - i - 1;
+
+				keep_hashes[j] = xstrdup(get_midx_checksum(m));
+				m = m->base_midx;
+			}
+
+			keep_hashes[i] = xstrdup(hash_to_hex_algop(midx_hash,
+								   r->hash_algo));
+
+			i = 0;
+			for (m = ctx.m;
+			     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+			     m = m->base_midx) {
+				keep_hashes[keep_hashes_nr - i - 1] =
+					xstrdup(get_midx_checksum(m));
+				i++;
+			}
+		} else {
+			keep_hashes[ctx.num_multi_pack_indexes_before] =
+				xstrdup(hash_to_hex_algop(midx_hash,
+							  r->hash_algo));
+
+			for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+				uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+				keep_hashes[j] = xstrdup(get_midx_checksum(m));
+				m = m->base_midx;
+			}
 		}
 
-		for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+		for (uint32_t i = 0; i < keep_hashes_nr; i++)
 			fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
 	} else {
 		keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1493,8 +1683,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files(opts->source, keep_hashes,
-			 ctx.num_multi_pack_indexes_before + 1,
+	clear_midx_files(opts->source, keep_hashes, keep_hashes_nr,
 			 ctx.incremental);
 	result = 0;
 
@@ -1512,7 +1701,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	if (keep_hashes) {
-		for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+		for (uint32_t i = 0; i < keep_hashes_nr; i++)
 			free((char *)keep_hashes[i]);
 		free(keep_hashes);
 	}
@@ -1553,6 +1742,21 @@ int write_midx_file_only(struct odb_source *source,
 	return write_midx_internal(&opts);
 }
 
+int write_midx_file_compact(struct odb_source *source,
+			    struct multi_pack_index *from,
+			    struct multi_pack_index *to,
+			    unsigned flags)
+{
+	struct write_midx_opts opts = {
+		.source = source,
+		.compact_from = from,
+		.compact_to = to,
+		.flags = flags | MIDX_WRITE_COMPACT,
+	};
+
+	return write_midx_internal(&opts);
+}
+
 int expire_midx_packs(struct odb_source *source, unsigned flags)
 {
 	uint32_t i, *count, result = 0;
diff --git a/midx.h b/midx.h
index 39bf04b18e5..61f9809b8c9 100644
--- a/midx.h
+++ b/midx.h
@@ -81,6 +81,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 #define MIDX_WRITE_INCREMENTAL (1 << 5)
+#define MIDX_WRITE_COMPACT (1 << 6)
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
@@ -130,6 +131,10 @@ int write_midx_file_only(struct odb_source *source,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
 			 const char *refs_snapshot, unsigned flags);
+int write_midx_file_compact(struct odb_source *source,
+			    struct multi_pack_index *from,
+			    struct multi_pack_index *to,
+			    unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct odb_source *source, unsigned flags);
 int expire_midx_packs(struct odb_source *source, unsigned flags);
diff --git a/t/meson.build b/t/meson.build
index 7c994d4643e..2d1926faaf2 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -613,6 +613,7 @@ integration_tests = [
   't5332-multi-pack-reuse.sh',
   't5333-pseudo-merge-bitmaps.sh',
   't5334-incremental-multi-pack-index.sh',
+  't5335-compact-multi-pack-index.sh',
   't5351-unpack-large-objects.sh',
   't5400-send-pack.sh',
   't5401-update-hooks.sh',
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
new file mode 100755
index 00000000000..f889af7fb1d
--- /dev/null
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -0,0 +1,102 @@
+#!/bin/sh
+
+test_description='multi-pack-index compaction'
+
+. ./test-lib.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+nth_line() {
+	local n="$1"
+	shift
+	awk "NR==$n" "$@"
+}
+
+write_packs () {
+	for c in "$@"
+	do
+		test_commit "$c" &&
+
+		git pack-objects --all --unpacked $packdir/pack-$c &&
+		git prune-packed &&
+
+		git multi-pack-index write --incremental --bitmap || return 1
+	done
+}
+
+test_midx_layer_packs () {
+	local checksum="$1" &&
+	shift &&
+
+	test-tool read-midx $objdir "$checksum" >out &&
+
+	printf "%s\n" "$@" >expect &&
+	# NOTE: do *not* pipe through sort here, we want to ensure the
+	# order of packs is preserved during compaction.
+	grep "^pack-" out | cut -d"-" -f2 >actual &&
+
+	test_cmp expect actual
+}
+
+test_midx_layer_object_uniqueness () {
+	: >objs.all
+	while read layer
+	do
+		test-tool read-midx --show-objects $objdir "$layer" >out &&
+		grep "\.pack$" out | cut -d" " -f1 | sort >objs.layer &&
+		test_stdout_line_count = 0 comm -12 objs.all objs.layer &&
+		cat objs.all objs.layer | sort >objs.tmp &&
+		mv objs.tmp objs.all || return 1
+	done <$midx_chain
+}
+
+test_expect_success 'MIDX compaction with lex-ordered pack names' '
+	git init midx-compact-lex-order &&
+	(
+		cd midx-compact-lex-order &&
+
+		write_packs A B C D E &&
+		test_line_count = 5 $midx_chain &&
+
+		git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test_line_count = 3 $midx_chain &&
+
+		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
+		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
+		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+		test_midx_layer_object_uniqueness
+	)
+'
+
+test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
+	git init midx-compact-non-lex-order &&
+	(
+		cd midx-compact-non-lex-order &&
+
+		write_packs D C A B E &&
+		test_line_count = 5 $midx_chain &&
+
+		git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test_line_count = 3 $midx_chain &&
+
+		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" D &&
+		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" C A B &&
+		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+		test_midx_layer_object_uniqueness
+	)
+'
+
+test_done
-- 
2.52.0.171.gd6a4e6b6955


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 17/17] midx: enable reachability bitmaps during MIDX compaction
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (15 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
  2025-12-09  7:21   ` Patrick Steinhardt
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt

Enable callers to generate reachability bitmaps when performing MIDX
layer compaction by combining all existing bitmaps from the compacted
layers.

Note that the because of the object/pack ordering described by the
previous commit, the pseudo-pack order for the compacted MIDX is the
same as concatenating the individual pseudo-pack orderings for each
layer in the compaction range.

As a result, the only non-test or documentation change necessary is to
treat all objects as non-preferred during compaction so as not to
disturb the object ordering.

In the future, we may want to adjust which commit(s) receive
reachability bitmaps when compacting multiple .bitmap files into one, or
even generate new bitmaps (e.g., if the references have moved
significantly since the .bitmap was generated). This commit only
implements combining all existing bitmaps in range together in order to
demonstrate and lay the groundwork for more exotic strategies.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |   2 +-
 builtin/multi-pack-index.c              |   4 +-
 midx-write.c                            |   2 +-
 t/t5335-compact-multi-pack-index.sh     | 120 +++++++++++++++++++++++-
 4 files changed, 123 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a9664e77411..458bb873633 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -13,7 +13,7 @@ SYNOPSIS
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
 'git multi-pack-index' [<options>] compact [--[no-]incremental]
-		         <from> <to>
+		         [--[no-]bitmap] <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 9b0c2082cb3..40afa8f1ed8 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -19,7 +19,7 @@
 
 #define BUILTIN_MIDX_COMPACT_USAGE \
 	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
-	   "  <from> <to>")
+	   "  [--[no-]bitmap] <from> <to>")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
 
 	struct option *options;
 	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "incremental", &opts.flags,
 			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index fcbfedcd913..f2dbacef4cd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -657,7 +657,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
 		data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
-		if (!e->preferred)
+		if (!e->preferred || ctx->compact)
 			data[i].pack |= (1U << 31);
 		data[i].offset = e->offset;
 	}
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
index f889af7fb1d..a306f504305 100755
--- a/t/t5335-compact-multi-pack-index.sh
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -65,7 +65,7 @@ test_expect_success 'MIDX compaction with lex-ordered pack names' '
 		write_packs A B C D E &&
 		test_line_count = 5 $midx_chain &&
 
-		git multi-pack-index compact --incremental \
+		git multi-pack-index compact --incremental --bitmap \
 			"$(nth_line 2 "$midx_chain")" \
 			"$(nth_line 4 "$midx_chain")" &&
 		test_line_count = 3 $midx_chain &&
@@ -86,7 +86,7 @@ test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
 		write_packs D C A B E &&
 		test_line_count = 5 $midx_chain &&
 
-		git multi-pack-index compact --incremental \
+		git multi-pack-index compact --incremental --bitmap \
 			"$(nth_line 2 "$midx_chain")" \
 			"$(nth_line 4 "$midx_chain")" &&
 		test_line_count = 3 $midx_chain &&
@@ -99,4 +99,120 @@ test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
 	)
 '
 
+midx_objs_by_pack () {
+	awk '/\.pack$/ { split($3, a, "-"); print a[2], $1 }' | sort
+}
+
+tag_objs_from_pack () {
+	objs="$(git rev-list --objects --no-object-names "$2")" &&
+	printf "$1 %s\n" $objs | sort
+}
+
+test_expect_success 'MIDX compaction preserves pack object selection' '
+	git init midx-compact-preserve-selection &&
+	(
+		cd midx-compact-preserve-selection &&
+
+		test_commit A &&
+		test_commit B &&
+
+		# Create two packs, one containing just the objects from
+		# A, and another containing all objects from the
+		# repository.
+		p1="$(echo A | git pack-objects --revs --delta-base-offset \
+			$packdir/pack-1)" &&
+		p0="$(echo B | git pack-objects --revs --delta-base-offset \
+			$packdir/pack-0)" &&
+
+		echo "pack-1-$p1.idx" | git multi-pack-index write \
+			--incremental --bitmap --stdin-packs &&
+		echo "pack-0-$p0.idx" | git multi-pack-index write \
+			--incremental --bitmap --stdin-packs &&
+
+		write_packs C &&
+
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 1 "$midx_chain")" \
+			"$(nth_line 2 "$midx_chain")" &&
+
+
+		test-tool read-midx --show-objects $objdir \
+			"$(nth_line 1 "$midx_chain")" >AB.info &&
+		test-tool read-midx --show-objects $objdir \
+			"$(nth_line 2 "$midx_chain")" >C.info &&
+
+		midx_objs_by_pack <AB.info >AB.actual &&
+		midx_objs_by_pack <C.info >C.actual &&
+
+		{
+			tag_objs_from_pack 1 A &&
+			tag_objs_from_pack 0 A..B
+		} | sort >AB.expect &&
+		tag_objs_from_pack C B..C >C.expect &&
+
+		test_cmp AB.expect AB.actual &&
+		test_cmp C.expect C.actual
+	)
+'
+
+test_expect_success 'MIDX compaction with bitmaps' '
+	git init midx-compact-with-bitmaps &&
+	(
+		cd midx-compact-with-bitmaps &&
+
+		write_packs foo bar baz quux woot &&
+
+		test-tool read-midx --bitmap $objdir >bitmap.expect &&
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test-tool read-midx --bitmap $objdir >bitmap.actual &&
+
+		test_cmp bitmap.expect bitmap.actual &&
+
+		true
+	)
+'
+
+test_expect_success 'MIDX compaction with bitmaps (non-trivial)' '
+	git init midx-compact-with-bitmaps-non-trivial &&
+	(
+		cd midx-compact-with-bitmaps-non-trivial &&
+
+		git branch -m main &&
+
+		#               D(4)
+		#              /
+		# A(1) --- B(2) --- C(3) --- G(7)
+		#              \
+		#               E(5) --- F(6)
+		write_packs A B C &&
+		git checkout -b side &&
+		write_packs D &&
+		git checkout -b other B &&
+		write_packs E F &&
+		git checkout main &&
+		write_packs G &&
+
+		cat $midx_chain &&
+
+		# Compact layers 2-4, leaving us with:
+		#
+		#  [A, [B, C, D], E, F, G]
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+
+		# Then compact the top two layers, condensing the above
+		# such that the new 4th layer contains F and G.
+		#
+		#  [A, [B, C, D], E, [F, G]]
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 4 "$midx_chain")" \
+			"$(nth_line 5 "$midx_chain")" &&
+
+		cat $midx_chain
+	)
+'
+
 test_done
-- 
2.52.0.171.gd6a4e6b6955

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
@ 2025-12-08 18:25   ` Patrick Steinhardt
  2025-12-09  1:42     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:04PM -0500, Taylor Blau wrote:
> When trying to print out, say, the hexadecimal representation of a
> MIDX's hash, our code will do something like:
> 
>     hash_to_hex_algop(get_midx_checksum(m),
>                       m->source->odb->repo->hash_algo);
> 
> , which is both cumbersome and repetitive. In fact, all but a handful of
> callers to `get_midx_checksum()` do exactly the above. Reduce the
> repetitive nature of calling `get_midx_checksum()` by having it return a
> pointer into a static buffer containing the above result.
> 
> For the handful of callers that do need to compare the raw bytes and
> don't want to deal with an encoded copy (e.g., because they are passing
> it to hasheq() or similar), introduce `get_midx_hash()` which returns
> the raw bytes.

This is a welcome change indeed.

> diff --git a/midx.h b/midx.h
> index 7c7e0b59121..e188ffeb578 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -85,7 +85,8 @@ struct multi_pack_index {
>  #define MIDX_EXT_BITMAP "bitmap"
>  #define MIDX_EXT_MIDX "midx"
>  
> -const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
> +const char *get_midx_checksum(const struct multi_pack_index *m) /* static buffer */;
> +const unsigned char *get_midx_hash(const struct multi_pack_index *m);
>  void get_midx_filename(struct odb_source *source, struct strbuf *out);
>  void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
>  			   const unsigned char *hash, const char *ext);

If I didn't have the context of this patch series I would be wondering
what the actual difference between `get_midx_checksum()` and
`get_midx_hash()` is. The way the functions are named seems to rather
indicate that we talk about two different kinds of hashes, rather than
two different ways to encode them.

Would it maybe be preferable to call them `get_midx_checksum()` and
`get_midx_checksum_hex()`? While at it, we could go even further and
rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
conform to our modern best practices.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-09  1:59     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:19PM -0500, Taylor Blau wrote:
> In midx_pack_order(), we compute for each bitampped pack the first bit

s/bitampped/bitmapped/

> to correspond to an object in that pack, along with how many bits were
> assigned to object(s) in that pack.
> 
> Initially, each bitmap_nr value is set to zero, and each bitmap_pos

I assume `bitmap_nr` is the number of bits, whereas `bitmap_pos` is the
position of the first bit?

> value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
> that there are no packs who have an unknown bit position but a somehow
> non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
> midx-write.c).
> 
> Once the pack order is fully determined, midx_pack_order() sets the
> bitmap_pos field for any bitmapped packs to zero if they are still
> listed as BITMAP_POS_UNKNOWN.

If so, it feels somewhat weird that we'd set the `bitmap_pos` to zero.
But I guess it doesn't matter anyway, as I assume that the `bitmap_nr`
would be zero anyway?

Anyway, reading on.

> However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.

Which is the "permutation between pack-int-ids from the previous
multi-pack-index to the new one we are writing"'. So it's basically
tracking which new packs correspond to the old packs.

> This is fine for existing cases, since the only time the
> `ctx->pack_perm` array holds a value outside of the addressable range of
> `ctx->info` is when there are expired packs, which only occurs via 'git
> multi-pack-index expire', which does not support writing MIDX bitmaps.
> As a result, the range of ctx->pack_perm covers all values in [0,
> `ctx->nr`), so enumerating in this order isn't an issue.
> 
> A future change necessary for compaction will complicate this further by
> introducing a wrapper around the `ctx->pack_perm` array, which turns the
> given `pack_int_id` into one that is relative to the lower end of the
> compaction range. As a result, indexing into `ctx->pack_perm` through
> this helper, say, with "0" will produce a crash when the lower end of
> the compaction range has >0 pack(s) in its base layer, since the
> subtraction will wrap around the 32-bit unsigned range, resulting in an
> uninitialized read.
> 
> But the process is completely unnecessary in the first place: we are
> enumerating all values of `ctx->info`, and there is no reason to process
> them in a different order than they appear in memory. Index `ctx->info`
> directly to reflect that.

Fair. We do initialize the permutations like this:

	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
	for (size_t i = 0; i < ctx.nr; i++) {
		if (ctx.info[i].expired) {
			dropped_packs++;
			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
		} else {
			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
		}
	}

So obviously, the permutation will only ever be different in case we've
got at least one dropped pack, and that only happens when we expire any
packs. So the explanation matches.

Of course it may be a bit more fragile now if we ever added a caller
of this function that _does_ expire data. But we don't have any, so that
enters the territory of overthinking things.

> diff --git a/midx-write.c b/midx-write.c
> index 73d24fabbc6..c30f6a70d37 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
>  		pack_order[i] = data[i].nr;
>  	}
>  	for (i = 0; i < ctx->nr; i++) {
> -		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
> +		struct pack_info *pack = &ctx->info[i];
>  		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
>  			pack->bitmap_pos = 0;
>  	}

The change looks simple enough.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts`
  2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
@ 2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-09  2:04     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:22PM -0500, Taylor Blau wrote:
> diff --git a/midx-write.c b/midx-write.c
> index c30f6a70d37..b262631ae45 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1014,14 +1014,20 @@ static void clear_midx_files(struct odb_source *source,
>  	strbuf_release(&buf);
>  }
>  
> -static int write_midx_internal(struct odb_source *source,
> -			       struct string_list *packs_to_include,
> -			       struct string_list *packs_to_drop,
> -			       const char *preferred_pack_name,
> -			       const char *refs_snapshot,
> -			       unsigned flags)
> +struct write_midx_opts {
> +	struct odb_source *source;
> +
> +	struct string_list *packs_to_include;
> +	struct string_list *packs_to_drop;
> +
> +	const char *preferred_pack_name;
> +	const char *refs_snapshot;
> +	unsigned flags;
> +};
> +
> +static int write_midx_internal(struct write_midx_opts *opts)
>  {
> -	struct repository *r = source->odb->repo;
> +	struct repository *r = opts->source->odb->repo;
>  	struct strbuf midx_name = STRBUF_INIT;
>  	unsigned char midx_hash[GIT_MAX_RAWSZ];
>  	uint32_t start_pack;

One might argue that parameters which _must_ be passed could be moved
out of the structure and into the function signature, and as far as I
understand, that would only be the `struct odb_source`. After all, we
are talking about options, and a mandatory field is not really an option
in my book. It also makes the interface at least a tiny bit more self
documenting.

Other than that this patch looks like a nice improvement to me.

> @@ -1566,8 +1586,11 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
>  	free(count);
>  
>  	if (packs_to_drop.nr)
> -		result = write_midx_internal(source, NULL,
> -					     &packs_to_drop, NULL, NULL, flags);
> +		result = write_midx_internal(&(struct write_midx_opts) {
> +					     .source = source,
> +					     .packs_to_drop = &packs_to_drop,
> +					     .flags = flags & MIDX_PROGRESS,
> +					     });
>  
>  	string_list_clear(&packs_to_drop, 0);
>  

I think this syntax is not allowed in our codebase except for a test
balloon just yet. See aso 9b2527caa4 (CodingGuidelines: document test
balloons in flight, 2025-07-23):

    since late 2024 with v2.48.0-rc0~20, we have test balloons for
    compound literal syntax, e.g., (struct foo){ .member = value };
    our hope is that no platforms we care about have trouble using
    them, and officially adopt its wider use in mid 2026.  Do not add
    more use of the syntax until that happens.

> @@ -1774,8 +1797,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
>  		goto cleanup;
>  	}
>  
> -	result = write_midx_internal(source, NULL, NULL, NULL, NULL,
> -				     flags);
> +	result = write_midx_internal(&(struct write_midx_opts) {
> +				     .source = source,
> +				     .flags = flags,
> +				     });

Same here.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const
  2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
@ 2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-09  1:41     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:01PM -0500, Taylor Blau wrote:
> To make clear that the fucntion `get_midx_checksum()` does not do

Nit: s/fucntion/functino

> anything to modify its argument, mark the MIDX pointer as const.
> 
> The following commit will rename this function altogether to make clear
> that it returns the raw bytes of the checksum, not a hex-encoded copy of
> it.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
  2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2025-12-08 18:26   ` Patrick Steinhardt
  2025-12-09  2:07     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:25PM -0500, Taylor Blau wrote:
> Note that this produces MIDXs which may be incompatible with earlier
> versions of Git that have stricter requirements on the layout of packs
> within a MIDX. This patch does *not* modify the version number of the
> MIDX format, since existing versions of Git already know to gracefully
> ignore a MIDX with packs that appear out-of-order.

Interesting. Did you verify how other implementations of Git behave if
we start to relax this requirement? It seems like a somewhat dangerous
assumption to me that this will just continue to work.

Also, is there a reason why you prefer this over bumping the version
number?

> @@ -656,17 +652,37 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
>  	return strcmp(idx_or_pack_name, idx_name);
>  }
>  
> +
> +static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
> +{
> +	struct multi_pack_index *m = m_;
> +	return strcmp(m->pack_names[*(const size_t *)a],
> +		      m->pack_names[*(const size_t *)b]);
> +}
> +
>  static int midx_contains_pack_1(struct multi_pack_index *m,
>  				const char *idx_or_pack_name)
>  {
>  	uint32_t first = 0, last = m->num_packs;
>  
> +	if (!m->pack_names_sorted) {
> +		uint32_t i;
> +
> +		ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
> +
> +		for (i = 0; i < m->num_packs; i++)
> +			m->pack_names_sorted[i] = i;
> +
> +		QSORT_S(m->pack_names_sorted, m->num_packs, midx_pack_names_cmp,
> +			m);
> +	}
> +
>  	while (first < last) {
>  		uint32_t mid = first + (last - first) / 2;
>  		const char *current;
>  		int cmp;
>  
> -		current = m->pack_names[mid];
> +		current = m->pack_names[m->pack_names_sorted[mid]];
>  		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
>  		if (!cmp)
>  			return 1;

I assume that it cannot happen that we append to the array of MIDX'd
packs after we have sorted. It would mean that the MIDX somehow changed
its representation or was amended to, which isn't possible.

Patrick 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 10/17] git-compat-util.h: introduce `u32_add()`
  2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
@ 2025-12-08 18:27   ` Patrick Steinhardt
  2025-12-09  2:13     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:28PM -0500, Taylor Blau wrote:
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 398e0fac4fa..a7aa5f05fc9 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -670,6 +670,14 @@ static inline int cast_size_t_to_int(size_t a)
>  	return (int)a;
>  }
>  
> +static inline uint32_t u32_add(uint32_t a, uint32_t b)
> +{
> +	if (unsigned_add_overflows(a, b))
> +		die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
> +		    (uintmax_t)a, (uintmax_t)b);
> +	return a + b;
> +}

We already use PRIu32 in our codebase, so why is the cast necessary?

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly
  2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
@ 2025-12-08 18:27   ` Patrick Steinhardt
  2025-12-09  2:14     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:37PM -0500, Taylor Blau wrote:
> Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
> the range [0, m->num_packs), and then shifts its index variable up by
> `m->num_packs_in_base` to produce a valid `pack_int_id`.
> 
> Instead, directly enumerate the range:
> 
>     [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)
> 
> , which are the original pack_int_ids themselves as opposed to the
> indexes of those packs relative to the MIDX layer they are contained
> within.

Sensible. I was confused a bit by the previous change because I couldn't
quite spot the shift happening. I think this makes things a bit easier
to read.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
  2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
@ 2025-12-08 18:27   ` Patrick Steinhardt
  2025-12-09  2:16     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:43PM -0500, Taylor Blau wrote:
> diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
> index dee603b3cd0..6e03aabca79 100644
> --- a/t/helper/test-read-midx.c
> +++ b/t/helper/test-read-midx.c
> @@ -26,9 +26,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
>  			  int show_objects)
>  {
>  	uint32_t i;
> -	struct multi_pack_index *m;
> +	struct multi_pack_index *m, *tip;
> +	int ret = 0;
>  
> -	m = setup_midx(object_dir);
> +	m = tip = setup_midx(object_dir);
>  
>  	if (!m)
>  		return 1;

I was briefly wondering whether we should also convert this into a `goto
out`. It's of course not required, as setting up the MIDX has just
failed. But it would simplfy things a bit as we now have a single exit
path, only.

I don't mind this too much though.

> @@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
>  	if (checksum) {
>  		while (m && strcmp(get_midx_checksum(m), checksum))
>  			m = m->base_midx;
> -		if (!m)
> -			return 1;
> +		if (!m) {
> +			ret = error(_("could not find MIDX with checksum %s"),
> +				    checksum);
> +			goto out;
> +		}
>  	}
>  
>  	printf("header: %08x %d %d %d %d\n",

We change the return code from 1 to -1, but that ultimately shouldn't
matter much.

I'll stop reviewing here and will have a look at the remaining two
patches with some fresh eyes. But so far this was a nice read, thanks!

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const
  2025-12-08 18:26   ` Patrick Steinhardt
@ 2025-12-09  1:41     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  1:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:26:45PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:01PM -0500, Taylor Blau wrote:
> > To make clear that the fucntion `get_midx_checksum()` does not do
>
> Nit: s/fucntion/functino

s/functino/function, but otherwise ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  2025-12-08 18:25   ` Patrick Steinhardt
@ 2025-12-09  1:42     ` Taylor Blau
  2025-12-09  1:50       ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  1:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:25:05PM +0100, Patrick Steinhardt wrote:
> If I didn't have the context of this patch series I would be wondering
> what the actual difference between `get_midx_checksum()` and
> `get_midx_hash()` is. The way the functions are named seems to rather
> indicate that we talk about two different kinds of hashes, rather than
> two different ways to encode them.
>
> Would it maybe be preferable to call them `get_midx_checksum()` and
> `get_midx_checksum_hex()`? While at it, we could go even further and
> rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
> conform to our modern best practices.

Yeah, I think those are both reasonable suggestions; I'll apply those
locally, thanks!

> Patrick

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  2025-12-09  1:42     ` Taylor Blau
@ 2025-12-09  1:50       ` Taylor Blau
  2025-12-09  6:27         ` Patrick Steinhardt
  0 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  1:50 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 08:42:23PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:25:05PM +0100, Patrick Steinhardt wrote:
> > If I didn't have the context of this patch series I would be wondering
> > what the actual difference between `get_midx_checksum()` and
> > `get_midx_hash()` is. The way the functions are named seems to rather
> > indicate that we talk about two different kinds of hashes, rather than
> > two different ways to encode them.
> >
> > Would it maybe be preferable to call them `get_midx_checksum()` and
> > `get_midx_checksum_hex()`? While at it, we could go even further and
> > rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
> > conform to our modern best practices.
>
> Yeah, I think those are both reasonable suggestions; I'll apply those
> locally, thanks!

Hmm. Upon further thinking, I wonder which function should be named
which.

I think the _checksum() variant suggests that it returns the non-hex
encoded form, while the _hex() variant suggests the opposite.
Unfortunately, the latter is both more commonly used and more characters
to type ;-).

I wonder if there are shorter names available. Perhaps
midx_get_checksum() and midx_get_checksum_raw()?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2025-12-08 18:26   ` Patrick Steinhardt
@ 2025-12-09  1:59     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  1:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:26:27PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:19PM -0500, Taylor Blau wrote:
> > In midx_pack_order(), we compute for each bitampped pack the first bit
>
> s/bitampped/bitmapped/

Ugh. My "bitamp" typo strikes again, thanks for spotting!

> > to correspond to an object in that pack, along with how many bits were
> > assigned to object(s) in that pack.
> >
> > Initially, each bitmap_nr value is set to zero, and each bitmap_pos
>
> I assume `bitmap_nr` is the number of bits, whereas `bitmap_pos` is the
> position of the first bit?

That's right!

> > However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
>
> Which is the "permutation between pack-int-ids from the previous
> multi-pack-index to the new one we are writing"'. So it's basically
> tracking which new packs correspond to the old packs.

Ditto.

> So obviously, the permutation will only ever be different in case we've
> got at least one dropped pack, and that only happens when we expire any
> packs. So the explanation matches.
>
> Of course it may be a bit more fragile now if we ever added a caller
> of this function that _does_ expire data. But we don't have any, so that
> enters the territory of overthinking things.

I think that with incremental MIDXs we will never have such a caller
without a mechanism to tombstone objects in existing packs, but
definitely worth calling out.

> > diff --git a/midx-write.c b/midx-write.c
> > index 73d24fabbc6..c30f6a70d37 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
> >  		pack_order[i] = data[i].nr;
> >  	}
> >  	for (i = 0; i < ctx->nr; i++) {
> > -		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
> > +		struct pack_info *pack = &ctx->info[i];
> >  		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
> >  			pack->bitmap_pos = 0;
> >  	}
>
> The change looks simple enough.

Yeah, I almost wonder if the commit message was more harmful than not.
The main points that I wanted to get across were:

 - Ultimately we want to enumerate a list, and there's no reason to do
   that in a permuted order.

 - Iterating in that permuted order is fine today because the array of
   values in ctx->pack_perm are always addressable indices into
   ctx->info.

 - That won't be the case in the future when we are combining packs from
   MIDX layers that have a non-zero m->num_packs_in_base, so adjusting
   the implementation now prevents us from running into that pitfall in
   such a future.

Let me know if you think that I should adjust the commit message here.
It's hard to know whether something resembling the above is better or
worse than the current version of the commit message from a reviewer's
perspective, so I'm happy to do whatever you think is cleaner ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts`
  2025-12-08 18:26   ` Patrick Steinhardt
@ 2025-12-09  2:04     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  2:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:26:36PM +0100, Patrick Steinhardt wrote:
> One might argue that parameters which _must_ be passed could be moved
> out of the structure and into the function signature, and as far as I
> understand, that would only be the `struct odb_source`. After all, we
> are talking about options, and a mandatory field is not really an option
> in my book. It also makes the interface at least a tiny bit more self
> documenting.
>
> Other than that this patch looks like a nice improvement to me.

I think that's a reasonable consideration. My preference would be to
keep everything contained within the 'struct write_midx_opts' since it
makes it really easy to pass everything you might need around to other
sub-routines by just passing a single pointer.

So I'm inclined to keep the new API as-is presented here, but I'm happy
to discuss changing it around if you feel strongly about it.

As a reasonable middle-ground, I added a "/* non-optional */" next to
the "source" member within the new structure.

> > @@ -1566,8 +1586,11 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
> >  	free(count);
> >
> >  	if (packs_to_drop.nr)
> > -		result = write_midx_internal(source, NULL,
> > -					     &packs_to_drop, NULL, NULL, flags);
> > +		result = write_midx_internal(&(struct write_midx_opts) {
> > +					     .source = source,
> > +					     .packs_to_drop = &packs_to_drop,
> > +					     .flags = flags & MIDX_PROGRESS,
> > +					     });
> >
> >  	string_list_clear(&packs_to_drop, 0);
> >
>
> I think this syntax is not allowed in our codebase except for a test
> balloon just yet. See aso 9b2527caa4 (CodingGuidelines: document test
> balloons in flight, 2025-07-23):
>
>     since late 2024 with v2.48.0-rc0~20, we have test balloons for
>     compound literal syntax, e.g., (struct foo){ .member = value };
>     our hope is that no platforms we care about have trouble using
>     them, and officially adopt its wider use in mid 2026.  Do not add
>     more use of the syntax until that happens.
>
> > @@ -1774,8 +1797,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
> >  		goto cleanup;
> >  	}
> >
> > -	result = write_midx_internal(source, NULL, NULL, NULL, NULL,
> > -				     flags);
> > +	result = write_midx_internal(&(struct write_midx_opts) {
> > +				     .source = source,
> > +				     .flags = flags,
> > +				     });
>
> Same here.

Hah, I even remember checking to make sure there wasn't such a test
balloon and then thinking that I'd need to adjust before sending. That
must have been just before I got up from my desk, and I must have
forgotten about it until now. Fixed up locally, thanks for spotting!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
  2025-12-08 18:26   ` Patrick Steinhardt
@ 2025-12-09  2:07     ` Taylor Blau
  2025-12-09  2:11       ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  2:07 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:26:53PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:25PM -0500, Taylor Blau wrote:
> > Note that this produces MIDXs which may be incompatible with earlier
> > versions of Git that have stricter requirements on the layout of packs
> > within a MIDX. This patch does *not* modify the version number of the
> > MIDX format, since existing versions of Git already know to gracefully
> > ignore a MIDX with packs that appear out-of-order.
>
> Interesting. Did you verify how other implementations of Git behave if
> we start to relax this requirement? It seems like a somewhat dangerous
> assumption to me that this will just continue to work.

That's a great point. It looks like current libgit2 assumes[1] that the
list is sorted and complains loudly if it is not. Presumably other
implementations behave similarly.

I think that is a compelling enough argument to swing us towards
bumping the version number to avoid compatibility issues.

> Also, is there a reason why you prefer this over bumping the version
> number?

I was trying to avoid having all existing Git clients be unable to read
v2 MIDXs for such a seemingly minor change, but I think the above
compels us to.

> I assume that it cannot happen that we append to the array of MIDX'd
> packs after we have sorted. It would mean that the MIDX somehow changed
> its representation or was amended to, which isn't possible.

That's right.

Thanks,
Taylor

[1]: https://github.com/libgit2/libgit2/blob/v1.9.2/src/libgit2/midx.c#L75-L76

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
  2025-12-09  2:07     ` Taylor Blau
@ 2025-12-09  2:11       ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  2:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 09:07:33PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:26:53PM +0100, Patrick Steinhardt wrote:
> > On Sat, Dec 06, 2025 at 03:31:25PM -0500, Taylor Blau wrote:
> > > Note that this produces MIDXs which may be incompatible with earlier
> > > versions of Git that have stricter requirements on the layout of packs
> > > within a MIDX. This patch does *not* modify the version number of the
> > > MIDX format, since existing versions of Git already know to gracefully
> > > ignore a MIDX with packs that appear out-of-order.
> >
> > Interesting. Did you verify how other implementations of Git behave if
> > we start to relax this requirement? It seems like a somewhat dangerous
> > assumption to me that this will just continue to work.
>
> That's a great point. It looks like current libgit2 assumes[1] that the
> list is sorted and complains loudly if it is not. Presumably other
> implementations behave similarly.
>
> I think that is a compelling enough argument to swing us towards
> bumping the version number to avoid compatibility issues.

I had another thought about how we might work around this without
forcing a compatibility issue, but it's a non-starter. I wanted to share
it on the list for posterity regardless.

I was going to add that we could instead consider adding a new chunk to
the MIDX format that lists the pack names in the order that they should
appear in the pseudo-pack order. Absent of that chunk, the pseudo-pack
order would be defined by the lexicographic order of pack names. If the
chunk exists, it would supersede that ordering.

But that just kicks the can down the road, since implementations like
libgit2 would think that they could read a *.midx file, but then they'd
produce all sorts of errors when trying to read its corresponding
*.bitmap file by permuting its bits out-of-order.

(I'm not sure off-hand whether or not libgit2 supports reading MIDX
bitmaps to begin with. Regardless, we should not introduce the
possibility for such a breakage in clients that *do* support reading
MIDX bitmaps, whether or not libgit2 is such a client.)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 10/17] git-compat-util.h: introduce `u32_add()`
  2025-12-08 18:27   ` Patrick Steinhardt
@ 2025-12-09  2:13     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  2:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:27:01PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:28PM -0500, Taylor Blau wrote:
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index 398e0fac4fa..a7aa5f05fc9 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -670,6 +670,14 @@ static inline int cast_size_t_to_int(size_t a)
> >  	return (int)a;
> >  }
> >
> > +static inline uint32_t u32_add(uint32_t a, uint32_t b)
> > +{
> > +	if (unsigned_add_overflows(a, b))
> > +		die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
> > +		    (uintmax_t)a, (uintmax_t)b);
> > +	return a + b;
> > +}
>
> We already use PRIu32 in our codebase, so why is the cast necessary?

I don't think it is; we could easily write this as:

    die("uint32_t overflow: %"PRIu32" + %"PRIu32, a, b);

instead, but this matches the convention of other similar functions in
the compat-util header.

(It's possible that there is some reasoning here that using PRIuMAX
really *is* necessary, but it isn't clear to me that's the case.)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly
  2025-12-08 18:27   ` Patrick Steinhardt
@ 2025-12-09  2:14     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  2:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:27:08PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:37PM -0500, Taylor Blau wrote:
> > Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
> > the range [0, m->num_packs), and then shifts its index variable up by
> > `m->num_packs_in_base` to produce a valid `pack_int_id`.
> >
> > Instead, directly enumerate the range:
> >
> >     [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)
> >
> > , which are the original pack_int_ids themselves as opposed to the
> > indexes of those packs relative to the MIDX layer they are contained
> > within.
>
> Sensible. I was confused a bit by the previous change because I couldn't
> quite spot the shift happening. I think this makes things a bit easier
> to read.

I'm glad that the end result was more pleasing. I have gone back and
forth whether to enumerate [0, m->num_packs) and shift, or to enumerate
the pack_int_ids directly, so it's helpful to know what style others
prefer.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
  2025-12-08 18:27   ` Patrick Steinhardt
@ 2025-12-09  2:16     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2025-12-09  2:16 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 07:27:16PM +0100, Patrick Steinhardt wrote:
> > @@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
> >  	if (checksum) {
> >  		while (m && strcmp(get_midx_checksum(m), checksum))
> >  			m = m->base_midx;
> > -		if (!m)
> > -			return 1;
> > +		if (!m) {
> > +			ret = error(_("could not find MIDX with checksum %s"),
> > +				    checksum);
> > +			goto out;
> > +		}
> >  	}
> >
> >  	printf("header: %08x %d %d %d %d\n",
>
> We change the return code from 1 to -1, but that ultimately shouldn't
> matter much.

Yeah; I think that returning negative values here makes more sense, and
use of error() encourages that pattern, hence the change here.

> I'll stop reviewing here and will have a look at the remaining two
> patches with some fresh eyes. But so far this was a nice read, thanks!

Thanks for the review thus far! I look forward to your thoughts on the
remainder of the series. In related news, I owe you some review on your
'pks/skip-noop-rewrite' patches, which I hope to get to tomorrow.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  2025-12-09  1:50       ` Taylor Blau
@ 2025-12-09  6:27         ` Patrick Steinhardt
  2026-01-13 22:46           ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-09  6:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Mon, Dec 08, 2025 at 08:50:08PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 08:42:23PM -0500, Taylor Blau wrote:
> > On Mon, Dec 08, 2025 at 07:25:05PM +0100, Patrick Steinhardt wrote:
> > > If I didn't have the context of this patch series I would be wondering
> > > what the actual difference between `get_midx_checksum()` and
> > > `get_midx_hash()` is. The way the functions are named seems to rather
> > > indicate that we talk about two different kinds of hashes, rather than
> > > two different ways to encode them.
> > >
> > > Would it maybe be preferable to call them `get_midx_checksum()` and
> > > `get_midx_checksum_hex()`? While at it, we could go even further and
> > > rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
> > > conform to our modern best practices.
> >
> > Yeah, I think those are both reasonable suggestions; I'll apply those
> > locally, thanks!
> 
> Hmm. Upon further thinking, I wonder which function should be named
> which.
> 
> I think the _checksum() variant suggests that it returns the non-hex
> encoded form, while the _hex() variant suggests the opposite.
> Unfortunately, the latter is both more commonly used and more characters
> to type ;-).
> 
> I wonder if there are shorter names available. Perhaps
> midx_get_checksum() and midx_get_checksum_raw()?

It's only four more characters to type the `_hex()` variant, and it is
in line with the interfaces we've got in "hex.h". So personally I'd
still prefer to go with `_hex()`.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 16/17] midx: implement MIDX compaction
  2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
@ 2025-12-09  7:21   ` Patrick Steinhardt
  2026-01-13 23:32     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-09  7:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:47PM -0500, Taylor Blau wrote:
> diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
> index 164cf1f2291..a9664e77411 100644
> --- a/Documentation/git-multi-pack-index.adoc
> +++ b/Documentation/git-multi-pack-index.adoc
> @@ -12,6 +12,8 @@ SYNOPSIS
>  'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
>  		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
>  		         [--refs-snapshot=<path>]
> +'git multi-pack-index' [<options>] compact [--[no-]incremental]
> +		         <from> <to>
>  'git multi-pack-index' [<options>] verify
>  'git multi-pack-index' [<options>] expire
>  'git multi-pack-index' [<options>] repack [--batch-size=<size>]
> @@ -83,6 +85,17 @@ marker).
>  		necessary.
>  --
>  
> +compact::
> +	Write a new MIDX layer containing only objects and packs present
> +	in the range `<from>` to `<to>`, where both arguments are
> +	checksums of existing layers in the MIDX chain.
> ++
> +--
> +	--incremental::
> +		Write the result to a MIDX chain instead of writing a
> +		stand-alone MIDX. Incompatible with `--bitmap`.

Interesting. What would happen if you compact a subrange of the MIDX
chain without incremental? Would the MIDX be completely replaced with a
MIDX that only covers these packs?

Also, the "--bitmap" flag does not exist yet, so the second sentence
probably needs to be introduced in the next commit.

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index c0c6c1760c0..9b0c2082cb3 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -195,6 +204,63 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
>  	return ret;
>  }
>  
> +static int cmd_multi_pack_index_compact(int argc, const char **argv,
> +					const char *prefix,
> +					struct repository *repo)
> +{
> +	struct multi_pack_index *m, *cur;
> +	struct multi_pack_index *from_midx = NULL;
> +	struct multi_pack_index *to_midx = NULL;
> +	struct odb_source *source;
> +	int ret;
> +
> +	struct option *options;
> +	static struct option builtin_multi_pack_index_compact_options[] = {
> +		OPT_BIT(0, "incremental", &opts.flags,
> +			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
> +		OPT_END(),
> +	};
> +
> +	repo_config(repo, git_multi_pack_index_write_config, NULL);
> +
> +	options = add_common_options(builtin_multi_pack_index_compact_options);
> +
> +	trace2_cmd_mode(argv[0]);
> +
> +	if (isatty(2))
> +		opts.flags |= MIDX_PROGRESS;
> +	argc = parse_options(argc, argv, prefix,
> +			     options, builtin_multi_pack_index_compact_usage,
> +			     0);
> +
> +	if (argc != 2)
> +		usage_with_options(builtin_multi_pack_index_compact_usage,
> +				   options);
> +	source = handle_object_dir_option(the_repository);
> +
> +	FREE_AND_NULL(options);
> +
> +	m = get_multi_pack_index(source);
> +
> +	for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
> +		const char *midx_csum = get_midx_checksum(cur);
> +
> +		if (!from_midx && !strcmp(midx_csum, argv[0]))
> +			from_midx = cur;
> +		if (!to_midx && !strcmp(midx_csum, argv[1]))
> +			to_midx = cur;
> +	}
> +
> +	if (!from_midx)
> +		die(_("could not find MIDX 'from': %s"), argv[0]);
> +	if (!to_midx)
> +		die(_("could not find MIDX 'to': %s"), argv[1]);
> +
> +	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
> +
> +	return ret;
> +}

Is it valid if `from_midx == to_midx`?

> diff --git a/midx-write.c b/midx-write.c
> index 7854561359d..fcbfedcd913 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -953,6 +980,72 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
>  	return 0;
>  }
>  
> +static uint32_t compactible_packs_between(const struct multi_pack_index *from,
> +					  const struct multi_pack_index *to)
> +{
> +	uint32_t nr;
> +
> +	ASSERT(from && to);
> +
> +	nr = u32_add(to->num_packs, to->num_packs_in_base);
> +	if (nr < from->num_packs_in_base)
> +		BUG("unexpected number of packs in base during compaction: "
> +		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
> +
> +	return nr - from->num_packs_in_base;
> +}
> +
> +static int fill_packs_from_midx_range(struct write_midx_context *ctx,
> +				      int bitmap_order)
> +{
> +	struct multi_pack_index *m = ctx->compact_to;
> +	uint32_t packs_nr;
> +
> +	ASSERT(ctx->compact && !ctx->nr);
> +	ASSERT(ctx->compact_from);
> +	ASSERT(ctx->compact_to);
> +
> +	packs_nr = compactible_packs_between(ctx->compact_from,
> +					     ctx->compact_to);
> +
> +	ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
> +
> +	while (m != ctx->compact_from->base_midx) {
> +		uint32_t pack_int_id, preferred_pack_id;
> +		uint32_t i;
> +
> +		if (bitmap_order) {
> +			if (midx_preferred_pack(m, &preferred_pack_id) < 0)
> +				die(_("could not determine preferred pack"));

`midx_preferred_pack()` only returns a valid pack ID in case we've got a
reverse index, and as far as I understand we seem to only generate those
when computing bitmaps. I assume that this means that we can only
compact MIDX layers in bitmap order if they already were in bitmap order
before?

That would at least also make sense. We of course cannot randomly change
the order in the middle of our layers, as that would break later layers
that build on top.

> +		} else {
> +			preferred_pack_id = m->num_packs_in_base;
> +		}
> +
> +		pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
> +
> +		if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> +					preferred_pack_id) < 0)
> +			return -1;
> +
> +		for (i = m->num_packs_in_base;
> +		     i < m->num_packs_in_base + m->num_packs; i++) {
> +			if (preferred_pack_id == i)
> +				continue;
> +
> +			if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> +						i) < 0)
> +				return -1;
> +		}
> +

So the condition that should hold after this loop is `pack_int_id ==
m->num_packs`. Which is somewhat obvious: we skip one pack, but that
pack is the preferred pack that we have populated first.

> @@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  			 */
>  			if (ctx.incremental)
>  				ctx.base_midx = m;
> -			else if (!opts->packs_to_include)
> +			if (!opts->packs_to_include)
>  				ctx.m = m;

I'm a bit surprised by this change here. I would've expected that we
never pass `packs_to_include` when compacting, so why is this change
necessary?

>  		}
>  	}
>  
> +	/*
> +	 * If compacting MIDX layer(s) in the range [from, to], then the
> +	 * compacted MIDX will share the same base MIDX as 'from'.
> +	 */
> +	if (ctx.compact)
> +		ctx.base_midx = ctx.compact_from->base_midx;

Okay, here we overwrite `ctx.base_midx` that we might've already set in
the condition above. It's a bit confusing, doubly so because we may be
warning about the invalid MIDX and claim to ignore it, but ultimately we
don't.

> diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
> new file mode 100755
> index 00000000000..f889af7fb1d
> --- /dev/null
> +++ b/t/t5335-compact-multi-pack-index.sh
> @@ -0,0 +1,102 @@
> +#!/bin/sh
> +
> +test_description='multi-pack-index compaction'
> +
> +. ./test-lib.sh
> +
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> +
> +objdir=.git/objects
> +packdir=$objdir/pack
> +midxdir=$packdir/multi-pack-index.d
> +midx_chain=$midxdir/multi-pack-index-chain
> +
> +nth_line() {
> +	local n="$1"
> +	shift
> +	awk "NR==$n" "$@"
> +}
> +
> +write_packs () {
> +	for c in "$@"
> +	do
> +		test_commit "$c" &&

Nit: it might be sensible to disable housekeeping here. You strongly
depend on the on-disk shape of the objects, so if you by chance wrote
two objects starting with "17" we'd end up repacking and racing.

I've also got an upcoming patch series in mindthat I've got cooking to
make geometric compaction the default for auto-maintenance. We've got
many test suites that implicitly rely on the current algorithm used by
git-gc(1), so I'd love to avoid adding more.

[snip]
> +test_expect_success 'MIDX compaction with lex-ordered pack names' '
> +	git init midx-compact-lex-order &&
> +	(
> +		cd midx-compact-lex-order &&
> +
> +		write_packs A B C D E &&
> +		test_line_count = 5 $midx_chain &&
> +
> +		git multi-pack-index compact --incremental \
> +			"$(nth_line 2 "$midx_chain")" \
> +			"$(nth_line 4 "$midx_chain")" &&
> +		test_line_count = 3 $midx_chain &&
> +
> +		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
> +		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
> +		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
> +
> +		test_midx_layer_object_uniqueness
> +	)
> +'

It would be nice to also test for requests that don't make sense: "from"
larger than "to", "from == to", missing "from" or "foo" and so on.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/17] midx: enable reachability bitmaps during MIDX compaction
  2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
@ 2025-12-09  7:21   ` Patrick Steinhardt
  2026-01-13 23:47     ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2025-12-09  7:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Sat, Dec 06, 2025 at 03:31:50PM -0500, Taylor Blau wrote:
> Enable callers to generate reachability bitmaps when performing MIDX
> layer compaction by combining all existing bitmaps from the compacted
> layers.
> 
> Note that the because of the object/pack ordering described by the

s/that the because/that because/

> previous commit, the pseudo-pack order for the compacted MIDX is the
> same as concatenating the individual pseudo-pack orderings for each
> layer in the compaction range.
> 
> As a result, the only non-test or documentation change necessary is to
> treat all objects as non-preferred during compaction so as not to
> disturb the object ordering.
> 
> In the future, we may want to adjust which commit(s) receive
> reachability bitmaps when compacting multiple .bitmap files into one, or
> even generate new bitmaps (e.g., if the references have moved
> significantly since the .bitmap was generated). This commit only
> implements combining all existing bitmaps in range together in order to
> demonstrate and lay the groundwork for more exotic strategies.

Will there also be a follow-up patch series that introduces geometric
repacking for multi-pack indices?

> @@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
>  
>  	struct option *options;
>  	static struct option builtin_multi_pack_index_compact_options[] = {
> +		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
> +			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
>  		OPT_BIT(0, "incremental", &opts.flags,
>  			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
>  		OPT_END(),

Is this new flag actually incompatible with the incremental flag like
you claimed in the preceding commit? I had the impression that it should
be possible to write incremental bitmaps now.

If that's not the case, we should probably have a call to
`die_for_imcopatible_opt2()` somewhere.

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
  2025-12-09  6:27         ` Patrick Steinhardt
@ 2026-01-13 22:46           ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-13 22:46 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Tue, Dec 09, 2025 at 07:27:49AM +0100, Patrick Steinhardt wrote:
> It's only four more characters to type the `_hex()` variant, and it is
> in line with the interfaces we've got in "hex.h". So personally I'd
> still prefer to go with `_hex()`.

I think that's a fair point. I am not sure that I go along with the
"it's only four more characters" argument, but I would like to be
consistent with existing APIs above all else. I think the choice we made
in "hex.h" is a good one, so let's stick with that.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 16/17] midx: implement MIDX compaction
  2025-12-09  7:21   ` Patrick Steinhardt
@ 2026-01-13 23:32     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-13 23:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Tue, Dec 09, 2025 at 08:21:32AM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:47PM -0500, Taylor Blau wrote:
> > diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
> > index 164cf1f2291..a9664e77411 100644
> > --- a/Documentation/git-multi-pack-index.adoc
> > +++ b/Documentation/git-multi-pack-index.adoc
> > @@ -12,6 +12,8 @@ SYNOPSIS
> >  'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
> >  		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
> >  		         [--refs-snapshot=<path>]
> > +'git multi-pack-index' [<options>] compact [--[no-]incremental]
> > +		         <from> <to>
> >  'git multi-pack-index' [<options>] verify
> >  'git multi-pack-index' [<options>] expire
> >  'git multi-pack-index' [<options>] repack [--batch-size=<size>]
> > @@ -83,6 +85,17 @@ marker).
> >  		necessary.
> >  --
> >
> > +compact::
> > +	Write a new MIDX layer containing only objects and packs present
> > +	in the range `<from>` to `<to>`, where both arguments are
> > +	checksums of existing layers in the MIDX chain.
> > ++
> > +--
> > +	--incremental::
> > +		Write the result to a MIDX chain instead of writing a
> > +		stand-alone MIDX. Incompatible with `--bitmap`.
>
> Interesting. What would happen if you compact a subrange of the MIDX
> chain without incremental? Would the MIDX be completely replaced with a
> MIDX that only covers these packs?

That's right.

> Also, the "--bitmap" flag does not exist yet, so the second sentence
> probably needs to be introduced in the next commit.

Ah, great catch -- I removed that line here. I don't think it needs to
be readded in the following commit, though, since that patch introduces
"--bitmap" and makes it compatible with MIDX compaction.

> > +	if (!from_midx)
> > +		die(_("could not find MIDX 'from': %s"), argv[0]);
> > +	if (!to_midx)
> > +		die(_("could not find MIDX 'to': %s"), argv[1]);
> > +
> > +	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
> > +
> > +	return ret;
> > +}
>
> Is it valid if `from_midx == to_midx`?

Yes, that would result in a noop write.

> > +	while (m != ctx->compact_from->base_midx) {
> > +		uint32_t pack_int_id, preferred_pack_id;
> > +		uint32_t i;
> > +
> > +		if (bitmap_order) {
> > +			if (midx_preferred_pack(m, &preferred_pack_id) < 0)
> > +				die(_("could not determine preferred pack"));
>
> `midx_preferred_pack()` only returns a valid pack ID in case we've got a
> reverse index, and as far as I understand we seem to only generate those
> when computing bitmaps. I assume that this means that we can only
> compact MIDX layers in bitmap order if they already were in bitmap order
> before?
>
> That would at least also make sense. We of course cannot randomly change
> the order in the middle of our layers, as that would break later layers
> that build on top.

Indeed, we only generate a reverse index for a MIDX if we are writing it
with bitmaps, since there is no other purpose for having a revindex
outside of reachability bitmaps.

So if we have a bitmap and are compacting, then we need to retain the
order of the packs as they appear in the pre-compaction pseudo-pack
order to avoid permuting the bits corresponding to those objects. In
other words, you're correct in saying that we cannot start writing
bitmaps during compaction if we did not have bitmaps to begin with
pre-compaction.

> > +		for (i = m->num_packs_in_base;
> > +		     i < m->num_packs_in_base + m->num_packs; i++) {
> > +			if (preferred_pack_id == i)
> > +				continue;
> > +
> > +			if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> > +						i) < 0)
> > +				return -1;
> > +		}
> > +
>
> So the condition that should hold after this loop is `pack_int_id ==
> m->num_packs`. Which is somewhat obvious: we skip one pack, but that
> pack is the preferred pack that we have populated first.

Exactly!

> > @@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
> >  			 */
> >  			if (ctx.incremental)
> >  				ctx.base_midx = m;
> > -			else if (!opts->packs_to_include)
> > +			if (!opts->packs_to_include)
> >  				ctx.m = m;
>
> I'm a bit surprised by this change here. I would've expected that we
> never pass `packs_to_include` when compacting, so why is this change
> necessary?

Right, we do not pass packs_to_include here during compaction. But if we
are doing an incremental compaction, then we do want to assign ctx.m in
addition to ctx.base_midx.

> > diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
> > new file mode 100755
> > index 00000000000..f889af7fb1d
> > --- /dev/null
> > +++ b/t/t5335-compact-multi-pack-index.sh
> > @@ -0,0 +1,102 @@
> > +#!/bin/sh
> > +
> > +test_description='multi-pack-index compaction'
> > +
> > +. ./test-lib.sh
> > +
> > +GIT_TEST_MULTI_PACK_INDEX=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> > +
> > +objdir=.git/objects
> > +packdir=$objdir/pack
> > +midxdir=$packdir/multi-pack-index.d
> > +midx_chain=$midxdir/multi-pack-index-chain
> > +
> > +nth_line() {
> > +	local n="$1"
> > +	shift
> > +	awk "NR==$n" "$@"
> > +}
> > +
> > +write_packs () {
> > +	for c in "$@"
> > +	do
> > +		test_commit "$c" &&
>
> Nit: it might be sensible to disable housekeeping here. You strongly
> depend on the on-disk shape of the objects, so if you by chance wrote
> two objects starting with "17" we'd end up repacking and racing.
>
> I've also got an upcoming patch series in mindthat I've got cooking to
> make geometric compaction the default for auto-maintenance. We've got
> many test suites that implicitly rely on the current algorithm used by
> git-gc(1), so I'd love to avoid adding more.

I'm not sure I follow what you mean by "housekeeping" here. Are you
referring to maintenance.auto? If so, we shouldn't be writing so many
packs as to trigger that during these tests, but I can disable it as a
sanity check just in case.

> [snip]
> > +test_expect_success 'MIDX compaction with lex-ordered pack names' '
> > +	git init midx-compact-lex-order &&
> > +	(
> > +		cd midx-compact-lex-order &&
> > +
> > +		write_packs A B C D E &&
> > +		test_line_count = 5 $midx_chain &&
> > +
> > +		git multi-pack-index compact --incremental \
> > +			"$(nth_line 2 "$midx_chain")" \
> > +			"$(nth_line 4 "$midx_chain")" &&
> > +		test_line_count = 3 $midx_chain &&
> > +
> > +		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
> > +		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
> > +		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
> > +
> > +		test_midx_layer_object_uniqueness
> > +	)
> > +'
>
> It would be nice to also test for requests that don't make sense: "from"
> larger than "to", "from == to", missing "from" or "foo" and so on.

All good suggestions, thanks!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 17/17] midx: enable reachability bitmaps during MIDX compaction
  2025-12-09  7:21   ` Patrick Steinhardt
@ 2026-01-13 23:47     ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-13 23:47 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano

On Tue, Dec 09, 2025 at 08:21:43AM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:50PM -0500, Taylor Blau wrote:
> > Enable callers to generate reachability bitmaps when performing MIDX
> > layer compaction by combining all existing bitmaps from the compacted
> > layers.
> >
> > Note that the because of the object/pack ordering described by the
>
> s/that the because/that because/

;-) great catch!

> > previous commit, the pseudo-pack order for the compacted MIDX is the
> > same as concatenating the individual pseudo-pack orderings for each
> > layer in the compaction range.
> >
> > As a result, the only non-test or documentation change necessary is to
> > treat all objects as non-preferred during compaction so as not to
> > disturb the object ordering.
> >
> > In the future, we may want to adjust which commit(s) receive
> > reachability bitmaps when compacting multiple .bitmap files into one, or
> > even generate new bitmaps (e.g., if the references have moved
> > significantly since the .bitmap was generated). This commit only
> > implements combining all existing bitmaps in range together in order to
> > demonstrate and lay the groundwork for more exotic strategies.
>
> Will there also be a follow-up patch series that introduces geometric
> repacking for multi-pack indices?

That's the plan, though I have been calling it "incremental MIDX/bitmap"
repacking. The idea is roughly what I presented on slides 80-90 of my
Git Merge talk from last year[1].

The gist is the following:

 1. Perform a geometric repack of all non-MIDX'd packs, optionally
    including any packs in the top-most MIDX layer if it has more than M
    packs.

 2. Generate a new MIDX layer containing the result of that geometric
    repack.

 3. Perform layer compaction on the new chain in order to ensure that
    each layer has no more than 2x the number of packs as the layer
    above it.

This is a conceptual overview, so there are a few white lies here.
Namely, we do not write out a MIDX containing the result of the
geometric repack from the first step in all cases. If it would be
compacted in the following step, we'll optimize out that write and
instead write a layer whose contents is the result of the geometric
repack and the contents of the layer(s) below.

Compaction is also done in a single pass, so we don't compact the same
layer multiple times during a single repack.

The result is that we end up with a MIDX chain with two properties (see
slide 131 for more details):

 - Newer layers contain a greater number of packs which tend to be
   smaller by object count than older layers.

 - Older layers contain a smaller number of packs which tend to be
   larger by object count than newer layers.

I have that working in a WIP-quality branch[2], which I have been using
as a base to pull patches out of (which has yielded parts 3.1 and 3.2
thus far).

> > @@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
> >
> >  	struct option *options;
> >  	static struct option builtin_multi_pack_index_compact_options[] = {
> > +		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
> > +			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
> >  		OPT_BIT(0, "incremental", &opts.flags,
> >  			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
> >  		OPT_END(),
>
> Is this new flag actually incompatible with the incremental flag like
> you claimed in the preceding commit? I had the impression that it should
> be possible to write incremental bitmaps now.
>
> If that's not the case, we should probably have a call to
> `die_for_imcopatible_opt2()` somewhere.

It's not incompatible, the comment is just stale, thanks for pointing it
out!

Thanks,
Taylor

[1]: https://ttaylorr.com/presentations/git-merge-2025.pdf

[2]: 'tb/incremental-midx-part-3.wip' from my 'ttaylorr/git' fork on
     GitHub.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (16 preceding siblings ...)
  2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
@ 2026-01-14 19:54 ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
                     ` (18 more replies)
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
  18 siblings, 19 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

[Note to the maintainer: this is based on 'master' with my
'tb/midx-write-corrupt-checksum-fix' merged in and should produce zero
conflicts when applied on top.]

This is a reroll of my series to implement MIDX layer compaction
adjusted in response to reviewer feedback.

As usual, a range-diff is included below for convenience. The main
changes since last time are as follows:

 - The MIDX file format has been bumped to version 2 to avoid breaking
   external implementations which may not be prepared to gracefully
   degrade away from the MIDX when encountering a non-sorted PNAM
   chunk.

 - I adjusted the names of the two functions which were split out from
   "get_midx_checksum()" to be "midx_get_checksum_hash()" and
   "midx_get_checksum_hex()" to match the hex.h API convention.

Along with those, there are a handful of more minor/cosmetic changes
that are also included in this round:

 - Explicitly declared the write_midx_opts struct from
   expire_midx_packs() instead of making use of compound literal
   syntax, for which we have an in-flight weather balloon.

 - Docfix to remove an incorrect statement suggesting that the options
   "--bitmap" and "--incremental" are incompatible with one another
   when used in conjunction with "git multi-pack-index compact".

 - Explicit checks and tests for bogus compaction scenarios (e.g.,
   missing from/to, to as an ancestor of from, etc.).

 - Various wording changes and a small handful of typofixes.

The original cover letter may be found here[1].

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1765053054.git.me@ttaylorr.com/

Taylor Blau (18):
  midx: mark `get_midx_checksum()` arguments as const
  midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  midx: introduce `midx_get_checksum_hex()`
  builtin/multi-pack-index.c: make '--progress' a common option
  git-multi-pack-index(1): remove non-existent incompatibility
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  midx-write.c: introduce `struct write_midx_opts`
  midx: do not require packs to be sorted in lexicographic order
  git-compat-util.h: introduce `u32_add()`
  midx-write.c: introduce `midx_pack_perm()` helper
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx: implement MIDX compaction
  midx: enable reachability bitmaps during MIDX compaction

 Documentation/git-multi-pack-index.adoc |  27 +-
 Documentation/gitformat-pack.adoc       |   6 +-
 builtin/multi-pack-index.c              |  91 ++++-
 git-compat-util.h                       |   8 +
 midx-write.c                            | 499 +++++++++++++++++++-----
 midx.c                                  |  39 +-
 midx.h                                  |  12 +-
 pack-bitmap.c                           |   9 +-
 pack-revindex.c                         |   4 +-
 t/helper/test-read-midx.c               |  21 +-
 t/meson.build                           |   1 +
 t/t0450/adoc-help-mismatches            |   1 -
 t/t5319-multi-pack-index.sh             |  18 +-
 t/t5335-compact-multi-pack-index.sh     | 293 ++++++++++++++
 14 files changed, 879 insertions(+), 150 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh

Range-diff against v1:
 1:  1ee8b752279 !  1:  2e549ea6443 midx: mark `get_midx_checksum()` arguments as const
    @@ Metadata
      ## Commit message ##
         midx: mark `get_midx_checksum()` arguments as const
     
    -    To make clear that the fucntion `get_midx_checksum()` does not do
    +    To make clear that the function `get_midx_checksum()` does not do
         anything to modify its argument, mark the MIDX pointer as const.
     
         The following commit will rename this function altogether to make clear
 2:  57125303758 !  2:  7255adafe70 midx: split `get_midx_checksum()` by adding `get_midx_hash()`
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    midx: split `get_midx_checksum()` by adding `get_midx_hash()`
    +    midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
     
    -    When trying to print out, say, the hexadecimal representation of a
    -    MIDX's hash, our code will do something like:
    +    Since 541204aabea (Documentation: document naming schema for structs and
    +    their functions, 2024-07-30), we have adopted a naming convention for
    +    functions that would prefer a name like, say, `midx_get_checksum()` over
    +    `get_midx_checksum()`.
     
    -        hash_to_hex_algop(get_midx_checksum(m),
    -                          m->source->odb->repo->hash_algo);
    -
    -    , which is both cumbersome and repetitive. In fact, all but a handful of
    -    callers to `get_midx_checksum()` do exactly the above. Reduce the
    -    repetitive nature of calling `get_midx_checksum()` by having it return a
    -    pointer into a static buffer containing the above result.
    -
    -    For the handful of callers that do need to compare the raw bytes and
    -    don't want to deal with an encoded copy (e.g., because they are passing
    -    it to hasheq() or similar), introduce `get_midx_hash()` which returns
    -    the raw bytes.
    +    Adopt this convention throughout the midx.h API. Since this function
    +    returns a raw (that is, non-hex encoded) hash, let's suffix the function
    +    with "_hash()" to make this clear. As a side effect, this prepares us
    +    for the subsequent change which will introduce a "_hex()" variant that
    +    encodes the checksum itself.
     
    +    Suggested-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx-write.c ##
    @@ midx-write.c: static int link_midx_to_chain(struct multi_pack_index *m)
      
      	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
     -		const unsigned char *hash = get_midx_checksum(m);
    -+		const unsigned char *hash = get_midx_hash(m);
    ++		const unsigned char *hash = midx_get_checksum_hash(m);
      
      		get_midx_filename_ext(m->source, &from,
      				      hash, midx_exts[i].non_split);
    @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
      			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
      				error(_("could not load reverse index for MIDX %s"),
     -				      hash_to_hex_algop(get_midx_checksum(m),
    --							m->source->odb->repo->hash_algo));
    -+				      get_midx_checksum(m));
    ++				      hash_to_hex_algop(midx_get_checksum_hash(m),
    + 							m->source->odb->repo->hash_algo));
      				goto cleanup;
      			}
    - 			ctx.num_multi_pack_indexes_before++;
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
      		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
      			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
      
     -			keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
    --								   r->hash_algo));
    -+			keep_hashes[j] = xstrdup(get_midx_checksum(m));
    ++			keep_hashes[j] = xstrdup(hash_to_hex_algop(midx_get_checksum_hash(m),
    + 								   r->hash_algo));
      			m = m->base_midx;
      		}
    - 
     
      ## midx.c ##
     @@ midx.c: void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
    @@ midx.c: void clear_incremental_midx_files_ext(struct odb_source *source, const c
      			 const char *idx_name);
      
     -const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
    -+const char *get_midx_checksum(const struct multi_pack_index *m)
    -+{
    -+	return hash_to_hex_algop(get_midx_hash(m),
    -+				 m->source->odb->repo->hash_algo);
    -+}
    -+
    -+const unsigned char *get_midx_hash(const struct multi_pack_index *m)
    ++const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m)
      {
      	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
      }
    @@ midx.h: struct multi_pack_index {
      #define MIDX_EXT_MIDX "midx"
      
     -const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
    -+const char *get_midx_checksum(const struct multi_pack_index *m) /* static buffer */;
    -+const unsigned char *get_midx_hash(const struct multi_pack_index *m);
    ++const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m);
      void get_midx_filename(struct odb_source *source, struct strbuf *out);
      void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
      			   const unsigned char *hash, const char *ext);
    @@ pack-bitmap.c: char *midx_bitmap_filename(struct multi_pack_index *midx)
      	if (midx->has_chain)
      		get_split_midx_filename_ext(midx->source, &buf,
     -					    get_midx_checksum(midx),
    -+					    get_midx_hash(midx),
    ++					    midx_get_checksum_hash(midx),
      					    MIDX_EXT_BITMAP);
      	else
      		get_midx_filename_ext(midx->source, &buf,
     -				      get_midx_checksum(midx),
    -+				      get_midx_hash(midx),
    ++				      midx_get_checksum_hash(midx),
      				      MIDX_EXT_BITMAP);
      
      	return strbuf_detach(&buf, NULL);
    @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
      		goto cleanup;
      
     -	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum,
    -+	if (!hasheq(get_midx_hash(bitmap_git->midx), bitmap_git->checksum,
    ++	if (!hasheq(midx_get_checksum_hash(bitmap_git->midx), bitmap_git->checksum,
      		    bitmap_repo(bitmap_git)->hash_algo)) {
      		error(_("checksum doesn't match in MIDX and bitmap"));
      		goto cleanup;
    @@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
      		if (bitmap_is_midx(found))
      			fprintf_ln(stderr, "Located via MIDX '%s'.",
     -				   hash_to_hex_algop(get_midx_checksum(found->midx),
    --						     revs->repo->hash_algo));
    -+				   get_midx_checksum(found->midx));
    ++				   hash_to_hex_algop(midx_get_checksum_hash(found->midx),
    + 						     revs->repo->hash_algo));
      		else
      			fprintf_ln(stderr, "Located via pack '%s'.",
    - 				   hash_to_hex_algop(found->pack->hash,
     
      ## pack-revindex.c ##
     @@ pack-revindex.c: int load_midx_revindex(struct multi_pack_index *m)
    @@ pack-revindex.c: int load_midx_revindex(struct multi_pack_index *m)
      	if (m->has_chain)
      		get_split_midx_filename_ext(m->source, &revindex_name,
     -					    get_midx_checksum(m),
    -+					    get_midx_hash(m),
    ++					    midx_get_checksum_hash(m),
      					    MIDX_EXT_REV);
      	else
      		get_midx_filename_ext(m->source, &revindex_name,
     -				      get_midx_checksum(m),
    -+				      get_midx_hash(m),
    ++				      midx_get_checksum_hash(m),
      				      MIDX_EXT_REV);
      
      	ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
    @@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir, con
      
      	if (checksum) {
     -		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
    -+		while (m && strcmp(get_midx_checksum(m), checksum))
    ++		while (m && strcmp(hash_to_hex(midx_get_checksum_hash(m)), checksum))
      			m = m->base_midx;
      		if (!m)
      			return 1;
    @@ t/helper/test-read-midx.c: static int read_midx_checksum(const char *object_dir)
      	if (!m)
      		return 1;
     -	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
    -+	printf("%s\n", get_midx_checksum(m));
    ++	printf("%s\n", hash_to_hex(midx_get_checksum_hash(m)));
      
      	close_midx(m);
      	return 0;
15:  c2149ae5bc5 !  3:  25b628fda97 t/helper/test-read-midx.c: plug memory leak when selecting layer
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    t/helper/test-read-midx.c: plug memory leak when selecting layer
    +    midx: introduce `midx_get_checksum_hex()`
     
    -    Though our 'read-midx' test tool is capable of printing information
    -    about a single MIDX layer identified by its checksum, no caller in our
    -    test suite exercises this path.
    +    When trying to print out, say, the hexadecimal representation of a
    +    MIDX's hash, our code will do something like:
     
    -    Unfortunately, there is a memory leak lurking in this (currently) unused
    -    path that would otherwise be exposed by the following commit.
    +        hash_to_hex_algop(midx_get_checksum_hash(m),
    +                          m->source->odb->repo->hash_algo);
     
    -    This occurs when providing a MIDX layer checksum other than the tip. As
    -    we walk over the MIDX chain trying to find the matching layer, we drop
    -    our reference to the top-most MIDX layer. Thus, our call to
    -    'close_midx()' later on leaks memory between the top-most MIDX layer and
    -    the MIDX layer immediately following the specified one.
    +    , which is both cumbersome and repetitive. In fact, all but a handful of
    +    callers to `midx_get_checksum_hash()` do exactly the above. Reduce the
    +    repetitive nature of calling `midx_get_checksum_hash()` by having it
    +    return a pointer into a static buffer containing the above result.
     
    -    Plug this leak by holding a reference to the tip of the MIDX chain, and
    -    ensure that we call `close_midx()` before terminating the test tool.
    +    For the handful of callers that do need to compare the raw bytes and
    +    don't want to deal with an encoded copy (e.g., because they are passing
    +    it to hasheq() or similar), they may still rely on
    +    `midx_get_checksum_hash()` which returns the raw bytes.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    - ## t/helper/test-read-midx.c ##
    -@@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir, const char *checksum,
    - 			  int show_objects)
    + ## midx-write.c ##
    +@@ midx-write.c: static int write_midx_internal(struct odb_source *source,
    + 		while (m) {
    + 			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
    + 				error(_("could not load reverse index for MIDX %s"),
    +-				      hash_to_hex_algop(midx_get_checksum_hash(m),
    +-							m->source->odb->repo->hash_algo));
    ++				      midx_get_checksum_hex(m));
    + 				goto cleanup;
    + 			}
    + 			ctx.num_multi_pack_indexes_before++;
    +@@ midx-write.c: static int write_midx_internal(struct odb_source *source,
    + 		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
    + 			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
    + 
    +-			keep_hashes[j] = xstrdup(hash_to_hex_algop(midx_get_checksum_hash(m),
    +-								   r->hash_algo));
    ++			keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
    + 			m = m->base_midx;
    + 		}
    + 
    +
    + ## midx.c ##
    +@@ midx.c: void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
    + int cmp_idx_or_pack_name(const char *idx_or_pack_name,
    + 			 const char *idx_name);
    + 
    ++const char *midx_get_checksum_hex(const struct multi_pack_index *m)
    ++{
    ++	return hash_to_hex_algop(midx_get_checksum_hash(m),
    ++				 m->source->odb->repo->hash_algo);
    ++}
    ++
    + const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m)
      {
    - 	uint32_t i;
    --	struct multi_pack_index *m;
    -+	struct multi_pack_index *m, *tip;
    -+	int ret = 0;
    + 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
    +
    + ## midx.h ##
    +@@ midx.h: struct multi_pack_index {
    + #define MIDX_EXT_BITMAP "bitmap"
    + #define MIDX_EXT_MIDX "midx"
      
    --	m = setup_midx(object_dir);
    -+	m = tip = setup_midx(object_dir);
    ++const char *midx_get_checksum_hex(const struct multi_pack_index *m) /* static buffer */;
    + const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m);
    + void get_midx_filename(struct odb_source *source, struct strbuf *out);
    + void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
    +
    + ## pack-bitmap.c ##
    +@@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
      
    - 	if (!m)
    - 		return 1;
    + 		if (bitmap_is_midx(found))
    + 			fprintf_ln(stderr, "Located via MIDX '%s'.",
    +-				   hash_to_hex_algop(midx_get_checksum_hash(found->midx),
    +-						     revs->repo->hash_algo));
    ++				   midx_get_checksum_hex(found->midx));
    + 		else
    + 			fprintf_ln(stderr, "Located via pack '%s'.",
    + 				   hash_to_hex_algop(found->pack->hash,
    +
    + ## t/helper/test-read-midx.c ##
     @@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir, const char *checksum,
    + 		return 1;
    + 
      	if (checksum) {
    - 		while (m && strcmp(get_midx_checksum(m), checksum))
    +-		while (m && strcmp(hash_to_hex(midx_get_checksum_hash(m)), checksum))
    ++		while (m && strcmp(midx_get_checksum_hex(m), checksum))
      			m = m->base_midx;
    --		if (!m)
    --			return 1;
    -+		if (!m) {
    -+			ret = error(_("could not find MIDX with checksum %s"),
    -+				    checksum);
    -+			goto out;
    -+		}
    - 	}
    - 
    - 	printf("header: %08x %d %d %d %d\n",
    -@@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir, const char *checksum,
    - 		}
    - 	}
    - 
    --	close_midx(m);
    -+out:
    -+	close_midx(tip);
    - 
    --	return 0;
    -+	return ret;
    - }
    - 
    - static int read_midx_checksum(const char *object_dir)
    + 		if (!m)
    + 			return 1;
    +@@ t/helper/test-read-midx.c: static int read_midx_checksum(const char *object_dir)
    + 	m = setup_midx(object_dir);
    + 	if (!m)
    + 		return 1;
    +-	printf("%s\n", hash_to_hex(midx_get_checksum_hash(m)));
    ++	printf("%s\n", midx_get_checksum_hex(m));
    + 
    + 	close_midx(m);
    + 	return 0;
 3:  d35c709632b =  4:  2aedd72db8c builtin/multi-pack-index.c: make '--progress' a common option
 4:  6111a1f5abd =  5:  a00598a36a3 git-multi-pack-index(1): remove non-existent incompatibility
 5:  630a53adbd6 =  6:  92e6d868a45 git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
 6:  6dd59967876 =  7:  ff599c11f68 t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
 7:  7f498a02ce7 !  8:  315a0ea2985 midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
    @@ Metadata
      ## Commit message ##
         midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
     
    -    In midx_pack_order(), we compute for each bitampped pack the first bit
    +    In midx_pack_order(), we compute for each bitmapped pack the first bit
         to correspond to an object in that pack, along with how many bits were
         assigned to object(s) in that pack.
     
 8:  687a2f50337 !  9:  af174e22e1e midx-write.c: introduce `struct write_midx_opts`
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx-write.c ##
    -@@ midx-write.c: static void clear_midx_files(struct odb_source *source,
    - 	strbuf_release(&buf);
    +@@ midx-write.c: static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
    + 	return needed;
      }
      
     -static int write_midx_internal(struct odb_source *source,
    @@ midx-write.c: static void clear_midx_files(struct odb_source *source,
     -			       const char *refs_snapshot,
     -			       unsigned flags)
     +struct write_midx_opts {
    -+	struct odb_source *source;
    ++	struct odb_source *source; /* non-optional */
     +
     +	struct string_list *packs_to_include;
     +	struct string_list *packs_to_drop;
    @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
     -			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
     +			if (opts->flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
      				error(_("could not load reverse index for MIDX %s"),
    - 				      get_midx_checksum(m));
    + 				      midx_get_checksum_hex(m));
      				goto cleanup;
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
      	start_pack = ctx.nr;
    @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
     +	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
      	stop_progress(&ctx.progress);
      
    - 	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
    - 	    !ctx.incremental &&
    --	    !(packs_to_include || packs_to_drop)) {
    -+	    !(opts->packs_to_include || opts->packs_to_drop)) {
    - 		struct bitmap_index *bitmap_git;
    - 		int bitmap_exists;
    --		int want_bitmap = flags & MIDX_WRITE_BITMAP;
    -+		int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
    +-	if (!packs_to_drop) {
    ++	if (!opts->packs_to_drop) {
    + 		/*
    + 		 * If there is no MIDX then either it doesn't exist, or we're
    + 		 * doing a geometric repack. Try to load it from the source to
    +@@ midx-write.c: static int write_midx_internal(struct odb_source *source,
    + 		if (midx && !midx_needs_update(midx, &ctx)) {
    + 			struct bitmap_index *bitmap_git;
    + 			int bitmap_exists;
    +-			int want_bitmap = flags & MIDX_WRITE_BITMAP;
    ++			int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
      
    - 		bitmap_git = prepare_midx_bitmap_git(ctx.m);
    - 		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
    + 			bitmap_git = prepare_midx_bitmap_git(midx);
    + 			bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
    - 			 * corresponding bitmap (or one wasn't requested).
    - 			 */
    - 			if (!want_bitmap)
    --				clear_midx_files_ext(source, "bitmap", NULL);
    -+				clear_midx_files_ext(opts->source, "bitmap",
    -+						     NULL);
    - 			result = 0;
    - 			goto cleanup;
    - 		}
    + 				 * corresponding bitmap (or one wasn't requested).
    + 				 */
    + 				if (!want_bitmap)
    +-					clear_midx_files_ext(source, "bitmap", NULL);
    ++					clear_midx_files_ext(ctx.source, "bitmap", NULL);
    + 				result = 0;
    + 				goto cleanup;
    + 			}
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
      		goto cleanup; /* nothing to do */
      	}
    @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
     -	if (flags & MIDX_WRITE_BITMAP) {
     +	if (opts->flags & MIDX_WRITE_BITMAP) {
      		struct packing_data pdata;
    - 		struct commit **commits;
    - 		uint32_t commits_nr;
    + 		struct commit_stack commits = COMMIT_STACK_INIT;
    + 
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
      
      		prepare_midx_packing_data(&pdata, &ctx);
      
    --		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
    -+		commits = find_commits_for_midx_bitmap(&commits_nr, opts->refs_snapshot, &ctx);
    +-		find_commits_for_midx_bitmap(&commits, refs_snapshot, &ctx);
    ++		find_commits_for_midx_bitmap(&commits, opts->refs_snapshot, &ctx);
      
      		/*
      		 * The previous steps translated the information from
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
    + 		FREE_AND_NULL(ctx.entries);
    + 		ctx.entries_nr = 0;
      
    - 		if (write_midx_bitmap(&ctx,
    - 				      midx_hash, &pdata, commits, commits_nr,
    --				      flags) < 0) {
    -+				      opts->flags) < 0) {
    +-		if (write_midx_bitmap(&ctx, midx_hash, &pdata,
    +-				      commits.items, commits.nr, flags) < 0) {
    ++		if (write_midx_bitmap(&ctx, midx_hash, &pdata, commits.items,
    ++				      commits.nr, opts->flags) < 0) {
      			error(_("could not write multi-pack bitmap"));
      			clear_packing_data(&pdata);
    - 			free(commits);
    + 			commit_stack_clear(&commits);
     @@ midx-write.c: static int write_midx_internal(struct odb_source *source,
      		if (link_midx_to_chain(ctx.base_midx) < 0)
      			goto cleanup;
    @@ midx-write.c: int write_midx_file_only(struct odb_source *source,
      
      int expire_midx_packs(struct odb_source *source, unsigned flags)
     @@ midx-write.c: int expire_midx_packs(struct odb_source *source, unsigned flags)
    + 
      	free(count);
      
    - 	if (packs_to_drop.nr)
    +-	if (packs_to_drop.nr)
     -		result = write_midx_internal(source, NULL,
     -					     &packs_to_drop, NULL, NULL, flags);
    -+		result = write_midx_internal(&(struct write_midx_opts) {
    -+					     .source = source,
    -+					     .packs_to_drop = &packs_to_drop,
    -+					     .flags = flags & MIDX_PROGRESS,
    -+					     });
    ++	if (packs_to_drop.nr) {
    ++		struct write_midx_opts opts = {
    ++			.source = source,
    ++			.packs_to_drop = &packs_to_drop,
    ++			.flags = flags & MIDX_PROGRESS,
    ++		};
    ++		result = write_midx_internal(&opts);
    ++	}
      
      	string_list_clear(&packs_to_drop, 0);
      
    +@@ midx-write.c: int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
    + 	struct child_process cmd = CHILD_PROCESS_INIT;
    + 	FILE *cmd_in;
    + 	struct multi_pack_index *m = get_multi_pack_index(source);
    ++	struct write_midx_opts opts = {
    ++		.source = source,
    ++		.flags = flags,
    ++	};
    + 
    + 	/*
    + 	 * When updating the default for these configuration
     @@ midx-write.c: int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
      		goto cleanup;
      	}
      
     -	result = write_midx_internal(source, NULL, NULL, NULL, NULL,
     -				     flags);
    -+	result = write_midx_internal(&(struct write_midx_opts) {
    -+				     .source = source,
    -+				     .flags = flags,
    -+				     });
    ++	result = write_midx_internal(&opts);
      
      cleanup:
      	free(include_pack);
 9:  66ae4bc8c0a ! 10:  72bcd4ed6c7 midx: do not require packs to be sorted in lexicographic order
    @@ Commit message
         lazily instantiate a `pack_names_sorted` array on the MIDX, which will
         be used to implement the binary search over pack names.
     
    -    Note that this produces MIDXs which may be incompatible with earlier
    -    versions of Git that have stricter requirements on the layout of packs
    -    within a MIDX. This patch does *not* modify the version number of the
    -    MIDX format, since existing versions of Git already know to gracefully
    -    ignore a MIDX with packs that appear out-of-order.
    +    Because this change produces MIDXs which may not be correctly read with
    +    external tools or older versions of Git. Though older versions of Git
    +    know how to gracefully degrade and ignore any MIDX(s) they consider
    +    corrupt, external tools may not be as robust. To avoid unintentionally
    +    breaking any such tools, guard this change behind a version bump in the
    +    MIDX's on-disk format.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## Documentation/gitformat-pack.adoc ##
    +@@ Documentation/gitformat-pack.adoc: HEADER:
    + 	    The signature is: {'M', 'I', 'D', 'X'}
    + 
    + 	1-byte version number:
    +-	    Git only writes or recognizes version 1.
    ++	    Git only writes version 2, but recognizes versions 1 and 2.
    + 
    + 	1-byte Object Id Version
    + 	    We infer the length of object IDs (OIDs) from this value:
    +@@ Documentation/gitformat-pack.adoc: CHUNK DATA:
    + 	    strings. There is no extra padding between the filenames,
    + 	    and they are listed in lexicographic order. The chunk itself
    + 	    is padded at the end with between 0 and 3 NUL bytes to make the
    +-	    chunk size a multiple of 4 bytes.
    ++	    chunk size a multiple of 4 bytes. Version 1 MIDXs are required to
    ++	    list their packs in lexicographic order, but version 2 MIDXs may
    ++	    list their packs in any arbitrary order.
    + 
    + 	Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
    + 	    Stores a table of two 4-byte unsigned integers in network order.
    +
      ## midx-write.c ##
    +@@ midx-write.c: extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
    + 
    + static size_t write_midx_header(const struct git_hash_algo *hash_algo,
    + 				struct hashfile *f, unsigned char num_chunks,
    +-				uint32_t num_packs)
    ++				uint32_t num_packs, int version)
    + {
    ++	if (version != MIDX_VERSION_V1 && version != MIDX_VERSION_V2)
    ++		BUG("unexpected MIDX version: %d", version);
    ++
    + 	hashwrite_be32(f, MIDX_SIGNATURE);
    +-	hashwrite_u8(f, MIDX_VERSION);
    ++	hashwrite_u8(f, version);
    + 	hashwrite_u8(f, oid_version(hash_algo));
    + 	hashwrite_u8(f, num_chunks);
    + 	hashwrite_u8(f, 0); /* unused */
    +@@ midx-write.c: struct write_midx_context {
    + 
    + 	uint32_t preferred_pack_idx;
    + 
    ++	int version; /* must be MIDX_VERSION_V1 or _V2 */
    ++
    + 	int incremental;
    + 	uint32_t num_multi_pack_indexes_before;
    + 
     @@ midx-write.c: static int write_midx_pack_names(struct hashfile *f, void *data)
      		if (ctx->info[i].expired)
      			continue;
      
     -		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
    --			BUG("incorrect pack-file order: %s before %s",
    --			    ctx->info[i - 1].pack_name,
    --			    ctx->info[i].pack_name);
    --
    - 		writelen = strlen(ctx->info[i].pack_name) + 1;
    - 		hashwrite(f, ctx->info[i].pack_name, writelen);
    - 		written += writelen;
    ++		if (ctx->version == MIDX_VERSION_V1 &&
    ++		    i && strcmp(ctx->info[i].pack_name,
    ++				ctx->info[i - 1].pack_name) <= 0)
    + 			BUG("incorrect pack-file order: %s before %s",
    + 			    ctx->info[i - 1].pack_name,
    + 			    ctx->info[i].pack_name);
    +@@ midx-write.c: static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
    + 	if (!midx_checksum_valid(midx))
    + 		goto out;
    + 
    ++	/*
    ++	 * If the version differs, we need to update.
    ++	 */
    ++	if (midx->version != ctx->version)
    ++		goto out;
    ++
    + 	/*
    + 	 * Ignore incremental updates for now. The assumption is that any
    + 	 * incremental update would be either empty (in which case we will bail
    +@@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
    + 	struct tempfile *incr;
    + 	struct write_midx_context ctx = {
    + 		.preferred_pack_idx = NO_PREFERRED_PACK,
    ++		.version = MIDX_VERSION_V2,
    + 	 };
    + 	struct multi_pack_index *midx_to_free = NULL;
    + 	int bitmapped_packs_concat_len = 0;
    +@@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
    + 	ctx.repo = r;
    + 	ctx.source = opts->source;
    + 
    ++	repo_config_get_int(ctx.repo, "midx.version", &ctx.version);
    ++	if (ctx.version != MIDX_VERSION_V1 && ctx.version != MIDX_VERSION_V2)
    ++		die(_("unknown MIDX version: %d"), ctx.version);
    ++
    + 	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
    + 
    + 	if (ctx.incremental)
    +@@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
    + 	}
    + 
    + 	write_midx_header(r->hash_algo, f, get_num_chunks(cf),
    +-			  ctx.nr - dropped_packs);
    ++			  ctx.nr - dropped_packs, ctx.version);
    + 	write_chunkfile(cf, &ctx);
    + 
    + 	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
     
      ## midx.c ##
     @@ midx.c: static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
    - 		if (!end)
    + 		      m->signature, MIDX_SIGNATURE);
    + 
    + 	m->version = m->data[MIDX_BYTE_FILE_VERSION];
    +-	if (m->version != MIDX_VERSION)
    ++	if (m->version != MIDX_VERSION_V1 && m->version != MIDX_VERSION_V2)
    + 		die(_("multi-pack-index version %d not recognized"),
    + 		      m->version);
    + 
    +@@ midx.c: static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
      			die(_("multi-pack-index pack-name chunk is too short"));
      		cur_pack_name = end + 1;
    --
    + 
     -		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
    --			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
    --			      m->pack_names[i - 1],
    --			      m->pack_names[i]);
    - 	}
    - 
    - 	trace2_data_intmax("midx", r, "load/num_packs", m->num_packs);
    ++		if (m->version == MIDX_VERSION_V1 &&
    ++		    i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
    + 			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
    + 			      m->pack_names[i - 1],
    + 			      m->pack_names[i]);
     @@ midx.c: void close_midx(struct multi_pack_index *m)
      	}
      	FREE_AND_NULL(m->packs);
    @@ midx.c: int cmp_idx_or_pack_name(const char *idx_or_pack_name,
      {
      	uint32_t first = 0, last = m->num_packs;
      
    -+	if (!m->pack_names_sorted) {
    ++	if (m->version == MIDX_VERSION_V2 && !m->pack_names_sorted) {
     +		uint32_t i;
     +
     +		ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
    @@ midx.c: int cmp_idx_or_pack_name(const char *idx_or_pack_name,
      		int cmp;
      
     -		current = m->pack_names[mid];
    -+		current = m->pack_names[m->pack_names_sorted[mid]];
    ++		if (m->pack_names_sorted)
    ++			current = m->pack_names[m->pack_names_sorted[mid]];
    ++		else
    ++			current = m->pack_names[mid];
      		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
      		if (!cmp)
      			return 1;
     
      ## midx.h ##
    +@@ midx.h: struct git_hash_algo;
    + struct odb_source;
    + 
    + #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
    +-#define MIDX_VERSION 1
    ++#define MIDX_VERSION_V1 1
    ++#define MIDX_VERSION_V2 2
    + #define MIDX_BYTE_FILE_VERSION 4
    + #define MIDX_BYTE_HASH_VERSION 5
    + #define MIDX_BYTE_NUM_CHUNKS 6
     @@ midx.h: struct multi_pack_index {
      	uint32_t num_packs_in_base;
      
    @@ midx.h: struct multi_pack_index {
      
     
      ## t/t5319-multi-pack-index.sh ##
    +@@ t/t5319-multi-pack-index.sh: midx_read_expect () {
    + 	EXTRA_CHUNKS="$5"
    + 	{
    + 		cat <<-EOF &&
    +-		header: 4d494458 1 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
    ++		header: 4d494458 2 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
    + 		chunks: pack-names oid-fanout oid-lookup object-offsets$EXTRA_CHUNKS
    + 		num_objects: $NUM_OBJECTS
    + 		packs:
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'verify invalid chunk offset' '
      		"improper chunk offset(s)"
      '
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'verify invalid chunk offset' '
      test_expect_success 'verify missing pack' '
      	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
      		"failed to load pack"
    +@@ t/t5319-multi-pack-index.sh: test_expect_success 'verify incorrect checksum' '
    + 		$objdir "incorrect checksum"
    + '
    + 
    ++test_expect_success 'setup for v1-specific fsck tests' '
    ++	git -c midx.version=1 multi-pack-index write
    ++'
    ++
    ++test_expect_success 'verify packnames out of order (v1)' '
    ++	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
    ++		"pack names out of order"
    ++'
    ++
    + test_expect_success 'repack progress off for redirected stderr' '
    + 	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&
    + 	test_line_count = 0 err
10:  9231fdca112 = 11:  c0c1769464b git-compat-util.h: introduce `u32_add()`
11:  12211c9ad53 = 12:  c11214a51f0 midx-write.c: introduce `midx_pack_perm()` helper
12:  978d7bf8bbc = 13:  b9244a04297 midx-write.c: extract `fill_pack_from_midx()`
13:  61deab3f731 = 14:  c6f8d323477 midx-write.c: enumerate `pack_int_id` values directly
14:  4f920a328ef = 15:  e71aa575463 midx-write.c: factor fanout layering from `compute_sorted_entries()`
 -:  ----------- > 16:  dbbcb494563 t/helper/test-read-midx.c: plug memory leak when selecting layer
16:  088673c762d ! 17:  13336e864f4 midx: implement MIDX compaction
    @@ Documentation/git-multi-pack-index.adoc: marker).
     +--
     +	--incremental::
     +		Write the result to a MIDX chain instead of writing a
    -+		stand-alone MIDX. Incompatible with `--bitmap`.
    ++		stand-alone MIDX.
     +--
     +
      verify::
    @@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, cons
     +	m = get_multi_pack_index(source);
     +
     +	for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
    -+		const char *midx_csum = get_midx_checksum(cur);
    ++		const char *midx_csum = midx_get_checksum_hex(cur);
     +
     +		if (!from_midx && !strcmp(midx_csum, argv[0]))
     +			from_midx = cur;
    @@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, cons
     +	}
     +
     +	if (!from_midx)
    -+		die(_("could not find MIDX 'from': %s"), argv[0]);
    ++		die(_("could not find MIDX: %s"), argv[0]);
     +	if (!to_midx)
    -+		die(_("could not find MIDX 'to': %s"), argv[1]);
    ++		die(_("could not find MIDX: %s"), argv[1]);
    ++	if (from_midx == to_midx)
    ++		die(_("MIDX compaction endpoints must be unique"));
    ++
    ++	for (m = from_midx; m; m = m->base_midx) {
    ++		if (m == to_midx)
    ++			die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]);
    ++	}
     +
     +	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
     +
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx)
      static struct {
      	const char *non_split;
      	const char *split;
    -@@ midx-write.c: static void clear_midx_files(struct odb_source *source,
    - 	strbuf_release(&buf);
    +@@ midx-write.c: static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
    + 	if (ctx->incremental)
    + 		goto out;
    + 
    ++	if (ctx->compact)
    ++		goto out; /* Compaction always requires an update. */
    ++
    + 	/*
    + 	 * Otherwise, we need to verify that the packs covered by the existing
    + 	 * MIDX match the packs that we already have. The logic to do so is way
    +@@ midx-write.c: static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
    + 	return needed;
      }
      
     +static int midx_hashcmp(const struct multi_pack_index *a,
     +			const struct multi_pack_index *b,
     +			const struct git_hash_algo *algop)
     +{
    -+	return hashcmp(get_midx_hash(a), get_midx_hash(b), algop);
    ++	return hashcmp(midx_get_checksum_hash(a), midx_get_checksum_hash(b),
    ++		       algop);
     +}
     +
      struct write_midx_opts {
    - 	struct odb_source *source;
    + 	struct odb_source *source; /* non-optional */
      
      	struct string_list *packs_to_include;
      	struct string_list *packs_to_drop;
    @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
      
      	trace2_region_enter("midx", "write_midx_internal", r);
     @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
    - 	ctx.source = opts->source;
    + 		die(_("unknown MIDX version: %d"), ctx.version);
      
      	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
     +	ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
     +
     +	if (ctx.compact) {
    ++		if (ctx.version != MIDX_VERSION_V2)
    ++			die(_("cannot perform MIDX compaction with v1 format"));
     +		if (!opts->compact_from)
     +			BUG("expected non-NULL 'from' MIDX during compaction");
     +		if (!opts->compact_to)
    @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
     +	}
      	stop_progress(&ctx.progress);
      
    - 	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
    - 	    !ctx.incremental &&
    -+	    !ctx.compact &&
    - 	    !(opts->packs_to_include || opts->packs_to_drop)) {
    - 		struct bitmap_index *bitmap_git;
    - 		int bitmap_exists;
    + 	if (!opts->packs_to_drop) {
     @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
      			ctx.large_offsets_needed = 1;
      	}
      
     -	QSORT(ctx.info, ctx.nr, pack_info_compare);
    -+	if (!ctx.compact)
    ++	if (ctx.compact) {
    ++		if (ctx.version != MIDX_VERSION_V2)
    ++			BUG("performing MIDX compaction with v1 MIDX");
    ++	} else {
     +		QSORT(ctx.info, ctx.nr, pack_info_compare);
    ++	}
      
      	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
      		size_t drop_index = 0;
    @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
     +			for (m = ctx.base_midx; m; m = m->base_midx)
     +				num_layers_before_from++;
      
    --			keep_hashes[j] = xstrdup(get_midx_checksum(m));
    +-			keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
     -			m = m->base_midx;
     +			m = ctx.base_midx;
     +			for (i = 0; i < num_layers_before_from; i++) {
     +				uint32_t j = num_layers_before_from - i - 1;
     +
    -+				keep_hashes[j] = xstrdup(get_midx_checksum(m));
    ++				keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
     +				m = m->base_midx;
     +			}
     +
    @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
     +			     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
     +			     m = m->base_midx) {
     +				keep_hashes[keep_hashes_nr - i - 1] =
    -+					xstrdup(get_midx_checksum(m));
    ++					xstrdup(midx_get_checksum_hex(m));
     +				i++;
     +			}
     +		} else {
    @@ midx-write.c: static int write_midx_internal(struct write_midx_opts *opts)
     +			for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
     +				uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
     +
    -+				keep_hashes[j] = xstrdup(get_midx_checksum(m));
    ++				keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
     +				m = m->base_midx;
     +			}
      		}
    @@ t/t5335-compact-multi-pack-index.sh (new)
     +	(
     +		cd midx-compact-lex-order &&
     +
    ++		git config maintenance.auto false &&
    ++
     +		write_packs A B C D E &&
     +		test_line_count = 5 $midx_chain &&
     +
    @@ t/t5335-compact-multi-pack-index.sh (new)
     +	(
     +		cd midx-compact-non-lex-order &&
     +
    ++		git config maintenance.auto false &&
    ++
     +		write_packs D C A B E &&
     +		test_line_count = 5 $midx_chain &&
     +
    @@ t/t5335-compact-multi-pack-index.sh (new)
     +	)
     +'
     +
    ++test_expect_success 'setup for bogus MIDX compaction scenarios' '
    ++	git init midx-compact-bogus &&
    ++	(
    ++		cd midx-compact-bogus &&
    ++
    ++		git config maintenance.auto false &&
    ++
    ++		write_packs A B C
    ++	)
    ++'
    ++
    ++test_expect_success 'MIDX compaction with missing endpoints' '
    ++	(
    ++		cd midx-compact-bogus &&
    ++
    ++		test_must_fail git multi-pack-index compact --incremental \
    ++			"<missing>" "<missing>" 2>err &&
    ++		test_grep "could not find MIDX: <missing>" err &&
    ++
    ++		test_must_fail git multi-pack-index compact --incremental \
    ++			"<missing>" "$(nth_line 2 "$midx_chain")" 2>err &&
    ++		test_grep "could not find MIDX: <missing>" err &&
    ++
    ++		test_must_fail git multi-pack-index compact --incremental \
    ++			"$(nth_line 2 "$midx_chain")" "<missing>" 2>err &&
    ++		test_grep "could not find MIDX: <missing>" err
    ++	)
    ++'
    ++
    ++test_expect_success 'MIDX compaction with reversed endpoints' '
    ++	(
    ++		cd midx-compact-bogus &&
    ++
    ++		from="$(nth_line 3 "$midx_chain")" &&
    ++		to="$(nth_line 1 "$midx_chain")" &&
    ++
    ++		test_must_fail git multi-pack-index compact --incremental \
    ++			"$from" "$to" 2>err &&
    ++
    ++		test_grep "MIDX $from must be an ancestor of $to" err
    ++	)
    ++'
    ++
    ++test_expect_success 'MIDX compaction with identical endpoints' '
    ++	(
    ++		cd midx-compact-bogus &&
    ++
    ++		from="$(nth_line 3 "$midx_chain")" &&
    ++		to="$(nth_line 3 "$midx_chain")" &&
    ++
    ++		test_must_fail git multi-pack-index compact --incremental \
    ++			"$from" "$to" 2>err &&
    ++
    ++		test_grep "MIDX compaction endpoints must be unique" err
    ++	)
    ++'
    ++
    ++test_expect_success 'MIDX compaction with midx.version=1' '
    ++	(
    ++		cd midx-compact-bogus &&
    ++
    ++		test_must_fail git -c midx.version=1 multi-pack-index compact \
    ++			"$(nth_line 1 "$midx_chain")" \
    ++			"$(nth_line 2 "$midx_chain")" 2>err &&
    ++
    ++		test_grep "fatal: cannot perform MIDX compaction with v1 format" err
    ++	)
    ++'
    ++
     +test_done
17:  b96c4e04266 ! 18:  b599f1ad4b0 midx: enable reachability bitmaps during MIDX compaction
    @@ Commit message
         layer compaction by combining all existing bitmaps from the compacted
         layers.
     
    -    Note that the because of the object/pack ordering described by the
    -    previous commit, the pseudo-pack order for the compacted MIDX is the
    -    same as concatenating the individual pseudo-pack orderings for each
    -    layer in the compaction range.
    +    Note that because of the object/pack ordering described by the previous
    +    commit, the pseudo-pack order for the compacted MIDX is the same as
    +    concatenating the individual pseudo-pack orderings for each layer in the
    +    compaction range.
     
         As a result, the only non-test or documentation change necessary is to
         treat all objects as non-preferred during compaction so as not to
    @@ Documentation/git-multi-pack-index.adoc: SYNOPSIS
      'git multi-pack-index' [<options>] verify
      'git multi-pack-index' [<options>] expire
      'git multi-pack-index' [<options>] repack [--batch-size=<size>]
    +@@ Documentation/git-multi-pack-index.adoc: compact::
    + 	--incremental::
    + 		Write the result to a MIDX chain instead of writing a
    + 		stand-alone MIDX.
    ++
    ++	--[no-]bitmap::
    ++		Control whether or not a multi-pack bitmap is written.
    + --
    + 
    + verify::
     
      ## builtin/multi-pack-index.c ##
     @@
    @@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with n
      			"$(nth_line 2 "$midx_chain")" \
      			"$(nth_line 4 "$midx_chain")" &&
      		test_line_count = 3 $midx_chain &&
    -@@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
    +@@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with midx.version=1' '
      	)
      '
      
    @@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with n
     +	(
     +		cd midx-compact-preserve-selection &&
     +
    ++		git config maintenance.auto false &&
    ++
     +		test_commit A &&
     +		test_commit B &&
     +
    @@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with n
     +	(
     +		cd midx-compact-with-bitmaps &&
     +
    ++		git config maintenance.auto false &&
    ++
     +		write_packs foo bar baz quux woot &&
     +
     +		test-tool read-midx --bitmap $objdir >bitmap.expect &&
    @@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with n
     +	(
     +		cd midx-compact-with-bitmaps-non-trivial &&
     +
    ++		git config maintenance.auto false &&
    ++
     +		git branch -m main &&
     +
     +		#               D(4)
    @@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with n
     +		git checkout main &&
     +		write_packs G &&
     +
    -+		cat $midx_chain &&
    -+
     +		# Compact layers 2-4, leaving us with:
     +		#
     +		#  [A, [B, C, D], E, F, G]
    @@ t/t5335-compact-multi-pack-index.sh: test_expect_success 'MIDX compaction with n
     +		#  [A, [B, C, D], E, [F, G]]
     +		git multi-pack-index compact --incremental --bitmap \
     +			"$(nth_line 4 "$midx_chain")" \
    -+			"$(nth_line 5 "$midx_chain")" &&
    -+
    -+		cat $midx_chain
    ++			"$(nth_line 5 "$midx_chain")"
     +	)
     +'
     +
-- 
2.52.0.457.gb599f1ad4b0

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

To make clear that the function `get_midx_checksum()` does not do
anything to modify its argument, mark the MIDX pointer as const.

The following commit will rename this function altogether to make clear
that it returns the raw bytes of the checksum, not a hex-encoded copy of
it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 +-
 midx.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index b681b18fc19..4c90eb04c64 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,7 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
-const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
 }
diff --git a/midx.h b/midx.h
index 6e54d73503d..7c7e0b59121 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
-const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
 			   const unsigned char *hash, const char *ext);
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Since 541204aabea (Documentation: document naming schema for structs and
their functions, 2024-07-30), we have adopted a naming convention for
functions that would prefer a name like, say, `midx_get_checksum()` over
`get_midx_checksum()`.

Adopt this convention throughout the midx.h API. Since this function
returns a raw (that is, non-hex encoded) hash, let's suffix the function
with "_hash()" to make this clear. As a side effect, this prepares us
for the subsequent change which will introduce a "_hex()" variant that
encodes the checksum itself.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c              | 6 +++---
 midx.c                    | 2 +-
 midx.h                    | 2 +-
 pack-bitmap.c             | 8 ++++----
 pack-revindex.c           | 4 ++--
 t/helper/test-read-midx.c | 4 ++--
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 6485cb67068..73d33752ef1 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -946,7 +946,7 @@ static int link_midx_to_chain(struct multi_pack_index *m)
 	}
 
 	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
-		const unsigned char *hash = get_midx_checksum(m);
+		const unsigned char *hash = midx_get_checksum_hash(m);
 
 		get_midx_filename_ext(m->source, &from,
 				      hash, midx_exts[i].non_split);
@@ -1151,7 +1151,7 @@ static int write_midx_internal(struct odb_source *source,
 		while (m) {
 			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
-				      hash_to_hex_algop(get_midx_checksum(m),
+				      hash_to_hex_algop(midx_get_checksum_hash(m),
 							m->source->odb->repo->hash_algo));
 				goto cleanup;
 			}
@@ -1520,7 +1520,7 @@ static int write_midx_internal(struct odb_source *source,
 		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
 			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
 
-			keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
+			keep_hashes[j] = xstrdup(hash_to_hex_algop(midx_get_checksum_hash(m),
 								   r->hash_algo));
 			m = m->base_midx;
 		}
diff --git a/midx.c b/midx.c
index 4c90eb04c64..554bdfc505b 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,7 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
+const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
 }
diff --git a/midx.h b/midx.h
index 7c7e0b59121..62d6105195f 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
+const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
 			   const unsigned char *hash, const char *ext);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1d..90ae63e6804 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -441,11 +441,11 @@ char *midx_bitmap_filename(struct multi_pack_index *midx)
 	struct strbuf buf = STRBUF_INIT;
 	if (midx->has_chain)
 		get_split_midx_filename_ext(midx->source, &buf,
-					    get_midx_checksum(midx),
+					    midx_get_checksum_hash(midx),
 					    MIDX_EXT_BITMAP);
 	else
 		get_midx_filename_ext(midx->source, &buf,
-				      get_midx_checksum(midx),
+				      midx_get_checksum_hash(midx),
 				      MIDX_EXT_BITMAP);
 
 	return strbuf_detach(&buf, NULL);
@@ -502,7 +502,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	if (load_bitmap_header(bitmap_git) < 0)
 		goto cleanup;
 
-	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum,
+	if (!hasheq(midx_get_checksum_hash(bitmap_git->midx), bitmap_git->checksum,
 		    bitmap_repo(bitmap_git)->hash_algo)) {
 		error(_("checksum doesn't match in MIDX and bitmap"));
 		goto cleanup;
@@ -2820,7 +2820,7 @@ void test_bitmap_walk(struct rev_info *revs)
 
 		if (bitmap_is_midx(found))
 			fprintf_ln(stderr, "Located via MIDX '%s'.",
-				   hash_to_hex_algop(get_midx_checksum(found->midx),
+				   hash_to_hex_algop(midx_get_checksum_hash(found->midx),
 						     revs->repo->hash_algo));
 		else
 			fprintf_ln(stderr, "Located via pack '%s'.",
diff --git a/pack-revindex.c b/pack-revindex.c
index 8598b941c8c..70731071ab6 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -390,11 +390,11 @@ int load_midx_revindex(struct multi_pack_index *m)
 
 	if (m->has_chain)
 		get_split_midx_filename_ext(m->source, &revindex_name,
-					    get_midx_checksum(m),
+					    midx_get_checksum_hash(m),
 					    MIDX_EXT_REV);
 	else
 		get_midx_filename_ext(m->source, &revindex_name,
-				      get_midx_checksum(m),
+				      midx_get_checksum_hash(m),
 				      MIDX_EXT_REV);
 
 	ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 6de5d1665af..b8fefb1a124 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -34,7 +34,7 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		return 1;
 
 	if (checksum) {
-		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+		while (m && strcmp(hash_to_hex(midx_get_checksum_hash(m)), checksum))
 			m = m->base_midx;
 		if (!m)
 			return 1;
@@ -94,7 +94,7 @@ static int read_midx_checksum(const char *object_dir)
 	m = setup_midx(object_dir);
 	if (!m)
 		return 1;
-	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	printf("%s\n", hash_to_hex(midx_get_checksum_hash(m)));
 
 	close_midx(m);
 	return 0;
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When trying to print out, say, the hexadecimal representation of a
MIDX's hash, our code will do something like:

    hash_to_hex_algop(midx_get_checksum_hash(m),
                      m->source->odb->repo->hash_algo);

, which is both cumbersome and repetitive. In fact, all but a handful of
callers to `midx_get_checksum_hash()` do exactly the above. Reduce the
repetitive nature of calling `midx_get_checksum_hash()` by having it
return a pointer into a static buffer containing the above result.

For the handful of callers that do need to compare the raw bytes and
don't want to deal with an encoded copy (e.g., because they are passing
it to hasheq() or similar), they may still rely on
`midx_get_checksum_hash()` which returns the raw bytes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c              | 6 ++----
 midx.c                    | 6 ++++++
 midx.h                    | 1 +
 pack-bitmap.c             | 3 +--
 t/helper/test-read-midx.c | 4 ++--
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 73d33752ef1..13171d7e9c4 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1151,8 +1151,7 @@ static int write_midx_internal(struct odb_source *source,
 		while (m) {
 			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
-				      hash_to_hex_algop(midx_get_checksum_hash(m),
-							m->source->odb->repo->hash_algo));
+				      midx_get_checksum_hex(m));
 				goto cleanup;
 			}
 			ctx.num_multi_pack_indexes_before++;
@@ -1520,8 +1519,7 @@ static int write_midx_internal(struct odb_source *source,
 		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
 			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
 
-			keep_hashes[j] = xstrdup(hash_to_hex_algop(midx_get_checksum_hash(m),
-								   r->hash_algo));
+			keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
 			m = m->base_midx;
 		}
 
diff --git a/midx.c b/midx.c
index 554bdfc505b..19ef230d3fd 100644
--- a/midx.c
+++ b/midx.c
@@ -24,6 +24,12 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
+const char *midx_get_checksum_hex(const struct multi_pack_index *m)
+{
+	return hash_to_hex_algop(midx_get_checksum_hash(m),
+				 m->source->odb->repo->hash_algo);
+}
+
 const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
diff --git a/midx.h b/midx.h
index 62d6105195f..a39bcc9d03f 100644
--- a/midx.h
+++ b/midx.h
@@ -85,6 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
+const char *midx_get_checksum_hex(const struct multi_pack_index *m) /* static buffer */;
 const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 90ae63e6804..b688531fd2f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2820,8 +2820,7 @@ void test_bitmap_walk(struct rev_info *revs)
 
 		if (bitmap_is_midx(found))
 			fprintf_ln(stderr, "Located via MIDX '%s'.",
-				   hash_to_hex_algop(midx_get_checksum_hash(found->midx),
-						     revs->repo->hash_algo));
+				   midx_get_checksum_hex(found->midx));
 		else
 			fprintf_ln(stderr, "Located via pack '%s'.",
 				   hash_to_hex_algop(found->pack->hash,
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index b8fefb1a124..9d42c587564 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -34,7 +34,7 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		return 1;
 
 	if (checksum) {
-		while (m && strcmp(hash_to_hex(midx_get_checksum_hash(m)), checksum))
+		while (m && strcmp(midx_get_checksum_hex(m), checksum))
 			m = m->base_midx;
 		if (!m)
 			return 1;
@@ -94,7 +94,7 @@ static int read_midx_checksum(const char *object_dir)
 	m = setup_midx(object_dir);
 	if (!m)
 		return 1;
-	printf("%s\n", hash_to_hex(midx_get_checksum_hash(m)));
+	printf("%s\n", midx_get_checksum_hex(m));
 
 	close_midx(m);
 	return 0;
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (2 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

All multi-pack-index sub-commands (write, verify, repack, and expire)
support a '--progress' command-line option, despite not listing it as
one of the common options in `common_opts`.

As a result each sub-command declares its own `OPT_BIT()` for a
"--progress" command-line option. Centralize this within the
`common_opts` to avoid re-declaring it in each sub-command.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |  2 ++
 builtin/multi-pack-index.c              | 10 ++--------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 2f642697e9e..a4550e28bed 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -18,6 +18,8 @@ Write or verify a multi-pack-index (MIDX) file.
 OPTIONS
 -------
 
+The following command-line options are applicable to all sub-commands:
+
 --object-dir=<dir>::
 	Use given directory for the location of Git objects. We check
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5f364aa816b..ca98d4c3ba3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -84,6 +84,8 @@ static struct option common_opts[] = {
 	  N_("directory"),
 	  N_("object directory containing set of packfile and pack-index pairs"),
 	  parse_object_dir),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"),
+		MIDX_PROGRESS),
 	OPT_END(),
 };
 
@@ -138,8 +140,6 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			   N_("pack for reuse when computing a multi-pack bitmap")),
 		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_BIT(0, "incremental", &opts.flags,
 			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
@@ -200,8 +200,6 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv,
 {
 	struct option *options;
 	static struct option builtin_multi_pack_index_verify_options[] = {
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
@@ -231,8 +229,6 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv,
 {
 	struct option *options;
 	static struct option builtin_multi_pack_index_expire_options[] = {
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
@@ -264,8 +260,6 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_UNSIGNED(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
-		OPT_BIT(0, "progress", &opts.flags,
-		  N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (3 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Since fcb2205b774 (midx: implement support for writing incremental MIDX
chains, 2024-08-06), the command-line options '--incremental' and
'--bitmap' were declared to be incompatible with one another when
running 'git multi-pack-index write'.

However, since 27afc272c49 (midx: implement writing incremental MIDX
bitmaps, 2025-03-20), that incompatibility no longer exists, despite the
documentation saying so. Correct this by removing the stale reference to
their incompatibility.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a4550e28bed..a502819fc38 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -75,7 +75,7 @@ marker).
 		Write an incremental MIDX file containing only objects
 		and packs not present in an existing MIDX layer.
 		Migrates non-incremental MIDXs to incremental ones when
-		necessary. Incompatible with `--bitmap`.
+		necessary.
 --
 
 verify::
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (4 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Since c39fffc1c90 (tests: start asserting that *.txt SYNOPSIS matches -h
output, 2022-10-13), the manual page for 'git multi-pack-index' has a
SYNOPSIS section which differs from 'git multi-pack-index -h'.

Correct this while also documenting additional options accepted by the
'write' sub-command.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc | 7 ++++++-
 builtin/multi-pack-index.c              | 5 +++--
 t/t0450/adoc-help-mismatches            | 1 -
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a502819fc38..164cf1f2291 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -9,7 +9,12 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]bitmap] <sub-command>
+'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
+		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
+		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] verify
+'git multi-pack-index' [<options>] expire
+'git multi-pack-index' [<options>] repack [--batch-size=<size>]
 
 DESCRIPTION
 -----------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index ca98d4c3ba3..c0c6c1760c0 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,8 +13,9 @@
 #include "repository.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \
-	   "[--refs-snapshot=<path>]")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]\n" \
+	   "  [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
+	   "  [--refs-snapshot=<path>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
diff --git a/t/t0450/adoc-help-mismatches b/t/t0450/adoc-help-mismatches
index 8ee2d3f7c81..e8d6c13ccd0 100644
--- a/t/t0450/adoc-help-mismatches
+++ b/t/t0450/adoc-help-mismatches
@@ -33,7 +33,6 @@ merge
 merge-file
 merge-index
 merge-one-file
-multi-pack-index
 name-rev
 notes
 push
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (5 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Commit d4bf1d88b90 (multi-pack-index: verify missing pack, 2018-09-13)
adds a new test to the MIDX test script to test how we handle missing
packs.

While the commit itself describes the test as "verify missing pack[s]",
the test itself is actually called "verify packnames out of order",
despite that not being what it tests.

Likely this was a copy-and-paste of the test immediately above it of the
same name. Correct this by renaming the test to match the commit
message.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index faae98c7e76..efeab4d22b7 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -517,7 +517,7 @@ test_expect_success 'verify packnames out of order' '
 		"pack names out of order"
 '
 
-test_expect_success 'verify packnames out of order' '
+test_expect_success 'verify missing pack' '
 	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
 		"failed to load pack"
 '
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (6 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 21:13     ` Junio C Hamano
  2026-01-14 19:54   ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
                     ` (10 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In midx_pack_order(), we compute for each bitmapped pack the first bit
to correspond to an object in that pack, along with how many bits were
assigned to object(s) in that pack.

Initially, each bitmap_nr value is set to zero, and each bitmap_pos
value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
that there are no packs who have an unknown bit position but a somehow
non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
midx-write.c).

Once the pack order is fully determined, midx_pack_order() sets the
bitmap_pos field for any bitmapped packs to zero if they are still
listed as BITMAP_POS_UNKNOWN.

However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
This is fine for existing cases, since the only time the
`ctx->pack_perm` array holds a value outside of the addressable range of
`ctx->info` is when there are expired packs, which only occurs via 'git
multi-pack-index expire', which does not support writing MIDX bitmaps.
As a result, the range of ctx->pack_perm covers all values in [0,
`ctx->nr`), so enumerating in this order isn't an issue.

A future change necessary for compaction will complicate this further by
introducing a wrapper around the `ctx->pack_perm` array, which turns the
given `pack_int_id` into one that is relative to the lower end of the
compaction range. As a result, indexing into `ctx->pack_perm` through
this helper, say, with "0" will produce a crash when the lower end of
the compaction range has >0 pack(s) in its base layer, since the
subtraction will wrap around the 32-bit unsigned range, resulting in an
uninitialized read.

But the process is completely unnecessary in the first place: we are
enumerating all values of `ctx->info`, and there is no reason to process
them in a different order than they appear in memory. Index `ctx->info`
directly to reflect that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx-write.c b/midx-write.c
index 13171d7e9c4..da9c5a7c295 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 		pack_order[i] = data[i].nr;
 	}
 	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+		struct pack_info *pack = &ctx->info[i];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
 			pack->bitmap_pos = 0;
 	}
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (7 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In the MIDX writing code, there are four functions which perform some
sort of MIDX write operation. They are:

 - write_midx_file()
 - write_midx_file_only()
 - expire_midx_packs()
 - midx_repack()

All of these functions are thin wrappers over `write_midx_internal()`,
which implements the bulk of these routines. As a result, the
`write_midx_internal()` function takes six arguments.

Future commits in this series will want to add additional arguments, and
in general this function's signature will be the union of parameters
among *all* possible ways to write a MIDX.

Instead of adding yet more arguments to this function to support MIDX
compaction, introduce a `struct write_midx_opts`, which has the same
struct members as `write_midx_internal()`'s arguments.

Adding additional fields to the `write_midx_opts` struct is preferable
to adding additional arguments to `write_midx_internal()`. This is
because the callers below all zero-initialize the struct, so each time
we add a new piece of information, we do not have to pass the zero value
for it in all other call-sites that do not care about it.

For now, no functional changes are included in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 135 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 81 insertions(+), 54 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index da9c5a7c295..8a54644e427 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1078,14 +1078,20 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	return needed;
 }
 
-static int write_midx_internal(struct odb_source *source,
-			       struct string_list *packs_to_include,
-			       struct string_list *packs_to_drop,
-			       const char *preferred_pack_name,
-			       const char *refs_snapshot,
-			       unsigned flags)
+struct write_midx_opts {
+	struct odb_source *source; /* non-optional */
+
+	struct string_list *packs_to_include;
+	struct string_list *packs_to_drop;
+
+	const char *preferred_pack_name;
+	const char *refs_snapshot;
+	unsigned flags;
+};
+
+static int write_midx_internal(struct write_midx_opts *opts)
 {
-	struct repository *r = source->odb->repo;
+	struct repository *r = opts->source->odb->repo;
 	struct strbuf midx_name = STRBUF_INIT;
 	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t start_pack;
@@ -1106,22 +1112,22 @@ static int write_midx_internal(struct odb_source *source,
 	trace2_region_enter("midx", "write_midx_internal", r);
 
 	ctx.repo = r;
-	ctx.source = source;
+	ctx.source = opts->source;
 
-	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
 			    "%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
-			    source->path);
+			    opts->source->path);
 	else
-		get_midx_filename(source, &midx_name);
+		get_midx_filename(opts->source, &midx_name);
 	if (safe_create_leading_directories(r, midx_name.buf))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name.buf);
 
-	if (!packs_to_include || ctx.incremental) {
-		struct multi_pack_index *m = get_multi_pack_index(source);
+	if (!opts->packs_to_include || ctx.incremental) {
+		struct multi_pack_index *m = get_multi_pack_index(opts->source);
 		if (m && !midx_checksum_valid(m)) {
 			warning(_("ignoring existing multi-pack-index; checksum mismatch"));
 			m = NULL;
@@ -1136,7 +1142,7 @@ static int write_midx_internal(struct odb_source *source,
 			 */
 			if (ctx.incremental)
 				ctx.base_midx = m;
-			else if (!packs_to_include)
+			else if (!opts->packs_to_include)
 				ctx.m = m;
 		}
 	}
@@ -1149,7 +1155,7 @@ static int write_midx_internal(struct odb_source *source,
 	if (ctx.incremental) {
 		struct multi_pack_index *m = ctx.base_midx;
 		while (m) {
-			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+			if (opts->flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
 				      midx_get_checksum_hex(m));
 				goto cleanup;
@@ -1164,18 +1170,18 @@ static int write_midx_internal(struct odb_source *source,
 	start_pack = ctx.nr;
 
 	ctx.pack_paths_checked = 0;
-	if (flags & MIDX_PROGRESS)
+	if (opts->flags & MIDX_PROGRESS)
 		ctx.progress = start_delayed_progress(r,
 						      _("Adding packfiles to multi-pack-index"), 0);
 	else
 		ctx.progress = NULL;
 
-	ctx.to_include = packs_to_include;
+	ctx.to_include = opts->packs_to_include;
 
-	for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
+	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (!packs_to_drop) {
+	if (!opts->packs_to_drop) {
 		/*
 		 * If there is no MIDX then either it doesn't exist, or we're
 		 * doing a geometric repack. Try to load it from the source to
@@ -1188,7 +1194,7 @@ static int write_midx_internal(struct odb_source *source,
 		if (midx && !midx_needs_update(midx, &ctx)) {
 			struct bitmap_index *bitmap_git;
 			int bitmap_exists;
-			int want_bitmap = flags & MIDX_WRITE_BITMAP;
+			int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
 
 			bitmap_git = prepare_midx_bitmap_git(midx);
 			bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
@@ -1200,7 +1206,7 @@ static int write_midx_internal(struct odb_source *source,
 				 * corresponding bitmap (or one wasn't requested).
 				 */
 				if (!want_bitmap)
-					clear_midx_files_ext(source, "bitmap", NULL);
+					clear_midx_files_ext(ctx.source, "bitmap", NULL);
 				result = 0;
 				goto cleanup;
 			}
@@ -1215,11 +1221,11 @@ static int write_midx_internal(struct odb_source *source,
 		goto cleanup; /* nothing to do */
 	}
 
-	if (preferred_pack_name) {
+	if (opts->preferred_pack_name) {
 		ctx.preferred_pack_idx = NO_PREFERRED_PACK;
 
 		for (size_t i = 0; i < ctx.nr; i++) {
-			if (!cmp_idx_or_pack_name(preferred_pack_name,
+			if (!cmp_idx_or_pack_name(opts->preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
 				break;
@@ -1228,9 +1234,9 @@ static int write_midx_internal(struct odb_source *source,
 
 		if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
 			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
+				opts->preferred_pack_name);
 	} else if (ctx.nr &&
-		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
+		   (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		struct packed_git *oldest = ctx.info[0].p;
 		ctx.preferred_pack_idx = 0;
 
@@ -1241,7 +1247,7 @@ static int write_midx_internal(struct odb_source *source,
 		 */
 		open_pack_index(oldest);
 
-		if (packs_to_drop && packs_to_drop->nr)
+		if (opts->packs_to_drop && opts->packs_to_drop->nr)
 			BUG("cannot write a MIDX bitmap during expiration");
 
 		/*
@@ -1303,20 +1309,21 @@ static int write_midx_internal(struct odb_source *source,
 
 	QSORT(ctx.info, ctx.nr, pack_info_compare);
 
-	if (packs_to_drop && packs_to_drop->nr) {
+	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
 		size_t drop_index = 0;
 		int missing_drops = 0;
 
-		for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+		for (size_t i = 0;
+		     i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
 			int cmp = strcmp(ctx.info[i].pack_name,
-					 packs_to_drop->items[drop_index].string);
+					 opts->packs_to_drop->items[drop_index].string);
 
 			if (!cmp) {
 				drop_index++;
 				ctx.info[i].expired = 1;
 			} else if (cmp > 0) {
 				error(_("did not see pack-file %s to drop"),
-				      packs_to_drop->items[drop_index].string);
+				      opts->packs_to_drop->items[drop_index].string);
 				drop_index++;
 				missing_drops++;
 				i--;
@@ -1353,8 +1360,8 @@ static int write_midx_internal(struct odb_source *source,
 	}
 
 	/* Check that the preferred pack wasn't expired (if given). */
-	if (preferred_pack_name) {
-		struct pack_info *preferred = bsearch(preferred_pack_name,
+	if (opts->preferred_pack_name) {
+		struct pack_info *preferred = bsearch(opts->preferred_pack_name,
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
@@ -1362,7 +1369,7 @@ static int write_midx_internal(struct odb_source *source,
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-					preferred_pack_name);
+					opts->preferred_pack_name);
 		}
 	}
 
@@ -1376,15 +1383,15 @@ static int write_midx_internal(struct odb_source *source,
 	}
 
 	if (!ctx.entries_nr) {
-		if (flags & MIDX_WRITE_BITMAP)
+		if (opts->flags & MIDX_WRITE_BITMAP)
 			warning(_("refusing to write multi-pack .bitmap without any objects"));
-		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
+		opts->flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
 	}
 
 	if (ctx.incremental) {
 		struct strbuf lock_name = STRBUF_INIT;
 
-		get_midx_chain_filename(source, &lock_name);
+		get_midx_chain_filename(opts->source, &lock_name);
 		hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
 		strbuf_release(&lock_name);
 
@@ -1427,7 +1434,7 @@ static int write_midx_internal(struct odb_source *source,
 				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
 			write_midx_large_offsets);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+	if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
 		ctx.pack_order = midx_pack_order(&ctx);
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
@@ -1445,11 +1452,11 @@ static int write_midx_internal(struct odb_source *source,
 			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX &&
+	if (opts->flags & MIDX_WRITE_REV_INDEX &&
 	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(&ctx, midx_hash);
 
-	if (flags & MIDX_WRITE_BITMAP) {
+	if (opts->flags & MIDX_WRITE_BITMAP) {
 		struct packing_data pdata;
 		struct commit_stack commits = COMMIT_STACK_INIT;
 
@@ -1458,7 +1465,7 @@ static int write_midx_internal(struct odb_source *source,
 
 		prepare_midx_packing_data(&pdata, &ctx);
 
-		find_commits_for_midx_bitmap(&commits, refs_snapshot, &ctx);
+		find_commits_for_midx_bitmap(&commits, opts->refs_snapshot, &ctx);
 
 		/*
 		 * The previous steps translated the information from
@@ -1469,8 +1476,8 @@ static int write_midx_internal(struct odb_source *source,
 		FREE_AND_NULL(ctx.entries);
 		ctx.entries_nr = 0;
 
-		if (write_midx_bitmap(&ctx, midx_hash, &pdata,
-				      commits.items, commits.nr, flags) < 0) {
+		if (write_midx_bitmap(&ctx, midx_hash, &pdata, commits.items,
+				      commits.nr, opts->flags) < 0) {
 			error(_("could not write multi-pack bitmap"));
 			clear_packing_data(&pdata);
 			commit_stack_clear(&commits);
@@ -1503,7 +1510,7 @@ static int write_midx_internal(struct odb_source *source,
 		if (link_midx_to_chain(ctx.base_midx) < 0)
 			goto cleanup;
 
-		get_split_midx_filename_ext(source, &final_midx_name,
+		get_split_midx_filename_ext(opts->source, &final_midx_name,
 					    midx_hash, MIDX_EXT_MIDX);
 
 		if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
@@ -1536,7 +1543,7 @@ static int write_midx_internal(struct odb_source *source,
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files(source, keep_hashes,
+	clear_midx_files(opts->source, keep_hashes,
 			 ctx.num_multi_pack_indexes_before + 1,
 			 ctx.incremental);
 	result = 0;
@@ -1571,9 +1578,14 @@ int write_midx_file(struct odb_source *source,
 		    const char *preferred_pack_name,
 		    const char *refs_snapshot, unsigned flags)
 {
-	return write_midx_internal(source, NULL, NULL,
-				   preferred_pack_name, refs_snapshot,
-				   flags);
+	struct write_midx_opts opts = {
+		.source = source,
+		.preferred_pack_name = preferred_pack_name,
+		.refs_snapshot = refs_snapshot,
+		.flags = flags,
+	};
+
+	return write_midx_internal(&opts);
 }
 
 int write_midx_file_only(struct odb_source *source,
@@ -1581,8 +1593,15 @@ int write_midx_file_only(struct odb_source *source,
 			 const char *preferred_pack_name,
 			 const char *refs_snapshot, unsigned flags)
 {
-	return write_midx_internal(source, packs_to_include, NULL,
-				   preferred_pack_name, refs_snapshot, flags);
+	struct write_midx_opts opts = {
+		.source = source,
+		.packs_to_include = packs_to_include,
+		.preferred_pack_name = preferred_pack_name,
+		.refs_snapshot = refs_snapshot,
+		.flags = flags,
+	};
+
+	return write_midx_internal(&opts);
 }
 
 int expire_midx_packs(struct odb_source *source, unsigned flags)
@@ -1641,9 +1660,14 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
 
 	free(count);
 
-	if (packs_to_drop.nr)
-		result = write_midx_internal(source, NULL,
-					     &packs_to_drop, NULL, NULL, flags);
+	if (packs_to_drop.nr) {
+		struct write_midx_opts opts = {
+			.source = source,
+			.packs_to_drop = &packs_to_drop,
+			.flags = flags & MIDX_PROGRESS,
+		};
+		result = write_midx_internal(&opts);
+	}
 
 	string_list_clear(&packs_to_drop, 0);
 
@@ -1776,6 +1800,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	FILE *cmd_in;
 	struct multi_pack_index *m = get_multi_pack_index(source);
+	struct write_midx_opts opts = {
+		.source = source,
+		.flags = flags,
+	};
 
 	/*
 	 * When updating the default for these configuration
@@ -1850,8 +1878,7 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
 		goto cleanup;
 	}
 
-	result = write_midx_internal(source, NULL, NULL, NULL, NULL,
-				     flags);
+	result = write_midx_internal(&opts);
 
 cleanup:
 	free(include_pack);
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (8 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 21:28     ` Junio C Hamano
  2026-01-27  7:34     ` Patrick Steinhardt
  2026-01-14 19:54   ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
                     ` (8 subsequent siblings)
  18 siblings, 2 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").

As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).

To see why, consider the following example:

          | packs       | preferred pack
  --------+-------------+---------------
  MIDX #0 | { X, Y, Z } | Y
  MIDX #1 | { A, B, C } | B
  MIDX #2 | { D, E, F } | D

, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.

But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.

Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.

One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.

However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.

Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.

Because this change produces MIDXs which may not be correctly read with
external tools or older versions of Git. Though older versions of Git
know how to gracefully degrade and ignore any MIDX(s) they consider
corrupt, external tools may not be as robust. To avoid unintentionally
breaking any such tools, guard this change behind a version bump in the
MIDX's on-disk format.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-pack.adoc |  6 ++++--
 midx-write.c                      | 26 ++++++++++++++++++++++----
 midx.c                            | 31 ++++++++++++++++++++++++++++---
 midx.h                            |  4 +++-
 t/t5319-multi-pack-index.sh       | 16 ++++++++++------
 5 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index 1b4db4aa611..5be2206b20b 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -374,7 +374,7 @@ HEADER:
 	    The signature is: {'M', 'I', 'D', 'X'}
 
 	1-byte version number:
-	    Git only writes or recognizes version 1.
+	    Git only writes version 2, but recognizes versions 1 and 2.
 
 	1-byte Object Id Version
 	    We infer the length of object IDs (OIDs) from this value:
@@ -413,7 +413,9 @@ CHUNK DATA:
 	    strings. There is no extra padding between the filenames,
 	    and they are listed in lexicographic order. The chunk itself
 	    is padded at the end with between 0 and 3 NUL bytes to make the
-	    chunk size a multiple of 4 bytes.
+	    chunk size a multiple of 4 bytes. Version 1 MIDXs are required to
+	    list their packs in lexicographic order, but version 2 MIDXs may
+	    list their packs in any arbitrary order.
 
 	Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
 	    Stores a table of two 4-byte unsigned integers in network order.
diff --git a/midx-write.c b/midx-write.c
index 8a54644e427..5c8700065a1 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -36,10 +36,13 @@ extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 
 static size_t write_midx_header(const struct git_hash_algo *hash_algo,
 				struct hashfile *f, unsigned char num_chunks,
-				uint32_t num_packs)
+				uint32_t num_packs, int version)
 {
+	if (version != MIDX_VERSION_V1 && version != MIDX_VERSION_V2)
+		BUG("unexpected MIDX version: %d", version);
+
 	hashwrite_be32(f, MIDX_SIGNATURE);
-	hashwrite_u8(f, MIDX_VERSION);
+	hashwrite_u8(f, version);
 	hashwrite_u8(f, oid_version(hash_algo));
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, 0); /* unused */
@@ -105,6 +108,8 @@ struct write_midx_context {
 
 	uint32_t preferred_pack_idx;
 
+	int version; /* must be MIDX_VERSION_V1 or _V2 */
+
 	int incremental;
 	uint32_t num_multi_pack_indexes_before;
 
@@ -410,7 +415,9 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
 		if (ctx->info[i].expired)
 			continue;
 
-		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
+		if (ctx->version == MIDX_VERSION_V1 &&
+		    i && strcmp(ctx->info[i].pack_name,
+				ctx->info[i - 1].pack_name) <= 0)
 			BUG("incorrect pack-file order: %s before %s",
 			    ctx->info[i - 1].pack_name,
 			    ctx->info[i].pack_name);
@@ -1025,6 +1032,12 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	if (!midx_checksum_valid(midx))
 		goto out;
 
+	/*
+	 * If the version differs, we need to update.
+	 */
+	if (midx->version != ctx->version)
+		goto out;
+
 	/*
 	 * Ignore incremental updates for now. The assumption is that any
 	 * incremental update would be either empty (in which case we will bail
@@ -1100,6 +1113,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	struct tempfile *incr;
 	struct write_midx_context ctx = {
 		.preferred_pack_idx = NO_PREFERRED_PACK,
+		.version = MIDX_VERSION_V2,
 	 };
 	struct multi_pack_index *midx_to_free = NULL;
 	int bitmapped_packs_concat_len = 0;
@@ -1114,6 +1128,10 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	ctx.repo = r;
 	ctx.source = opts->source;
 
+	repo_config_get_int(ctx.repo, "midx.version", &ctx.version);
+	if (ctx.version != MIDX_VERSION_V1 && ctx.version != MIDX_VERSION_V2)
+		die(_("unknown MIDX version: %d"), ctx.version);
+
 	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
 
 	if (ctx.incremental)
@@ -1445,7 +1463,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	}
 
 	write_midx_header(r->hash_algo, f, get_num_chunks(cf),
-			  ctx.nr - dropped_packs);
+			  ctx.nr - dropped_packs, ctx.version);
 	write_chunkfile(cf, &ctx);
 
 	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
diff --git a/midx.c b/midx.c
index 19ef230d3fd..1327d0a3695 100644
--- a/midx.c
+++ b/midx.c
@@ -149,7 +149,7 @@ static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
 		      m->signature, MIDX_SIGNATURE);
 
 	m->version = m->data[MIDX_BYTE_FILE_VERSION];
-	if (m->version != MIDX_VERSION)
+	if (m->version != MIDX_VERSION_V1 && m->version != MIDX_VERSION_V2)
 		die(_("multi-pack-index version %d not recognized"),
 		      m->version);
 
@@ -210,7 +210,8 @@ static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
 			die(_("multi-pack-index pack-name chunk is too short"));
 		cur_pack_name = end + 1;
 
-		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
+		if (m->version == MIDX_VERSION_V1 &&
+		    i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
 			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
 			      m->pack_names[i - 1],
 			      m->pack_names[i]);
@@ -411,6 +412,7 @@ void close_midx(struct multi_pack_index *m)
 	}
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
+	FREE_AND_NULL(m->pack_names_sorted);
 	free(m);
 }
 
@@ -656,17 +658,40 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
+
+static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
+{
+	struct multi_pack_index *m = m_;
+	return strcmp(m->pack_names[*(const size_t *)a],
+		      m->pack_names[*(const size_t *)b]);
+}
+
 static int midx_contains_pack_1(struct multi_pack_index *m,
 				const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
+	if (m->version == MIDX_VERSION_V2 && !m->pack_names_sorted) {
+		uint32_t i;
+
+		ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
+
+		for (i = 0; i < m->num_packs; i++)
+			m->pack_names_sorted[i] = i;
+
+		QSORT_S(m->pack_names_sorted, m->num_packs, midx_pack_names_cmp,
+			m);
+	}
+
 	while (first < last) {
 		uint32_t mid = first + (last - first) / 2;
 		const char *current;
 		int cmp;
 
-		current = m->pack_names[mid];
+		if (m->pack_names_sorted)
+			current = m->pack_names[m->pack_names_sorted[mid]];
+		else
+			current = m->pack_names[mid];
 		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
 		if (!cmp)
 			return 1;
diff --git a/midx.h b/midx.h
index a39bcc9d03f..aa99a6cb215 100644
--- a/midx.h
+++ b/midx.h
@@ -11,7 +11,8 @@ struct git_hash_algo;
 struct odb_source;
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
-#define MIDX_VERSION 1
+#define MIDX_VERSION_V1 1
+#define MIDX_VERSION_V2 2
 #define MIDX_BYTE_FILE_VERSION 4
 #define MIDX_BYTE_HASH_VERSION 5
 #define MIDX_BYTE_NUM_CHUNKS 6
@@ -71,6 +72,7 @@ struct multi_pack_index {
 	uint32_t num_packs_in_base;
 
 	const char **pack_names;
+	size_t *pack_names_sorted;
 	struct packed_git **packs;
 };
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index efeab4d22b7..250d21dbd67 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -21,7 +21,7 @@ midx_read_expect () {
 	EXTRA_CHUNKS="$5"
 	{
 		cat <<-EOF &&
-		header: 4d494458 1 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
+		header: 4d494458 2 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
 		chunks: pack-names oid-fanout oid-lookup object-offsets$EXTRA_CHUNKS
 		num_objects: $NUM_OBJECTS
 		packs:
@@ -512,11 +512,6 @@ test_expect_success 'verify invalid chunk offset' '
 		"improper chunk offset(s)"
 '
 
-test_expect_success 'verify packnames out of order' '
-	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
-		"pack names out of order"
-'
-
 test_expect_success 'verify missing pack' '
 	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
 		"failed to load pack"
@@ -578,6 +573,15 @@ test_expect_success 'verify incorrect checksum' '
 		$objdir "incorrect checksum"
 '
 
+test_expect_success 'setup for v1-specific fsck tests' '
+	git -c midx.version=1 multi-pack-index write
+'
+
+test_expect_success 'verify packnames out of order (v1)' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
+		"pack names out of order"
+'
+
 test_expect_success 'repack progress off for redirected stderr' '
 	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&
 	test_line_count = 0 err
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (9 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 21:49     ` Junio C Hamano
  2026-01-14 19:54   ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
                     ` (7 subsequent siblings)
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

A future commit will want to add two 32-bit unsigned values together
while checking for overflow. Introduce a variant of the u64_add()
function for operating on 32-bit inputs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 git-compat-util.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index b0673d1a450..db62a6f25c5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -641,6 +641,14 @@ static inline int cast_size_t_to_int(size_t a)
 	return (int)a;
 }
 
+static inline uint32_t u32_add(uint32_t a, uint32_t b)
+{
+	if (unsigned_add_overflows(a, b))
+		die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
+		    (uintmax_t)a, (uintmax_t)b);
+	return a + b;
+}
+
 static inline uint64_t u64_mult(uint64_t a, uint64_t b)
 {
 	if (unsigned_mult_overflows(a, b))
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (10 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The `ctx->pack_perm` array can be considered as a permutation between
the original `pack_int_id` of some given pack to its position in the
`ctx->info` array containing all packs.

Today we can always index into this array with any known `pack_int_id`,
since there is never a `pack_int_id` which is greater than or equal to
the value `ctx->nr`.

That is not necessarily the case with MIDX compaction. For example,
suppose we have a MIDX chain with three layers, each containing three
packs. The base of the MIDX chain will have packs with IDs 0, 1, and 2,
the next layer 3, 4, and 5, and so on. If we are compacting the topmost
two layers, we'll have input `pack_int_id` values between [3, 8], but
`ctx->nr` will only be 6.

In that example, if we want to know where the pack whose original
`pack_int_id` value was, say, 7, we would compute `ctx->pack_perm[7]`,
leading to an uninitialized read, since there are only 6 entries
allocated in that array.

To address this, there are a couple of options:

 - We could allocate enough entries in `ctx->pack_perm` to accommodate
   the largest `orig_pack_int_id` value.

 - Or, we could internally shift the input values by the number of packs
   in the base layer of the lower end of the MIDX compaction range.

This patch prepare us to take the latter approach, since it does not
allocate more memory than strictly necessary. (In our above example, the
base of the lower end of the compaction range is the first MIDX layer
(having three packs), so we would end up indexing `ctx->pack_perm[7-3]`,
which is a valid read.)

Note that this patch does not actually implement that approach yet, but
merely performs a behavior-preserving refactoring which will make the
change easier to carry out in the future.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 5c8700065a1..9d345fb4737 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -119,6 +119,12 @@ struct write_midx_context {
 	struct odb_source *source;
 };
 
+static uint32_t midx_pack_perm(struct write_midx_context *ctx,
+			       uint32_t orig_pack_int_id)
+{
+	return ctx->pack_perm[orig_pack_int_id];
+}
+
 static int should_include_pack(const struct write_midx_context *ctx,
 			       const char *file_name)
 {
@@ -521,12 +527,12 @@ static int write_midx_object_offsets(struct hashfile *f,
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
 
-		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
+		if (midx_pack_perm(ctx, obj->pack_int_id) == PACK_EXPIRED)
 			BUG("object %s is in an expired pack with int-id %d",
 			    oid_to_hex(&obj->oid),
 			    obj->pack_int_id);
 
-		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
+		hashwrite_be32(f, midx_pack_perm(ctx, obj->pack_int_id));
 
 		if (ctx->large_offsets_needed && obj->offset >> 31)
 			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
@@ -627,7 +633,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
-		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
 		if (!e->preferred)
 			data[i].pack |= (1U << 31);
 		data[i].offset = e->offset;
@@ -637,7 +643,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+		struct pack_info *pack = &ctx->info[midx_pack_perm(ctx, e->pack_int_id)];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
 			pack->bitmap_pos = i + base_objects;
 		pack->bitmap_nr++;
@@ -698,7 +704,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
 		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+			       ctx->info[midx_pack_perm(ctx, from->pack_int_id)].p);
 	}
 
 	trace2_region_leave("midx", "prepare_midx_packing_data", ctx->repo);
@@ -1384,7 +1390,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
 		if (preferred) {
-			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+			uint32_t perm = midx_pack_perm(&ctx, preferred->orig_pack_int_id);
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
 					opts->preferred_pack_name);
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (11 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:54   ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When filling packs from an existing MIDX, `fill_packs_from_midx()`
handles preparing a MIDX'd pack, and reading out its pack name from the
existing MIDX.

MIDX compaction will want to perform an identical operation, though the
caller will look quite different than `fill_packs_from_midx()`. To
reduce any future code duplication, extract `fill_pack_from_midx()`
from `fill_packs_from_midx()` to prepare to call our new helper function
in a future change.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 9d345fb4737..c54113cdc84 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -913,6 +913,21 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
 	return ret;
 }
 
+static int fill_pack_from_midx(struct pack_info *info,
+			       struct multi_pack_index *m,
+			       uint32_t pack_int_id)
+{
+	if (prepare_midx_pack(m, pack_int_id))
+		return error(_("could not load pack %d"), pack_int_id);
+
+	fill_pack_info(info,
+		       m->packs[pack_int_id - m->num_packs_in_base],
+		       m->pack_names[pack_int_id - m->num_packs_in_base],
+		       pack_int_id);
+
+	return 0;
+}
+
 static int fill_packs_from_midx(struct write_midx_context *ctx)
 {
 	struct multi_pack_index *m;
@@ -921,13 +936,13 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 		uint32_t i;
 
 		for (i = 0; i < m->num_packs; i++) {
-			if (prepare_midx_pack(m, m->num_packs_in_base + i))
-				return error(_("could not load pack"));
-
 			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
-				       m->pack_names[i],
-				       m->num_packs_in_base + i);
+
+			if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
+						m->num_packs_in_base + i) < 0)
+				return -1;
+
+			ctx->nr++;
 		}
 	}
 	return 0;
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (12 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
@ 2026-01-14 19:54   ` Taylor Blau
  2026-01-14 19:55   ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
the range [0, m->num_packs), and then shifts its index variable up by
`m->num_packs_in_base` to produce a valid `pack_int_id`.

Instead, directly enumerate the range:

    [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)

, which are the original pack_int_ids themselves as opposed to the
indexes of those packs relative to the MIDX layer they are contained
within.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index c54113cdc84..80334914d3e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -935,11 +935,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 	for (m = ctx->m; m; m = m->base_midx) {
 		uint32_t i;
 
-		for (i = 0; i < m->num_packs; i++) {
+		for (i = m->num_packs_in_base;
+		     i < m->num_packs_in_base + m->num_packs; i++) {
 			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-			if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
-						m->num_packs_in_base + i) < 0)
+			if (fill_pack_from_midx(&ctx->info[ctx->nr], m, i) < 0)
 				return -1;
 
 			ctx->nr++;
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()`
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (13 preceding siblings ...)
  2026-01-14 19:54   ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
@ 2026-01-14 19:55   ` Taylor Blau
  2026-01-14 19:55   ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When computing the set of objects to appear in a MIDX, we use
compute_sorted_entries(), which handles objects from various existing
sources one fanout layer at a time.

The process for computing this set is slightly different during MIDX
compaction, so factor out the existing functionality into its own
routine to prevent `compute_sorted_entries()` from becoming too
difficult to read.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 80334914d3e..ca2469213e6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -328,6 +328,30 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
 	}
 }
 
+static void midx_fanout_add(struct midx_fanout *fanout,
+			    struct write_midx_context *ctx,
+			    uint32_t start_pack,
+			    uint32_t cur_fanout)
+{
+	uint32_t cur_pack;
+
+	if (ctx->m && !ctx->incremental)
+		midx_fanout_add_midx_fanout(fanout, ctx->m, cur_fanout,
+					    ctx->preferred_pack_idx);
+
+	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
+		int preferred = cur_pack == ctx->preferred_pack_idx;
+		midx_fanout_add_pack_fanout(fanout, ctx->info, cur_pack,
+					    preferred, cur_fanout);
+	}
+
+	if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+	    ctx->preferred_pack_idx < start_pack)
+		midx_fanout_add_pack_fanout(fanout, ctx->info,
+					    ctx->preferred_pack_idx, 1,
+					    cur_fanout);
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -364,23 +388,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (ctx->m && !ctx->incremental)
-			midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
-						    ctx->preferred_pack_idx);
-
-		for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
-			int preferred = cur_pack == ctx->preferred_pack_idx;
-			midx_fanout_add_pack_fanout(&fanout,
-						    ctx->info, cur_pack,
-						    preferred, cur_fanout);
-		}
-
-		if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
-		    ctx->preferred_pack_idx < start_pack)
-			midx_fanout_add_pack_fanout(&fanout, ctx->info,
-						    ctx->preferred_pack_idx, 1,
-						    cur_fanout);
-
+		midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
 		midx_fanout_sort(&fanout);
 
 		/*
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (14 preceding siblings ...)
  2026-01-14 19:55   ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
@ 2026-01-14 19:55   ` Taylor Blau
  2026-01-14 19:55   ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Though our 'read-midx' test tool is capable of printing information
about a single MIDX layer identified by its checksum, no caller in our
test suite exercises this path.

Unfortunately, there is a memory leak lurking in this (currently) unused
path that would otherwise be exposed by the following commit.

This occurs when providing a MIDX layer checksum other than the tip. As
we walk over the MIDX chain trying to find the matching layer, we drop
our reference to the top-most MIDX layer. Thus, our call to
'close_midx()' later on leaks memory between the top-most MIDX layer and
the MIDX layer immediately following the specified one.

Plug this leak by holding a reference to the tip of the MIDX chain, and
ensure that we call `close_midx()` before terminating the test tool.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 9d42c587564..388d29e2b53 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -26,9 +26,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 			  int show_objects)
 {
 	uint32_t i;
-	struct multi_pack_index *m;
+	struct multi_pack_index *m, *tip;
+	int ret = 0;
 
-	m = setup_midx(object_dir);
+	m = tip = setup_midx(object_dir);
 
 	if (!m)
 		return 1;
@@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 	if (checksum) {
 		while (m && strcmp(midx_get_checksum_hex(m), checksum))
 			m = m->base_midx;
-		if (!m)
-			return 1;
+		if (!m) {
+			ret = error(_("could not find MIDX with checksum %s"),
+				    checksum);
+			goto out;
+		}
 	}
 
 	printf("header: %08x %d %d %d %d\n",
@@ -82,9 +86,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		}
 	}
 
-	close_midx(m);
+out:
+	close_midx(tip);
 
-	return 0;
+	return ret;
 }
 
 static int read_midx_checksum(const char *object_dir)
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 17/18] midx: implement MIDX compaction
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (15 preceding siblings ...)
  2026-01-14 19:55   ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
@ 2026-01-14 19:55   ` Taylor Blau
  2026-01-27  7:35     ` Patrick Steinhardt
  2026-01-14 19:55   ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
  2026-02-20 22:24   ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
  18 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When managing a MIDX chain with many layers, it is convenient to combine
a sequence of adjacent layers into a single layer to prevent the chain
from growing too long.

While it is conceptually possible to "compact" a sequence of MIDX layers
together by running "git multi-pack-index write --stdin-packs", there
are a few drawbacks that make this less than desirable:

 - Preserving the MIDX chain is impossible, since there is no way to
   write a MIDX layer that contains objects or packs found in an earlier
   MIDX layer already part of the chain. So callers would have to write
   an entirely new (non-incremental) MIDX containing only the compacted
   layers, discarding all other objects/packs from the MIDX.

 - There is (currently) no way to write a MIDX layer outside of the MIDX
   chain to work around the above, such that the MIDX chain could be
   reassembled substituting the compacted layers with the MIDX that was
   written.

 - The `--stdin-packs` command-line option does not allow us to specify
   the order of packs as they appear in the MIDX. Therefore, even if
   there were workarounds for the previous two challenges, any bitmaps
   belonging to layers which come after the compacted layer(s) would no
   longer be valid.

This commit introduces a way to compact a sequence of adjacent MIDX
layers into a single layer while preserving the MIDX chain, as well as
any bitmap(s) in layers which are newer than the compacted ones.

Implementing MIDX compaction does not require a significant number of
changes to how MIDX layers are written. The main changes are as follows:

 - Instead of calling `fill_packs_from_midx()`, we call a new function
   `fill_packs_from_midx_range()`, which walks backwards along the
   portion of the MIDX chain which we are compacting, and adds packs one
   layer a time.

   In order to preserve the pseudo-pack order, the concatenated pack
   order is preserved, with the exception of preferred packs which are
   always added first.

 - After adding entries from the set of packs in the compaction range,
   `compute_sorted_entries()` must adjust the `pack_int_id`'s for all
   objects added in each fanout layer to match their original
   `pack_int_id`'s (as opposed to the index at which each pack appears
   in `ctx.info`).

 - When writing out the new 'multi-pack-index-chain' file, discard any
   layers in the compaction range, replacing them with the newly written
   layer, instead of keeping them and placing the new layer at the end
   of the chain.

This ends up being sufficient to implement MIDX compaction in such a way
that preserves bitmaps corresponding to more recent layers in the MIDX
chain.

The tests for MIDX compaction are so far fairly spartan, since the main
interesting behavior here is ensuring that the right packs/objects are
selected from each layer, and that the pack order is preserved despite
whether or not they are sorted in lexicographic order in the original
MIDX chain.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |  13 ++
 builtin/multi-pack-index.c              |  74 +++++++
 midx-write.c                            | 251 ++++++++++++++++++++++--
 midx.h                                  |   5 +
 t/meson.build                           |   1 +
 t/t5335-compact-multi-pack-index.sh     | 175 +++++++++++++++++
 6 files changed, 500 insertions(+), 19 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 164cf1f2291..883a0529741 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -12,6 +12,8 @@ SYNOPSIS
 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] compact [--[no-]incremental]
+		         <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -83,6 +85,17 @@ marker).
 		necessary.
 --
 
+compact::
+	Write a new MIDX layer containing only objects and packs present
+	in the range `<from>` to `<to>`, where both arguments are
+	checksums of existing layers in the MIDX chain.
++
+--
+	--incremental::
+		Write the result to a MIDX chain instead of writing a
+		stand-alone MIDX.
+--
+
 verify::
 	Verify the contents of the MIDX file.
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c0c6c1760c0..043ee8c478a 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -17,6 +17,10 @@
 	   "  [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
 	   "  [--refs-snapshot=<path>]")
 
+#define BUILTIN_MIDX_COMPACT_USAGE \
+	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
+	   "  <from> <to>")
+
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
 
@@ -30,6 +34,10 @@ static char const * const builtin_multi_pack_index_write_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
 	NULL
 };
+static char const * const builtin_multi_pack_index_compact_usage[] = {
+	BUILTIN_MIDX_COMPACT_USAGE,
+	NULL
+};
 static char const * const builtin_multi_pack_index_verify_usage[] = {
 	BUILTIN_MIDX_VERIFY_USAGE,
 	NULL
@@ -44,6 +52,7 @@ static char const * const builtin_multi_pack_index_repack_usage[] = {
 };
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_COMPACT_USAGE,
 	BUILTIN_MIDX_VERIFY_USAGE,
 	BUILTIN_MIDX_EXPIRE_USAGE,
 	BUILTIN_MIDX_REPACK_USAGE,
@@ -195,6 +204,70 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	return ret;
 }
 
+static int cmd_multi_pack_index_compact(int argc, const char **argv,
+					const char *prefix,
+					struct repository *repo)
+{
+	struct multi_pack_index *m, *cur;
+	struct multi_pack_index *from_midx = NULL;
+	struct multi_pack_index *to_midx = NULL;
+	struct odb_source *source;
+	int ret;
+
+	struct option *options;
+	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+		OPT_END(),
+	};
+
+	repo_config(repo, git_multi_pack_index_write_config, NULL);
+
+	options = add_common_options(builtin_multi_pack_index_compact_options);
+
+	trace2_cmd_mode(argv[0]);
+
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
+	argc = parse_options(argc, argv, prefix,
+			     options, builtin_multi_pack_index_compact_usage,
+			     0);
+
+	if (argc != 2)
+		usage_with_options(builtin_multi_pack_index_compact_usage,
+				   options);
+	source = handle_object_dir_option(the_repository);
+
+	FREE_AND_NULL(options);
+
+	m = get_multi_pack_index(source);
+
+	for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
+		const char *midx_csum = midx_get_checksum_hex(cur);
+
+		if (!from_midx && !strcmp(midx_csum, argv[0]))
+			from_midx = cur;
+		if (!to_midx && !strcmp(midx_csum, argv[1]))
+			to_midx = cur;
+	}
+
+	if (!from_midx)
+		die(_("could not find MIDX: %s"), argv[0]);
+	if (!to_midx)
+		die(_("could not find MIDX: %s"), argv[1]);
+	if (from_midx == to_midx)
+		die(_("MIDX compaction endpoints must be unique"));
+
+	for (m = from_midx; m; m = m->base_midx) {
+		if (m == to_midx)
+			die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]);
+	}
+
+	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
+
+	return ret;
+}
+
 static int cmd_multi_pack_index_verify(int argc, const char **argv,
 				       const char *prefix,
 				       struct repository *repo UNUSED)
@@ -295,6 +368,7 @@ int cmd_multi_pack_index(int argc,
 	struct option builtin_multi_pack_index_options[] = {
 		OPT_SUBCOMMAND("repack", &fn, cmd_multi_pack_index_repack),
 		OPT_SUBCOMMAND("write", &fn, cmd_multi_pack_index_write),
+		OPT_SUBCOMMAND("compact", &fn, cmd_multi_pack_index_compact),
 		OPT_SUBCOMMAND("verify", &fn, cmd_multi_pack_index_verify),
 		OPT_SUBCOMMAND("expire", &fn, cmd_multi_pack_index_expire),
 		OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index ca2469213e6..afa077a09cc 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -113,6 +113,10 @@ struct write_midx_context {
 	int incremental;
 	uint32_t num_multi_pack_indexes_before;
 
+	struct multi_pack_index *compact_from;
+	struct multi_pack_index *compact_to;
+	int compact;
+
 	struct string_list *to_include;
 
 	struct repository *repo;
@@ -122,6 +126,8 @@ struct write_midx_context {
 static uint32_t midx_pack_perm(struct write_midx_context *ctx,
 			       uint32_t orig_pack_int_id)
 {
+	if (ctx->compact)
+		orig_pack_int_id -= ctx->compact_from->num_packs_in_base;
 	return ctx->pack_perm[orig_pack_int_id];
 }
 
@@ -352,6 +358,21 @@ static void midx_fanout_add(struct midx_fanout *fanout,
 					    cur_fanout);
 }
 
+static void midx_fanout_add_compact(struct midx_fanout *fanout,
+				    struct write_midx_context *ctx,
+				    uint32_t cur_fanout)
+{
+	struct multi_pack_index *m = ctx->compact_to;
+
+	ASSERT(ctx->compact);
+
+	while (m && m != ctx->compact_from->base_midx) {
+		midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
+					    NO_PREFERRED_PACK);
+		m = m->base_midx;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -370,6 +391,9 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	size_t alloc_objects, total_objects = 0;
 	struct midx_fanout fanout = { 0 };
 
+	if (ctx->compact)
+		ASSERT(!start_pack);
+
 	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
 		total_objects = st_add(total_objects,
 				       ctx->info[cur_pack].p->num_objects);
@@ -388,7 +412,10 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
+		if (ctx->compact)
+			midx_fanout_add_compact(&fanout, ctx, cur_fanout);
+		else
+			midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
 		midx_fanout_sort(&fanout);
 
 		/*
@@ -956,6 +983,72 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 	return 0;
 }
 
+static uint32_t compactible_packs_between(const struct multi_pack_index *from,
+					  const struct multi_pack_index *to)
+{
+	uint32_t nr;
+
+	ASSERT(from && to);
+
+	nr = u32_add(to->num_packs, to->num_packs_in_base);
+	if (nr < from->num_packs_in_base)
+		BUG("unexpected number of packs in base during compaction: "
+		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
+
+	return nr - from->num_packs_in_base;
+}
+
+static int fill_packs_from_midx_range(struct write_midx_context *ctx,
+				      int bitmap_order)
+{
+	struct multi_pack_index *m = ctx->compact_to;
+	uint32_t packs_nr;
+
+	ASSERT(ctx->compact && !ctx->nr);
+	ASSERT(ctx->compact_from);
+	ASSERT(ctx->compact_to);
+
+	packs_nr = compactible_packs_between(ctx->compact_from,
+					     ctx->compact_to);
+
+	ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
+
+	while (m != ctx->compact_from->base_midx) {
+		uint32_t pack_int_id, preferred_pack_id;
+		uint32_t i;
+
+		if (bitmap_order) {
+			if (midx_preferred_pack(m, &preferred_pack_id) < 0)
+				die(_("could not determine preferred pack"));
+		} else {
+			preferred_pack_id = m->num_packs_in_base;
+		}
+
+		pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
+
+		if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+					preferred_pack_id) < 0)
+			return -1;
+
+		for (i = m->num_packs_in_base;
+		     i < m->num_packs_in_base + m->num_packs; i++) {
+			if (preferred_pack_id == i)
+				continue;
+
+			if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+						i) < 0)
+				return -1;
+		}
+
+		ctx->nr += m->num_packs;
+		m = m->base_midx;
+	}
+
+	ASSERT(ctx->nr == packs_nr);
+
+	return 0;
+}
+
 static struct {
 	const char *non_split;
 	const char *split;
@@ -1075,6 +1168,9 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	if (ctx->incremental)
 		goto out;
 
+	if (ctx->compact)
+		goto out; /* Compaction always requires an update. */
+
 	/*
 	 * Otherwise, we need to verify that the packs covered by the existing
 	 * MIDX match the packs that we already have. The logic to do so is way
@@ -1120,12 +1216,23 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	return needed;
 }
 
+static int midx_hashcmp(const struct multi_pack_index *a,
+			const struct multi_pack_index *b,
+			const struct git_hash_algo *algop)
+{
+	return hashcmp(midx_get_checksum_hash(a), midx_get_checksum_hash(b),
+		       algop);
+}
+
 struct write_midx_opts {
 	struct odb_source *source; /* non-optional */
 
 	struct string_list *packs_to_include;
 	struct string_list *packs_to_drop;
 
+	struct multi_pack_index *compact_from;
+	struct multi_pack_index *compact_to;
+
 	const char *preferred_pack_name;
 	const char *refs_snapshot;
 	unsigned flags;
@@ -1150,6 +1257,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	int dropped_packs = 0;
 	int result = -1;
 	const char **keep_hashes = NULL;
+	size_t keep_hashes_nr = 0;
 	struct chunkfile *cf;
 
 	trace2_region_enter("midx", "write_midx_internal", r);
@@ -1162,6 +1270,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
 		die(_("unknown MIDX version: %d"), ctx.version);
 
 	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
+	ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
+
+	if (ctx.compact) {
+		if (ctx.version != MIDX_VERSION_V2)
+			die(_("cannot perform MIDX compaction with v1 format"));
+		if (!opts->compact_from)
+			BUG("expected non-NULL 'from' MIDX during compaction");
+		if (!opts->compact_to)
+			BUG("expected non-NULL 'to' MIDX during compaction");
+
+		ctx.compact_from = opts->compact_from;
+		ctx.compact_to = opts->compact_to;
+	}
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
@@ -1189,11 +1310,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			 */
 			if (ctx.incremental)
 				ctx.base_midx = m;
-			else if (!opts->packs_to_include)
+			if (!opts->packs_to_include)
 				ctx.m = m;
 		}
 	}
 
+	/*
+	 * If compacting MIDX layer(s) in the range [from, to], then the
+	 * compacted MIDX will share the same base MIDX as 'from'.
+	 */
+	if (ctx.compact)
+		ctx.base_midx = ctx.compact_from->base_midx;
+
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
 	ctx.info = NULL;
@@ -1210,7 +1338,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			ctx.num_multi_pack_indexes_before++;
 			m = m->base_midx;
 		}
-	} else if (ctx.m && fill_packs_from_midx(&ctx)) {
+	} else if (ctx.m && !ctx.compact && fill_packs_from_midx(&ctx)) {
 		goto cleanup;
 	}
 
@@ -1223,9 +1351,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	else
 		ctx.progress = NULL;
 
-	ctx.to_include = opts->packs_to_include;
+	if (ctx.compact) {
+		int bitmap_order = 0;
+		if (opts->preferred_pack_name)
+			bitmap_order |= 1;
+		else if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
+			bitmap_order |= 1;
 
-	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+		fill_packs_from_midx_range(&ctx, bitmap_order);
+	} else {
+		ctx.to_include = opts->packs_to_include;
+		for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+	}
 	stop_progress(&ctx.progress);
 
 	if (!opts->packs_to_drop) {
@@ -1354,12 +1491,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			ctx.large_offsets_needed = 1;
 	}
 
-	QSORT(ctx.info, ctx.nr, pack_info_compare);
+	if (ctx.compact) {
+		if (ctx.version != MIDX_VERSION_V2)
+			BUG("performing MIDX compaction with v1 MIDX");
+	} else {
+		QSORT(ctx.info, ctx.nr, pack_info_compare);
+	}
 
 	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
 		size_t drop_index = 0;
 		int missing_drops = 0;
 
+		ASSERT(!ctx.compact);
+
 		for (size_t i = 0;
 		     i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
 			int cmp = strcmp(ctx.info[i].pack_name,
@@ -1391,12 +1535,20 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	 */
 	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
 	for (size_t i = 0; i < ctx.nr; i++) {
+		uint32_t from = ctx.info[i].orig_pack_int_id;
+		uint32_t to;
+
 		if (ctx.info[i].expired) {
+			to = PACK_EXPIRED;
 			dropped_packs++;
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
 		} else {
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+			to = i - dropped_packs;
 		}
+
+		if (ctx.compact)
+			from -= ctx.compact_from->num_packs_in_base;
+
+		ctx.pack_perm[from] = to;
 	}
 
 	for (size_t i = 0; i < ctx.nr; i++) {
@@ -1542,7 +1694,24 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
 		die(_("too many multi-pack-indexes"));
 
-	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+	if (ctx.compact) {
+		struct multi_pack_index *m;
+
+		/*
+		 * Keep all MIDX layers excluding those in the range [from, to].
+		 */
+		for (m = ctx.base_midx; m; m = m->base_midx)
+			keep_hashes_nr++;
+		for (m = ctx.m;
+		     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+		     m = m->base_midx)
+			keep_hashes_nr++;
+
+		keep_hashes_nr++; /* include the compacted layer */
+	} else {
+		keep_hashes_nr = ctx.num_multi_pack_indexes_before + 1;
+	}
+	CALLOC_ARRAY(keep_hashes, keep_hashes_nr);
 
 	if (ctx.incremental) {
 		FILE *chainf = fdopen_lock_file(&lk, "w");
@@ -1567,17 +1736,47 @@ static int write_midx_internal(struct write_midx_opts *opts)
 
 		strbuf_release(&final_midx_name);
 
-		keep_hashes[ctx.num_multi_pack_indexes_before] =
-			xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
+		if (ctx.compact) {
+			struct multi_pack_index *m;
+			uint32_t num_layers_before_from = 0;
+			uint32_t i;
 
-		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
-			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+			for (m = ctx.base_midx; m; m = m->base_midx)
+				num_layers_before_from++;
 
-			keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
-			m = m->base_midx;
+			m = ctx.base_midx;
+			for (i = 0; i < num_layers_before_from; i++) {
+				uint32_t j = num_layers_before_from - i - 1;
+
+				keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
+				m = m->base_midx;
+			}
+
+			keep_hashes[i] = xstrdup(hash_to_hex_algop(midx_hash,
+								   r->hash_algo));
+
+			i = 0;
+			for (m = ctx.m;
+			     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+			     m = m->base_midx) {
+				keep_hashes[keep_hashes_nr - i - 1] =
+					xstrdup(midx_get_checksum_hex(m));
+				i++;
+			}
+		} else {
+			keep_hashes[ctx.num_multi_pack_indexes_before] =
+				xstrdup(hash_to_hex_algop(midx_hash,
+							  r->hash_algo));
+
+			for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+				uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+				keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
+				m = m->base_midx;
+			}
 		}
 
-		for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+		for (uint32_t i = 0; i < keep_hashes_nr; i++)
 			fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
 	} else {
 		keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1590,8 +1789,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files(opts->source, keep_hashes,
-			 ctx.num_multi_pack_indexes_before + 1,
+	clear_midx_files(opts->source, keep_hashes, keep_hashes_nr,
 			 ctx.incremental);
 	result = 0;
 
@@ -1609,7 +1807,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	if (keep_hashes) {
-		for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+		for (uint32_t i = 0; i < keep_hashes_nr; i++)
 			free((char *)keep_hashes[i]);
 		free(keep_hashes);
 	}
@@ -1651,6 +1849,21 @@ int write_midx_file_only(struct odb_source *source,
 	return write_midx_internal(&opts);
 }
 
+int write_midx_file_compact(struct odb_source *source,
+			    struct multi_pack_index *from,
+			    struct multi_pack_index *to,
+			    unsigned flags)
+{
+	struct write_midx_opts opts = {
+		.source = source,
+		.compact_from = from,
+		.compact_to = to,
+		.flags = flags | MIDX_WRITE_COMPACT,
+	};
+
+	return write_midx_internal(&opts);
+}
+
 int expire_midx_packs(struct odb_source *source, unsigned flags)
 {
 	uint32_t i, *count, result = 0;
diff --git a/midx.h b/midx.h
index aa99a6cb215..08f3728e520 100644
--- a/midx.h
+++ b/midx.h
@@ -82,6 +82,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 #define MIDX_WRITE_INCREMENTAL (1 << 5)
+#define MIDX_WRITE_COMPACT (1 << 6)
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
@@ -131,6 +132,10 @@ int write_midx_file_only(struct odb_source *source,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
 			 const char *refs_snapshot, unsigned flags);
+int write_midx_file_compact(struct odb_source *source,
+			    struct multi_pack_index *from,
+			    struct multi_pack_index *to,
+			    unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct odb_source *source, unsigned flags);
 int expire_midx_packs(struct odb_source *source, unsigned flags);
diff --git a/t/meson.build b/t/meson.build
index 459c52a4897..5393322a8d2 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -614,6 +614,7 @@ integration_tests = [
   't5332-multi-pack-reuse.sh',
   't5333-pseudo-merge-bitmaps.sh',
   't5334-incremental-multi-pack-index.sh',
+  't5335-compact-multi-pack-index.sh',
   't5351-unpack-large-objects.sh',
   't5400-send-pack.sh',
   't5401-update-hooks.sh',
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
new file mode 100755
index 00000000000..797ae05c3bd
--- /dev/null
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -0,0 +1,175 @@
+#!/bin/sh
+
+test_description='multi-pack-index compaction'
+
+. ./test-lib.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+nth_line() {
+	local n="$1"
+	shift
+	awk "NR==$n" "$@"
+}
+
+write_packs () {
+	for c in "$@"
+	do
+		test_commit "$c" &&
+
+		git pack-objects --all --unpacked $packdir/pack-$c &&
+		git prune-packed &&
+
+		git multi-pack-index write --incremental --bitmap || return 1
+	done
+}
+
+test_midx_layer_packs () {
+	local checksum="$1" &&
+	shift &&
+
+	test-tool read-midx $objdir "$checksum" >out &&
+
+	printf "%s\n" "$@" >expect &&
+	# NOTE: do *not* pipe through sort here, we want to ensure the
+	# order of packs is preserved during compaction.
+	grep "^pack-" out | cut -d"-" -f2 >actual &&
+
+	test_cmp expect actual
+}
+
+test_midx_layer_object_uniqueness () {
+	: >objs.all
+	while read layer
+	do
+		test-tool read-midx --show-objects $objdir "$layer" >out &&
+		grep "\.pack$" out | cut -d" " -f1 | sort >objs.layer &&
+		test_stdout_line_count = 0 comm -12 objs.all objs.layer &&
+		cat objs.all objs.layer | sort >objs.tmp &&
+		mv objs.tmp objs.all || return 1
+	done <$midx_chain
+}
+
+test_expect_success 'MIDX compaction with lex-ordered pack names' '
+	git init midx-compact-lex-order &&
+	(
+		cd midx-compact-lex-order &&
+
+		git config maintenance.auto false &&
+
+		write_packs A B C D E &&
+		test_line_count = 5 $midx_chain &&
+
+		git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test_line_count = 3 $midx_chain &&
+
+		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
+		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
+		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+		test_midx_layer_object_uniqueness
+	)
+'
+
+test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
+	git init midx-compact-non-lex-order &&
+	(
+		cd midx-compact-non-lex-order &&
+
+		git config maintenance.auto false &&
+
+		write_packs D C A B E &&
+		test_line_count = 5 $midx_chain &&
+
+		git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test_line_count = 3 $midx_chain &&
+
+		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" D &&
+		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" C A B &&
+		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+		test_midx_layer_object_uniqueness
+	)
+'
+
+test_expect_success 'setup for bogus MIDX compaction scenarios' '
+	git init midx-compact-bogus &&
+	(
+		cd midx-compact-bogus &&
+
+		git config maintenance.auto false &&
+
+		write_packs A B C
+	)
+'
+
+test_expect_success 'MIDX compaction with missing endpoints' '
+	(
+		cd midx-compact-bogus &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"<missing>" "<missing>" 2>err &&
+		test_grep "could not find MIDX: <missing>" err &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"<missing>" "$(nth_line 2 "$midx_chain")" 2>err &&
+		test_grep "could not find MIDX: <missing>" err &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" "<missing>" 2>err &&
+		test_grep "could not find MIDX: <missing>" err
+	)
+'
+
+test_expect_success 'MIDX compaction with reversed endpoints' '
+	(
+		cd midx-compact-bogus &&
+
+		from="$(nth_line 3 "$midx_chain")" &&
+		to="$(nth_line 1 "$midx_chain")" &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"$from" "$to" 2>err &&
+
+		test_grep "MIDX $from must be an ancestor of $to" err
+	)
+'
+
+test_expect_success 'MIDX compaction with identical endpoints' '
+	(
+		cd midx-compact-bogus &&
+
+		from="$(nth_line 3 "$midx_chain")" &&
+		to="$(nth_line 3 "$midx_chain")" &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"$from" "$to" 2>err &&
+
+		test_grep "MIDX compaction endpoints must be unique" err
+	)
+'
+
+test_expect_success 'MIDX compaction with midx.version=1' '
+	(
+		cd midx-compact-bogus &&
+
+		test_must_fail git -c midx.version=1 multi-pack-index compact \
+			"$(nth_line 1 "$midx_chain")" \
+			"$(nth_line 2 "$midx_chain")" 2>err &&
+
+		test_grep "fatal: cannot perform MIDX compaction with v1 format" err
+	)
+'
+
+test_done
-- 
2.52.0.457.gb599f1ad4b0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 18/18] midx: enable reachability bitmaps during MIDX compaction
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (16 preceding siblings ...)
  2026-01-14 19:55   ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
@ 2026-01-14 19:55   ` Taylor Blau
  2026-02-20 22:24   ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
  18 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 19:55 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Enable callers to generate reachability bitmaps when performing MIDX
layer compaction by combining all existing bitmaps from the compacted
layers.

Note that because of the object/pack ordering described by the previous
commit, the pseudo-pack order for the compacted MIDX is the same as
concatenating the individual pseudo-pack orderings for each layer in the
compaction range.

As a result, the only non-test or documentation change necessary is to
treat all objects as non-preferred during compaction so as not to
disturb the object ordering.

In the future, we may want to adjust which commit(s) receive
reachability bitmaps when compacting multiple .bitmap files into one, or
even generate new bitmaps (e.g., if the references have moved
significantly since the .bitmap was generated). This commit only
implements combining all existing bitmaps in range together in order to
demonstrate and lay the groundwork for more exotic strategies.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |   5 +-
 builtin/multi-pack-index.c              |   4 +-
 midx-write.c                            |   2 +-
 t/t5335-compact-multi-pack-index.sh     | 122 +++++++++++++++++++++++-
 4 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 883a0529741..61256830141 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -13,7 +13,7 @@ SYNOPSIS
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
 'git multi-pack-index' [<options>] compact [--[no-]incremental]
-		         <from> <to>
+		         [--[no-]bitmap] <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -94,6 +94,9 @@ compact::
 	--incremental::
 		Write the result to a MIDX chain instead of writing a
 		stand-alone MIDX.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 043ee8c478a..2f24c113c8f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -19,7 +19,7 @@
 
 #define BUILTIN_MIDX_COMPACT_USAGE \
 	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
-	   "  <from> <to>")
+	   "  [--[no-]bitmap] <from> <to>")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
 
 	struct option *options;
 	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "incremental", &opts.flags,
 			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index afa077a09cc..4c52843db9b 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -669,7 +669,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
 		data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
-		if (!e->preferred)
+		if (!e->preferred || ctx->compact)
 			data[i].pack |= (1U << 31);
 		data[i].offset = e->offset;
 	}
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
index 797ae05c3bd..40f3844282f 100755
--- a/t/t5335-compact-multi-pack-index.sh
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -67,7 +67,7 @@ test_expect_success 'MIDX compaction with lex-ordered pack names' '
 		write_packs A B C D E &&
 		test_line_count = 5 $midx_chain &&
 
-		git multi-pack-index compact --incremental \
+		git multi-pack-index compact --incremental --bitmap \
 			"$(nth_line 2 "$midx_chain")" \
 			"$(nth_line 4 "$midx_chain")" &&
 		test_line_count = 3 $midx_chain &&
@@ -90,7 +90,7 @@ test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
 		write_packs D C A B E &&
 		test_line_count = 5 $midx_chain &&
 
-		git multi-pack-index compact --incremental \
+		git multi-pack-index compact --incremental --bitmap \
 			"$(nth_line 2 "$midx_chain")" \
 			"$(nth_line 4 "$midx_chain")" &&
 		test_line_count = 3 $midx_chain &&
@@ -172,4 +172,122 @@ test_expect_success 'MIDX compaction with midx.version=1' '
 	)
 '
 
+midx_objs_by_pack () {
+	awk '/\.pack$/ { split($3, a, "-"); print a[2], $1 }' | sort
+}
+
+tag_objs_from_pack () {
+	objs="$(git rev-list --objects --no-object-names "$2")" &&
+	printf "$1 %s\n" $objs | sort
+}
+
+test_expect_success 'MIDX compaction preserves pack object selection' '
+	git init midx-compact-preserve-selection &&
+	(
+		cd midx-compact-preserve-selection &&
+
+		git config maintenance.auto false &&
+
+		test_commit A &&
+		test_commit B &&
+
+		# Create two packs, one containing just the objects from
+		# A, and another containing all objects from the
+		# repository.
+		p1="$(echo A | git pack-objects --revs --delta-base-offset \
+			$packdir/pack-1)" &&
+		p0="$(echo B | git pack-objects --revs --delta-base-offset \
+			$packdir/pack-0)" &&
+
+		echo "pack-1-$p1.idx" | git multi-pack-index write \
+			--incremental --bitmap --stdin-packs &&
+		echo "pack-0-$p0.idx" | git multi-pack-index write \
+			--incremental --bitmap --stdin-packs &&
+
+		write_packs C &&
+
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 1 "$midx_chain")" \
+			"$(nth_line 2 "$midx_chain")" &&
+
+
+		test-tool read-midx --show-objects $objdir \
+			"$(nth_line 1 "$midx_chain")" >AB.info &&
+		test-tool read-midx --show-objects $objdir \
+			"$(nth_line 2 "$midx_chain")" >C.info &&
+
+		midx_objs_by_pack <AB.info >AB.actual &&
+		midx_objs_by_pack <C.info >C.actual &&
+
+		{
+			tag_objs_from_pack 1 A &&
+			tag_objs_from_pack 0 A..B
+		} | sort >AB.expect &&
+		tag_objs_from_pack C B..C >C.expect &&
+
+		test_cmp AB.expect AB.actual &&
+		test_cmp C.expect C.actual
+	)
+'
+
+test_expect_success 'MIDX compaction with bitmaps' '
+	git init midx-compact-with-bitmaps &&
+	(
+		cd midx-compact-with-bitmaps &&
+
+		git config maintenance.auto false &&
+
+		write_packs foo bar baz quux woot &&
+
+		test-tool read-midx --bitmap $objdir >bitmap.expect &&
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test-tool read-midx --bitmap $objdir >bitmap.actual &&
+
+		test_cmp bitmap.expect bitmap.actual &&
+
+		true
+	)
+'
+
+test_expect_success 'MIDX compaction with bitmaps (non-trivial)' '
+	git init midx-compact-with-bitmaps-non-trivial &&
+	(
+		cd midx-compact-with-bitmaps-non-trivial &&
+
+		git config maintenance.auto false &&
+
+		git branch -m main &&
+
+		#               D(4)
+		#              /
+		# A(1) --- B(2) --- C(3) --- G(7)
+		#              \
+		#               E(5) --- F(6)
+		write_packs A B C &&
+		git checkout -b side &&
+		write_packs D &&
+		git checkout -b other B &&
+		write_packs E F &&
+		git checkout main &&
+		write_packs G &&
+
+		# Compact layers 2-4, leaving us with:
+		#
+		#  [A, [B, C, D], E, F, G]
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+
+		# Then compact the top two layers, condensing the above
+		# such that the new 4th layer contains F and G.
+		#
+		#  [A, [B, C, D], E, [F, G]]
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 4 "$midx_chain")" \
+			"$(nth_line 5 "$midx_chain")"
+	)
+'
+
 test_done
-- 
2.52.0.457.gb599f1ad4b0

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2026-01-14 19:54   ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2026-01-14 21:13     ` Junio C Hamano
  2026-01-14 21:40       ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2026-01-14 21:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> But the process is completely unnecessary in the first place: we are
> enumerating all values of `ctx->info`, and there is no reason to process
> them in a different order than they appear in memory. Index `ctx->info`
> directly to reflect that.

As long as pack_perm[] is permutations of [0..ctx->nr-1], it does
not matter, as we are clearing all the [0..ctx->nr-1] range anyway,
in other words?

>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx-write.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/midx-write.c b/midx-write.c
> index 13171d7e9c4..da9c5a7c295 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
>  		pack_order[i] = data[i].nr;
>  	}
>  	for (i = 0; i < ctx->nr; i++) {
> -		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
> +		struct pack_info *pack = &ctx->info[i];
>  		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
>  			pack->bitmap_pos = 0;
>  	}

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order
  2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2026-01-14 21:28     ` Junio C Hamano
  2026-01-14 21:44       ` Taylor Blau
  2026-01-27  7:34     ` Patrick Steinhardt
  1 sibling, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2026-01-14 21:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> @@ -374,7 +374,7 @@ HEADER:
>  	    The signature is: {'M', 'I', 'D', 'X'}
>  
>  	1-byte version number:
> -	    Git only writes or recognizes version 1.
> +	    Git only writes version 2, but recognizes versions 1 and 2.

We only write version 2, then ...

> diff --git a/midx-write.c b/midx-write.c
> index 8a54644e427..5c8700065a1 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -36,10 +36,13 @@ extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
>  
>  static size_t write_midx_header(const struct git_hash_algo *hash_algo,
>  				struct hashfile *f, unsigned char num_chunks,
> -				uint32_t num_packs)
> +				uint32_t num_packs, int version)
>  {
> +	if (version != MIDX_VERSION_V1 && version != MIDX_VERSION_V2)
> +		BUG("unexpected MIDX version: %d", version);
> +

... do we need to add version parameter to this function?  Unless
the writer that calls this helper function demotes version to 1 when
it realizes that the pack files we write midx for happens to be
sorted, in which case it may need to call this function with version
set to 1, I do not quite see why we need it.

>  	hashwrite_be32(f, MIDX_SIGNATURE);
> -	hashwrite_u8(f, MIDX_VERSION);
> +	hashwrite_u8(f, version);
>  	hashwrite_u8(f, oid_version(hash_algo));
>  	hashwrite_u8(f, num_chunks);
>  	hashwrite_u8(f, 0); /* unused */
> @@ -105,6 +108,8 @@ struct write_midx_context {
>  
>  	uint32_t preferred_pack_idx;
>  
> +	int version; /* must be MIDX_VERSION_V1 or _V2 */
> +

Ditto.  When we are writing it out, shouldn't this always be 2
anyway?

> @@ -410,7 +415,9 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
>  		if (ctx->info[i].expired)
>  			continue;
>  
> -		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
> +		if (ctx->version == MIDX_VERSION_V1 &&
> +		    i && strcmp(ctx->info[i].pack_name,
> +				ctx->info[i - 1].pack_name) <= 0)
>  			BUG("incorrect pack-file order: %s before %s",
>  			    ctx->info[i - 1].pack_name,
>  			    ctx->info[i].pack_name);

Ditto.

> @@ -1025,6 +1032,12 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
>  	if (!midx_checksum_valid(midx))
>  		goto out;
>  
> +	/*
> +	 * If the version differs, we need to update.
> +	 */
> +	if (midx->version != ctx->version)
> +		goto out;
> +

OK.

> @@ -1100,6 +1113,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  	struct tempfile *incr;
>  	struct write_midx_context ctx = {
>  		.preferred_pack_idx = NO_PREFERRED_PACK,
> +		.version = MIDX_VERSION_V2,
>  	 };
>  	struct multi_pack_index *midx_to_free = NULL;
>  	int bitmapped_packs_concat_len = 0;
> @@ -1114,6 +1128,10 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  	ctx.repo = r;
>  	ctx.source = opts->source;
>  
> +	repo_config_get_int(ctx.repo, "midx.version", &ctx.version);
> +	if (ctx.version != MIDX_VERSION_V1 && ctx.version != MIDX_VERSION_V2)
> +		die(_("unknown MIDX version: %d"), ctx.version);
> +

Ditto.

> @@ -1445,7 +1463,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  	}
>  
>  	write_midx_header(r->hash_algo, f, get_num_chunks(cf),
> -			  ctx.nr - dropped_packs);
> +			  ctx.nr - dropped_packs, ctx.version);
>  	write_chunkfile(cf, &ctx);
>  
>  	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,

Ditto.

> diff --git a/midx.h b/midx.h
> index a39bcc9d03f..aa99a6cb215 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -11,7 +11,8 @@ struct git_hash_algo;
>  struct odb_source;
>  
>  #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
> -#define MIDX_VERSION 1
> +#define MIDX_VERSION_V1 1
> +#define MIDX_VERSION_V2 2
>  #define MIDX_BYTE_FILE_VERSION 4
>  #define MIDX_BYTE_HASH_VERSION 5
>  #define MIDX_BYTE_NUM_CHUNKS 6
> @@ -71,6 +72,7 @@ struct multi_pack_index {
>  	uint32_t num_packs_in_base;
>  
>  	const char **pack_names;
> +	size_t *pack_names_sorted;
>  	struct packed_git **packs;
>  };

This does make sense.  The code paths that reads existing on-disk
files must notice when they are dealing with v1 format and need to
act differently.

Thanks.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2026-01-14 21:13     ` Junio C Hamano
@ 2026-01-14 21:40       ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

On Wed, Jan 14, 2026 at 01:13:17PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > But the process is completely unnecessary in the first place: we are
> > enumerating all values of `ctx->info`, and there is no reason to process
> > them in a different order than they appear in memory. Index `ctx->info`
> > directly to reflect that.
>
> As long as pack_perm[] is permutations of [0..ctx->nr-1], it does
> not matter, as we are clearing all the [0..ctx->nr-1] range anyway,
> in other words?

That's right. The contents of ctx->pack_perm here aren't so important
since it doesn't matter what order we consider the pack_info structs in.
As long as we see all of them we are OK, hence iterating over the
ctx->info array directly rather than in the permuted order given by
pack_perm.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order
  2026-01-14 21:28     ` Junio C Hamano
@ 2026-01-14 21:44       ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

On Wed, Jan 14, 2026 at 01:28:07PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > @@ -374,7 +374,7 @@ HEADER:
> >  	    The signature is: {'M', 'I', 'D', 'X'}
> >
> >  	1-byte version number:
> > -	    Git only writes or recognizes version 1.
> > +	    Git only writes version 2, but recognizes versions 1 and 2.
>
> We only write version 2, then ...

Ah, thanks for spotting.

This comment is outdated, and I didn't catch it when proof-reading
the patches. I wrote this line before adding the "midx.version" config
escape hatch, and doing so made this comment stale.

It should probably look something like:

    Git writes the version specified by the "midx.version"
    configuration option, which defaults to 2. It recognizes
    both versions 1 and 2.

I'll update it and include it in the subsequent round.

> >  	hashwrite_be32(f, MIDX_SIGNATURE);
> > -	hashwrite_u8(f, MIDX_VERSION);
> > +	hashwrite_u8(f, version);
> >  	hashwrite_u8(f, oid_version(hash_algo));
> >  	hashwrite_u8(f, num_chunks);
> >  	hashwrite_u8(f, 0); /* unused */
> > @@ -105,6 +108,8 @@ struct write_midx_context {
> >
> >  	uint32_t preferred_pack_idx;
> >
> > +	int version; /* must be MIDX_VERSION_V1 or _V2 */
> > +
>
> Ditto.  When we are writing it out, shouldn't this always be 2
> anyway?

For this and below, the code here is right and the documentation is
wrong (per above).

> > @@ -410,7 +415,9 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
> >  		if (ctx->info[i].expired)
> >  			continue;
> >
> > -		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
> > +		if (ctx->version == MIDX_VERSION_V1 &&
> > +		    i && strcmp(ctx->info[i].pack_name,
> > +				ctx->info[i - 1].pack_name) <= 0)
> >  			BUG("incorrect pack-file order: %s before %s",
> >  			    ctx->info[i - 1].pack_name,
> >  			    ctx->info[i].pack_name);
>
> Ditto.

Ditto (and so on) ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-14 19:54   ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
@ 2026-01-14 21:49     ` Junio C Hamano
  2026-01-14 22:03       ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2026-01-14 21:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> A future commit will want to add two 32-bit unsigned values together
> while checking for overflow. Introduce a variant of the u64_add()
> function for operating on 32-bit inputs.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  git-compat-util.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index b0673d1a450..db62a6f25c5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -641,6 +641,14 @@ static inline int cast_size_t_to_int(size_t a)
>  	return (int)a;
>  }
>  
> +static inline uint32_t u32_add(uint32_t a, uint32_t b)
> +{
> +	if (unsigned_add_overflows(a, b))
> +		die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
> +		    (uintmax_t)a, (uintmax_t)b);
> +	return a + b;
> +}

Neither this one, nor the original u64_add(), seem to me a
particularly good API.

When things might overflow, it is a given that we should give a
controlled death rather than nonsense behaviour and/or corrupt
output, but shouldn't the diagnosis message given to the end-user
when we find an overflow be given at a bit higher layer?  At this
level, you do not even know what quantities you are adding together
*means*.  Even though your caller might be able to give a more
intelligible message like "the number of packs to combine exceeds
2^32 that is too many".  But dying in these functions means the
callers have no chance to do so.  The only thing the user sees is
some code tried to add two u32 and overflowed---without any hint
what these quantities were or what the addition was trying to
compute.





^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-14 21:49     ` Junio C Hamano
@ 2026-01-14 22:03       ` Taylor Blau
  2026-01-15  0:11         ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2026-01-14 22:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

On Wed, Jan 14, 2026 at 01:49:00PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > A future commit will want to add two 32-bit unsigned values together
> > while checking for overflow. Introduce a variant of the u64_add()
> > function for operating on 32-bit inputs.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  git-compat-util.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index b0673d1a450..db62a6f25c5 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -641,6 +641,14 @@ static inline int cast_size_t_to_int(size_t a)
> >  	return (int)a;
> >  }
> >
> > +static inline uint32_t u32_add(uint32_t a, uint32_t b)
> > +{
> > +	if (unsigned_add_overflows(a, b))
> > +		die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
> > +		    (uintmax_t)a, (uintmax_t)b);
> > +	return a + b;
> > +}
>
> Neither this one, nor the original u64_add(), seem to me a
> particularly good API.
>
> When things might overflow, it is a given that we should give a
> controlled death rather than nonsense behaviour and/or corrupt
> output, but shouldn't the diagnosis message given to the end-user
> when we find an overflow be given at a bit higher layer?  At this
> level, you do not even know what quantities you are adding together
> *means*.  Even though your caller might be able to give a more
> intelligible message like "the number of packs to combine exceeds
> 2^32 that is too many".  But dying in these functions means the
> callers have no chance to do so.  The only thing the user sees is
> some code tried to add two u32 and overflowed---without any hint
> what these quantities were or what the addition was trying to
> compute.

I agree with your sentiment here. Let's avoid proliferating an API
pattern that encourages non-descriptive error message by dropping the
sole caller of u32_add() in this series with the following:

--- 8< ---
diff --git a/midx-write.c b/midx-write.c
index afa077a09cc..99e7116f4d0 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -990,7 +990,10 @@ static uint32_t compactible_packs_between(const struct multi_pack_index *from,

 	ASSERT(from && to);

-	nr = u32_add(to->num_packs, to->num_packs_in_base);
+	if (unsigned_add_overflows(to->num_packs, to->num_packs_in_base))
+		die(_("too many packs, unable to compact"));
+
+	nr = to->num_packs + to->num_packs_in_base;
 	if (nr < from->num_packs_in_base)
 		BUG("unexpected number of packs in base during compaction: "
 		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
--- >8 ---

As for removing u64_add(), that should be straightforward as well since
there is also a single caller. Let me know if you think that makes sense
to take up as part of this series, or if you would prefer it done
separately. I tend to prefer the latter, since the state after applying
the above is that we avoid adding any new callers.

Removing st_add() and st_mult(), on the other hand, will be much more
involved since they are extremely widely used by comparison.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-14 22:03       ` Taylor Blau
@ 2026-01-15  0:11         ` Taylor Blau
  2026-01-21  8:51           ` Patrick Steinhardt
  0 siblings, 1 reply; 99+ messages in thread
From: Taylor Blau @ 2026-01-15  0:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

On Wed, Jan 14, 2026 at 05:03:56PM -0500, Taylor Blau wrote:
> As for removing u64_add(), that should be straightforward as well since
> there is also a single caller. Let me know if you think that makes sense
> to take up as part of this series, or if you would prefer it done
> separately. I tend to prefer the latter, since the state after applying
> the above is that we avoid adding any new callers.

This appears to be easy enough. The following applies on top of 'master'
if you want to pick it up separately:

--- 8< ---

Subject: [PATCH] git-compat-util.h: drop u64_add(), u64_mult() helpers

The u64_add() and u64_mult() helper functions were introduced in
b103881d4f4 (midx repack: avoid integer overflow on 32 bit systems,
2025-05-22) to implement overflow checks during a fixed-point
calculation when estimating pack sizes in the MIDX writing code.

However, those functions call die() when either the addition or
multiplication of their operands (depending on which function is being
called) would cause an overflow. This does not allow the caller to
provide a more detailed message, presenting the user with an opaque
message like:

    fatal: uint64_t overflow: M * N

Let's discourage these opaque error messages by dropping these functions
entirely and instead having the caller use unsigned_mult_overflows() or
unsigned_add_overflows() themselves, providing the caller the
opportunity to come up with their own die() message.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 git-compat-util.h | 16 ----------------
 midx-write.c      | 15 +++++++++++++--
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index b0673d1a450..24edd68c671 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -641,22 +641,6 @@ static inline int cast_size_t_to_int(size_t a)
 	return (int)a;
 }

-static inline uint64_t u64_mult(uint64_t a, uint64_t b)
-{
-	if (unsigned_mult_overflows(a, b))
-		die("uint64_t overflow: %"PRIuMAX" * %"PRIuMAX,
-		    (uintmax_t)a, (uintmax_t)b);
-	return a * b;
-}
-
-static inline uint64_t u64_add(uint64_t a, uint64_t b)
-{
-	if (unsigned_add_overflows(a, b))
-		die("uint64_t overflow: %"PRIuMAX" + %"PRIuMAX,
-		    (uintmax_t)a, (uintmax_t)b);
-	return a + b;
-}
-
 /*
  * Limit size of IO chunks, because huge chunks only cause pain.  OS X
  * 64-bit is buggy, returning EINVAL if len >= INT_MAX; and even in
diff --git a/midx-write.c b/midx-write.c
index 87b97c70872..6006b6569c8 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1738,8 +1738,19 @@ static void fill_included_packs_batch(struct repository *r,
 		 */
 		expected_size = (uint64_t)pack_info[i].referenced_objects << 14;
 		expected_size /= p->num_objects;
-		expected_size = u64_mult(expected_size, p->pack_size);
-		expected_size = u64_add(expected_size, 1u << 13) >> 14;
+
+		if (unsigned_mult_overflows(expected_size,
+					    (uint64_t)p->pack_size))
+			die(_("overflow during fixed-point multiply (%"PRIu64" "
+			      "* %"PRIu64")"),
+			    expected_size, (uint64_t)p->pack_size);
+		expected_size = expected_size * p->pack_size;
+
+		if (unsigned_add_overflows(expected_size, 1u << 13))
+			die(_("overflow during fixed-point rounding (%"PRIu64" "
+			      " + %"PRIu64")"),
+			    expected_size, (uint64_t)(1ul << 13));
+		expected_size = (expected_size + (1u << 13)) >> 14;

 		if (expected_size >= batch_size)
 			continue;

base-commit: 8745eae506f700657882b9e32b2aa00f234a6fb6
--
2.52.0.436.g7dc2c5478ff

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-15  0:11         ` Taylor Blau
@ 2026-01-21  8:51           ` Patrick Steinhardt
  2026-01-21 23:55             ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2026-01-21  8:51 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git, Jeff King, Elijah Newren

On Wed, Jan 14, 2026 at 07:11:11PM -0500, Taylor Blau wrote:
> On Wed, Jan 14, 2026 at 05:03:56PM -0500, Taylor Blau wrote:
> > As for removing u64_add(), that should be straightforward as well since
> > there is also a single caller. Let me know if you think that makes sense
> > to take up as part of this series, or if you would prefer it done
> > separately. I tend to prefer the latter, since the state after applying
> > the above is that we avoid adding any new callers.
> 
> This appears to be easy enough. The following applies on top of 'master'
> if you want to pick it up separately:
> 
> --- 8< ---
> 
> Subject: [PATCH] git-compat-util.h: drop u64_add(), u64_mult() helpers
> 
> The u64_add() and u64_mult() helper functions were introduced in
> b103881d4f4 (midx repack: avoid integer overflow on 32 bit systems,
> 2025-05-22) to implement overflow checks during a fixed-point
> calculation when estimating pack sizes in the MIDX writing code.
> 
> However, those functions call die() when either the addition or
> multiplication of their operands (depending on which function is being
> called) would cause an overflow. This does not allow the caller to
> provide a more detailed message, presenting the user with an opaque
> message like:
> 
>     fatal: uint64_t overflow: M * N
> 
> Let's discourage these opaque error messages by dropping these functions
> entirely and instead having the caller use unsigned_mult_overflows() or
> unsigned_add_overflows() themselves, providing the caller the
> opportunity to come up with their own die() message.
> 
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  git-compat-util.h | 16 ----------------
>  midx-write.c      | 15 +++++++++++++--
>  2 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/git-compat-util.h b/git-compat-util.h
> index b0673d1a450..24edd68c671 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -641,22 +641,6 @@ static inline int cast_size_t_to_int(size_t a)
>  	return (int)a;
>  }
> 
> -static inline uint64_t u64_mult(uint64_t a, uint64_t b)
> -{
> -	if (unsigned_mult_overflows(a, b))
> -		die("uint64_t overflow: %"PRIuMAX" * %"PRIuMAX,
> -		    (uintmax_t)a, (uintmax_t)b);
> -	return a * b;
> -}
> -
> -static inline uint64_t u64_add(uint64_t a, uint64_t b)
> -{
> -	if (unsigned_add_overflows(a, b))
> -		die("uint64_t overflow: %"PRIuMAX" + %"PRIuMAX,
> -		    (uintmax_t)a, (uintmax_t)b);
> -	return a + b;
> -}
> -
>  /*
>   * Limit size of IO chunks, because huge chunks only cause pain.  OS X
>   * 64-bit is buggy, returning EINVAL if len >= INT_MAX; and even in
> diff --git a/midx-write.c b/midx-write.c
> index 87b97c70872..6006b6569c8 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1738,8 +1738,19 @@ static void fill_included_packs_batch(struct repository *r,
>  		 */
>  		expected_size = (uint64_t)pack_info[i].referenced_objects << 14;
>  		expected_size /= p->num_objects;
> -		expected_size = u64_mult(expected_size, p->pack_size);
> -		expected_size = u64_add(expected_size, 1u << 13) >> 14;
> +
> +		if (unsigned_mult_overflows(expected_size,
> +					    (uint64_t)p->pack_size))
> +			die(_("overflow during fixed-point multiply (%"PRIu64" "
> +			      "* %"PRIu64")"),
> +			    expected_size, (uint64_t)p->pack_size);
> +		expected_size = expected_size * p->pack_size;
> +
> +		if (unsigned_add_overflows(expected_size, 1u << 13))
> +			die(_("overflow during fixed-point rounding (%"PRIu64" "
> +			      " + %"PRIu64")"),
> +			    expected_size, (uint64_t)(1ul << 13));
> +		expected_size = (expected_size + (1u << 13)) >> 14;

One downside this pattern has is that we repeat the computation, which
makes it easy to get it wrong or forget updating either the check or the
computation.

I think ideally, we would have interfaces that combine the two
approaches in `u64_mult()` and `unsigned_mult_overflows()`. Something
like this for example:

    static intline bool u64_mult(uint64_t a, uint64_t b, uint64_t *out)
    {
        if (unsigned_mult_overflows(a, b))
            return false;
        *out = a * b;
        return true;
    }

This would let the caller handle the failure and is thus quite flexible,
which results in the following code:

	if (!u64_mult(expected_size, (uint64_t)p->pack_size, &expected_size))
		die(_("overflow during fixed-point multiply (%"PRIu64" "
		      "* %"PRIu64")"), expected_size, (uint64_t)p->pack_size);

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-21  8:51           ` Patrick Steinhardt
@ 2026-01-21 23:55             ` Taylor Blau
  2026-01-22  2:26               ` rsbecker
  2026-02-23 13:49               ` Jeff King
  0 siblings, 2 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-21 23:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Jeff King, Elijah Newren

On Wed, Jan 21, 2026 at 09:51:36AM +0100, Patrick Steinhardt wrote:
> > diff --git a/midx-write.c b/midx-write.c
> > index 87b97c70872..6006b6569c8 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -1738,8 +1738,19 @@ static void fill_included_packs_batch(struct repository *r,
> >  		 */
> >  		expected_size = (uint64_t)pack_info[i].referenced_objects << 14;
> >  		expected_size /= p->num_objects;
> > -		expected_size = u64_mult(expected_size, p->pack_size);
> > -		expected_size = u64_add(expected_size, 1u << 13) >> 14;
> > +
> > +		if (unsigned_mult_overflows(expected_size,
> > +					    (uint64_t)p->pack_size))
> > +			die(_("overflow during fixed-point multiply (%"PRIu64" "
> > +			      "* %"PRIu64")"),
> > +			    expected_size, (uint64_t)p->pack_size);
> > +		expected_size = expected_size * p->pack_size;
> > +
> > +		if (unsigned_add_overflows(expected_size, 1u << 13))
> > +			die(_("overflow during fixed-point rounding (%"PRIu64" "
> > +			      " + %"PRIu64")"),
> > +			    expected_size, (uint64_t)(1ul << 13));
> > +		expected_size = (expected_size + (1u << 13)) >> 14;
>
> One downside this pattern has is that we repeat the computation, which
> makes it easy to get it wrong or forget updating either the check or the
> computation.
>
> I think ideally, we would have interfaces that combine the two
> approaches in `u64_mult()` and `unsigned_mult_overflows()`. Something
> like this for example:
>
>     static intline bool u64_mult(uint64_t a, uint64_t b, uint64_t *out)
>     {
>         if (unsigned_mult_overflows(a, b))
>             return false;
>         *out = a * b;
>         return true;
>     }

I had considered this approach when writing, but ultimately decided
against it, since it felt a little clunky to have to pass a pointer in
to do a simple arithmetic operation. But I think your point about
ensuring that we actually do:

    if (unsigned_mult_overflows(a, b))
      die(...);
    result = a * b;

and not "result = a * c" or some other expression which is not "a * b"
is a good one.

I dunno. The spots in this patch are the only uses of u64_mult() and
u64_add(), so I'm hesitant to keep a helper function around just for
that sole use-case. I wonder if we should do what you suggest here for
the much more frequently used st_add() / st_mult() / st_sub() functions?

> This would let the caller handle the failure and is thus quite flexible,
> which results in the following code:
>
> 	if (!u64_mult(expected_size, (uint64_t)p->pack_size, &expected_size))
> 		die(_("overflow during fixed-point multiply (%"PRIu64" "
> 		      "* %"PRIu64")"), expected_size, (uint64_t)p->pack_size);

It does read quite cleanly, so I think I'm convinced.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-21 23:55             ` Taylor Blau
@ 2026-01-22  2:26               ` rsbecker
  2026-01-22 17:07                 ` Junio C Hamano
  2026-02-23 13:49               ` Jeff King
  1 sibling, 1 reply; 99+ messages in thread
From: rsbecker @ 2026-01-22  2:26 UTC (permalink / raw)
  To: 'Taylor Blau', 'Patrick Steinhardt'
  Cc: 'Junio C Hamano', git, 'Jeff King',
	'Elijah Newren'

On Taylor Blau January 21, 2026 6:56 PM writes:
>On Wed, Jan 21, 2026 at 09:51:36AM +0100, Patrick Steinhardt wrote:
>> > diff --git a/midx-write.c b/midx-write.c index
>> > 87b97c70872..6006b6569c8 100644
>> > --- a/midx-write.c
>> > +++ b/midx-write.c
>> > @@ -1738,8 +1738,19 @@ static void fill_included_packs_batch(struct
>repository *r,
>> >  		 */
>> >  		expected_size = (uint64_t)pack_info[i].referenced_objects << 14;
>> >  		expected_size /= p->num_objects;
>> > -		expected_size = u64_mult(expected_size, p->pack_size);
>> > -		expected_size = u64_add(expected_size, 1u << 13) >> 14;
>> > +
>> > +		if (unsigned_mult_overflows(expected_size,
>> > +					    (uint64_t)p->pack_size))
>> > +			die(_("overflow during fixed-point multiply (%"PRIu64" "
>> > +			      "* %"PRIu64")"),
>> > +			    expected_size, (uint64_t)p->pack_size);
>> > +		expected_size = expected_size * p->pack_size;
>> > +
>> > +		if (unsigned_add_overflows(expected_size, 1u << 13))
>> > +			die(_("overflow during fixed-point rounding (%"PRIu64" "
>> > +			      " + %"PRIu64")"),
>> > +			    expected_size, (uint64_t)(1ul << 13));
>> > +		expected_size = (expected_size + (1u << 13)) >> 14;
>>
>> One downside this pattern has is that we repeat the computation, which
>> makes it easy to get it wrong or forget updating either the check or
>> the computation.
>>
>> I think ideally, we would have interfaces that combine the two
>> approaches in `u64_mult()` and `unsigned_mult_overflows()`. Something
>> like this for example:
>>
>>     static intline bool u64_mult(uint64_t a, uint64_t b, uint64_t *out)
>>     {
>>         if (unsigned_mult_overflows(a, b))
>>             return false;
>>         *out = a * b;
>>         return true;
>>     }
>
>I had considered this approach when writing, but ultimately decided against it, since
>it felt a little clunky to have to pass a pointer in to do a simple arithmetic operation.
>But I think your point about ensuring that we actually do:
>
>    if (unsigned_mult_overflows(a, b))
>      die(...);
>    result = a * b;
>
>and not "result = a * c" or some other expression which is not "a * b"
>is a good one.
>
>I dunno. The spots in this patch are the only uses of u64_mult() and u64_add(), so
>I'm hesitant to keep a helper function around just for that sole use-case. I wonder if
>we should do what you suggest here for the much more frequently used st_add() /
>st_mult() / st_sub() functions?
>
>> This would let the caller handle the failure and is thus quite
>> flexible, which results in the following code:
>>
>> 	if (!u64_mult(expected_size, (uint64_t)p->pack_size, &expected_size))
>> 		die(_("overflow during fixed-point multiply (%"PRIu64" "
>> 		      "* %"PRIu64")"), expected_size, (uint64_t)p->pack_size);
>
>It does read quite cleanly, so I think I'm convinced.

Are all of these changes endian-safe?


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-22  2:26               ` rsbecker
@ 2026-01-22 17:07                 ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2026-01-22 17:07 UTC (permalink / raw)
  To: rsbecker
  Cc: 'Taylor Blau', 'Patrick Steinhardt', git,
	'Jeff King', 'Elijah Newren'

<rsbecker@nexbridge.com> writes:

>>> This would let the caller handle the failure and is thus quite
>>> flexible, which results in the following code:
>>>
>>> 	if (!u64_mult(expected_size, (uint64_t)p->pack_size, &expected_size))
>>> 		die(_("overflow during fixed-point multiply (%"PRIu64" "
>>> 		      "* %"PRIu64")"), expected_size, (uint64_t)p->pack_size);
>>
>>It does read quite cleanly, so I think I'm convinced.
>
> Are all of these changes endian-safe?

I do not think of any endian-ness issue around here; these "mult"
helpers are checking if the result would overflow using regular
division, without doing any funny byte-shuffling ourselves.
<git-compat-util.h> has definitions of unsigned_mult_overflows(),
and maximum_unsigned_value_of_type(), if you are curious.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order
  2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
  2026-01-14 21:28     ` Junio C Hamano
@ 2026-01-27  7:34     ` Patrick Steinhardt
  2026-02-24 18:47       ` Taylor Blau
  1 sibling, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2026-01-27  7:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Wed, Jan 14, 2026 at 02:54:45PM -0500, Taylor Blau wrote:
[snip]
> Because this change produces MIDXs which may not be correctly read with
> external tools or older versions of Git. Though older versions of Git
> know how to gracefully degrade and ignore any MIDX(s) they consider
> corrupt, external tools may not be as robust. To avoid unintentionally
> breaking any such tools, guard this change behind a version bump in the
> MIDX's on-disk format.

s/Because t/T/?

> diff --git a/midx-write.c b/midx-write.c
> index 8a54644e427..5c8700065a1 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -36,10 +36,13 @@ extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
>  
>  static size_t write_midx_header(const struct git_hash_algo *hash_algo,
>  				struct hashfile *f, unsigned char num_chunks,
> -				uint32_t num_packs)
> +				uint32_t num_packs, int version)
>  {
> +	if (version != MIDX_VERSION_V1 && version != MIDX_VERSION_V2)
> +		BUG("unexpected MIDX version: %d", version);

Asking the function to write a MIDX header of an unsupported version
would be a bug indeed.

> @@ -105,6 +108,8 @@ struct write_midx_context {
>  
>  	uint32_t preferred_pack_idx;
>  
> +	int version; /* must be MIDX_VERSION_V1 or _V2 */

Tiny nit: this could be converted into an `enum` for implicit
documentation.

> @@ -1100,6 +1113,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  	struct tempfile *incr;
>  	struct write_midx_context ctx = {
>  		.preferred_pack_idx = NO_PREFERRED_PACK,
> +		.version = MIDX_VERSION_V2,
>  	 };
>  	struct multi_pack_index *midx_to_free = NULL;
>  	int bitmapped_packs_concat_len = 0;

Okay. Here we set the default version, ...

> @@ -1114,6 +1128,10 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  	ctx.repo = r;
>  	ctx.source = opts->source;
>  
> +	repo_config_get_int(ctx.repo, "midx.version", &ctx.version);
> +	if (ctx.version != MIDX_VERSION_V1 && ctx.version != MIDX_VERSION_V2)
> +		die(_("unknown MIDX version: %d"), ctx.version);
> +
>  	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
>  
>  	if (ctx.incremental)

... but the user can change it. Good.

> diff --git a/midx.c b/midx.c
> index 19ef230d3fd..1327d0a3695 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -656,17 +658,40 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
>  	return strcmp(idx_or_pack_name, idx_name);
>  }
>  
> +
> +static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
> +{
> +	struct multi_pack_index *m = m_;
> +	return strcmp(m->pack_names[*(const size_t *)a],
> +		      m->pack_names[*(const size_t *)b]);
> +}

Okay, this took a second to figure out. The `pack_names_sorted` is an
array of `size_t` indexes into `m->pack_names`. So what we get here are
these indices, and we can compare by using those indices via
`m->pack_names`. Makes sense.

I was wondering whether this would be easier to follow if
`pack_names_sorted` was a simple array of unowned pointers. So it would
contain the same pointers as `pack_names`, but properly sorted. It would
have the downside of more confusing ownership semantics though.

I assume we cannot live with a simple `bool sorted` field and then sort
`pack_names` lazily?

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 17/18] midx: implement MIDX compaction
  2026-01-14 19:55   ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
@ 2026-01-27  7:35     ` Patrick Steinhardt
  2026-01-27 22:13       ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Patrick Steinhardt @ 2026-01-27  7:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Wed, Jan 14, 2026 at 02:55:10PM -0500, Taylor Blau wrote:
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index c0c6c1760c0..043ee8c478a 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -195,6 +204,70 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
[snip]
> +	if (!from_midx)
> +		die(_("could not find MIDX: %s"), argv[0]);
> +	if (!to_midx)
> +		die(_("could not find MIDX: %s"), argv[1]);
> +	if (from_midx == to_midx)
> +		die(_("MIDX compaction endpoints must be unique"));
> +
> +	for (m = from_midx; m; m = m->base_midx) {
> +		if (m == to_midx)
> +			die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]);
> +	}

These new checks all feel sensible to me.

> diff --git a/midx-write.c b/midx-write.c
> index ca2469213e6..afa077a09cc 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1120,12 +1216,23 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
> @@ -1162,6 +1270,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  		die(_("unknown MIDX version: %d"), ctx.version);
>  
>  	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
> +	ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
> +
> +	if (ctx.compact) {
> +		if (ctx.version != MIDX_VERSION_V2)
> +			die(_("cannot perform MIDX compaction with v1 format"));

Right. So if the user has configured "midx.version=1" they cannot
compact.

> @@ -1354,12 +1491,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
>  			ctx.large_offsets_needed = 1;
>  	}
>  
> -	QSORT(ctx.info, ctx.nr, pack_info_compare);
> +	if (ctx.compact) {
> +		if (ctx.version != MIDX_VERSION_V2)
> +			BUG("performing MIDX compaction with v1 MIDX");

Isn't this `BUG()` redundant with the above call to `die()`?

Patrick

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 17/18] midx: implement MIDX compaction
  2026-01-27  7:35     ` Patrick Steinhardt
@ 2026-01-27 22:13       ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-01-27 22:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Tue, Jan 27, 2026 at 08:35:10AM +0100, Patrick Steinhardt wrote:
> > +	if (!from_midx)
> > +		die(_("could not find MIDX: %s"), argv[0]);
> > +	if (!to_midx)
> > +		die(_("could not find MIDX: %s"), argv[1]);
> > +	if (from_midx == to_midx)
> > +		die(_("MIDX compaction endpoints must be unique"));
> > +
> > +	for (m = from_midx; m; m = m->base_midx) {
> > +		if (m == to_midx)
> > +			die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]);
> > +	}
>
> These new checks all feel sensible to me.

Thanks for taking a look and suggesting them in the first place!

> > diff --git a/midx-write.c b/midx-write.c
> > index ca2469213e6..afa077a09cc 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -1120,12 +1216,23 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
> > @@ -1162,6 +1270,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
> >  		die(_("unknown MIDX version: %d"), ctx.version);
> >
> >  	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
> > +	ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
> > +
> > +	if (ctx.compact) {
> > +		if (ctx.version != MIDX_VERSION_V2)
> > +			die(_("cannot perform MIDX compaction with v1 format"));
>
> Right. So if the user has configured "midx.version=1" they cannot
> compact.

Exactly. I think the limitation here is a fundamental one, too, since by
its nature compaction *must* retain the pseudo-pack order concatenated
across each MIDX layer in the compaction range. With midx.version=1, we
don't have a way to express that information in a backwards-compatible
way, so midx.version=2 here is a requirement.

> > @@ -1354,12 +1491,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
> >  			ctx.large_offsets_needed = 1;
> >  	}
> >
> > -	QSORT(ctx.info, ctx.nr, pack_info_compare);
> > +	if (ctx.compact) {
> > +		if (ctx.version != MIDX_VERSION_V2)
> > +			BUG("performing MIDX compaction with v1 MIDX");
>
> Isn't this `BUG()` redundant with the above call to `die()`?

Technically, though I put it in here as a sanity check to ensure that
any potential regressions with the above die() don't cause us to get
into a worse situation that would result in bitmap corruption.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                     ` (17 preceding siblings ...)
  2026-01-14 19:55   ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
@ 2026-02-20 22:24   ` Junio C Hamano
  2026-02-23 14:08     ` Jeff King
  18 siblings, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2026-02-20 22:24 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> [Note to the maintainer: this is based on 'master' with my
> 'tb/midx-write-corrupt-checksum-fix' merged in and should produce zero
> conflicts when applied on top.]
>
> This is a reroll of my series to implement MIDX layer compaction
> adjusted in response to reviewer feedback.

Haven't seen a lot happen since the end of last month or so.
Anything left to do here?

Thanks.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-01-21 23:55             ` Taylor Blau
  2026-01-22  2:26               ` rsbecker
@ 2026-02-23 13:49               ` Jeff King
  2026-02-24 18:53                 ` Taylor Blau
  1 sibling, 1 reply; 99+ messages in thread
From: Jeff King @ 2026-02-23 13:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Patrick Steinhardt, Junio C Hamano, git, Elijah Newren

On Wed, Jan 21, 2026 at 06:55:55PM -0500, Taylor Blau wrote:

> I had considered this approach when writing, but ultimately decided
> against it, since it felt a little clunky to have to pass a pointer in
> to do a simple arithmetic operation. But I think your point about
> ensuring that we actually do:
> 
>     if (unsigned_mult_overflows(a, b))
>       die(...);
>     result = a * b;
> 
> and not "result = a * c" or some other expression which is not "a * b"
> is a good one.

It is clunky, but it's how the compiler intrinsics work (if we ever
chose to use them).

> I dunno. The spots in this patch are the only uses of u64_mult() and
> u64_add(), so I'm hesitant to keep a helper function around just for
> that sole use-case. I wonder if we should do what you suggest here for
> the much more frequently used st_add() / st_mult() / st_sub() functions?

I am to blame for the crappy interface of the st_add() etc functions. It
did make conversion _much_ easier, because you can do stuff like:

  -foo = malloc(nr * size);
  +foo = malloc(st_mult(nr, size));

as opposed to:

  size_t total;
  ...
  st_mult(&total, nr, size));
  foo = malloc(total);

My rationale was that size_t computations like this are OK to die() with
very little useful error reporting up the chain because:

  1. The result is generally just passed along to malloc() anyway, where
     we likewise find it OK to die() without much info. So you can
     imagine a world where we just do 128-bit size computations and then
     let malloc() fail, and it would look the same. ;)

  2. They don't happen in practice unless there is a bug or a malicious
     input. Which is mostly true for 64-bit systems. Maybe less so for
     32-bit ones, where you might conceivably wish to have 4 billion of
     something.

I don't think any of that holds true for u32 values like counts of
objects. It's conceivable that you might want to try to write a midx for
two packs with 2.1 billion objects each (though from my experience, such
a repo would be unusable).

Anyway. My point is mostly that I think we can design u32_add() to be
what we want and not worry too much about going back to fix st_add(),
etc.

-Peff

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction
  2026-02-20 22:24   ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
@ 2026-02-23 14:08     ` Jeff King
  2026-02-24  5:25       ` Taylor Blau
  0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2026-02-23 14:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt

On Fri, Feb 20, 2026 at 02:24:25PM -0800, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > [Note to the maintainer: this is based on 'master' with my
> > 'tb/midx-write-corrupt-checksum-fix' merged in and should produce zero
> > conflicts when applied on top.]
> >
> > This is a reroll of my series to implement MIDX layer compaction
> > adjusted in response to reviewer feedback.
> 
> Haven't seen a lot happen since the end of last month or so.
> Anything left to do here?

Sorry, I promised Taylor I would review it and I have been dragging my
feet.

I just read through the whole series, and didn't find anything that
hadn't already been brought up. The scariest / most complicated bits
are:

  1. Dropping the sort-order requirement for midx pack lists in patch
     10. But I think the approach with the version bump is the maximally
     conservative one, and I feel good about that. IIRC we discussed
     previously the possibility of using new chunks to give the
     alternate ordering, but I think this keeps the complexity in the
     code to a minimum.

  2. The actual compaction bits in the final two patches. I didn't see
     anything wrong here, but this is exactly the kind of spot where I
     think review fails, because you don't realize the corner case that
     you missed (speaking from experience with midx and bitmap code).
     But the nice thing here is that it should be quite unlikely to
     cause a regression if you're not using compaction. As this is
     mostly a building block for "part 3" that starts doing compaction
     as part of a broader repacking strategy, I think it is OK to
     consider it somewhat-experimental, build the next layer, and then
     eventually let it see more exercise in a production environment.

     That's how we've traditionally found those corner cases, and I
     think trying to spend more time staring at it in review is not
     likely to produce more insights.

There were a couple minor issues brought up in review, like out-dated
comments and the u32_add interface. So I think we might need a v3 with a
few touch-ups, but that's it.

-Peff

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction
  2026-02-23 14:08     ` Jeff King
@ 2026-02-24  5:25       ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24  5:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Elijah Newren, Patrick Steinhardt

On Mon, Feb 23, 2026 at 09:08:47AM -0500, Jeff King wrote:
> On Fri, Feb 20, 2026 at 02:24:25PM -0800, Junio C Hamano wrote:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > [Note to the maintainer: this is based on 'master' with my
> > > 'tb/midx-write-corrupt-checksum-fix' merged in and should produce zero
> > > conflicts when applied on top.]
> > >
> > > This is a reroll of my series to implement MIDX layer compaction
> > > adjusted in response to reviewer feedback.
> >
> > Haven't seen a lot happen since the end of last month or so.
> > Anything left to do here?
>
> Sorry, I promised Taylor I would review it and I have been dragging my
> feet.
>
> I just read through the whole series, and didn't find anything that
> hadn't already been brought up. The scariest / most complicated bits
> are:
>
>   1. Dropping the sort-order requirement for midx pack lists in patch
>      10. But I think the approach with the version bump is the maximally
>      conservative one, and I feel good about that. IIRC we discussed
>      previously the possibility of using new chunks to give the
>      alternate ordering, but I think this keeps the complexity in the
>      code to a minimum.

Thanks, I think the "midx.version" approach is the one that I'm most
comfortable with too.

FWIW, I think that it's impossible to supplement the existing version
with an additional chunk to indicate the new ordering while maintaining
compatibility with out-of-tree MIDX readers. If others have thoughts on
how to do this, I'd be very curious to discuss it more, but I think what
we have in this series is likely the best we can do without breaking,
e.g., libgit2.

>   2. The actual compaction bits in the final two patches. I didn't see
>      anything wrong here, but this is exactly the kind of spot where I
>      think review fails, because you don't realize the corner case that
>      you missed (speaking from experience with midx and bitmap code).

Indeed.

>      But the nice thing here is that it should be quite unlikely to
>      cause a regression if you're not using compaction. As this is
>      mostly a building block for "part 3" that starts doing compaction
>      as part of a broader repacking strategy, I think it is OK to
>      consider it somewhat-experimental, build the next layer, and then
>      eventually let it see more exercise in a production environment.
>
>      That's how we've traditionally found those corner cases, and I
>      think trying to spend more time staring at it in review is not
>      likely to produce more insights.

As you note, compaction on its own is not all that interesting, since
users are unlikely to be managing their MIDX layers manually. Compaction
really is the final building block for the new repacking strategy, at
which point this code becomes a lot more accessible and therefore easy
to test and experiment with.

I'm going to deploy this slowly to GitHub's infrastructure in the coming
weeks[^1] and opting personal repositories into the new code to flush
out any bug(s).

As I find and fix those, I'll of course share the results here.

> There were a couple minor issues brought up in review, like out-dated
> comments and the u32_add interface. So I think we might need a v3 with a
> few touch-ups, but that's it.

Thanks. I have the following range-diff prepared locally, but after a
day of traveling I am too tired to confidently declare it free of any
typos/thinkos ;-).

--- 8< ---
 1:  2e549ea6443 =  1:  275960bf36f midx: mark `get_midx_checksum()` arguments as const
 2:  7255adafe70 =  2:  6b769517656 midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
 3:  25b628fda97 =  3:  64a56ee1506 midx: introduce `midx_get_checksum_hex()`
 4:  2aedd72db8c =  4:  f6d3ecbb283 builtin/multi-pack-index.c: make '--progress' a common option
 5:  a00598a36a3 =  5:  7d9f5766cda git-multi-pack-index(1): remove non-existent incompatibility
 6:  92e6d868a45 =  6:  8ba8529fe32 git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
 7:  ff599c11f68 =  7:  cd2b9f6f94f t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
 8:  315a0ea2985 =  8:  983cff2f1b7 midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
 9:  af174e22e1e =  9:  5f03fda72ef midx-write.c: introduce `struct write_midx_opts`
10:  72bcd4ed6c7 ! 10:  9b02544b9ae midx: do not require packs to be sorted in lexicographic order
    @@ Documentation/gitformat-pack.adoc: HEADER:

      	1-byte version number:
     -	    Git only writes or recognizes version 1.
    -+	    Git only writes version 2, but recognizes versions 1 and 2.
    ++	    Git writes the version specified by the "midx.version"
    ++	    configuration option, which defaults to 2. It recognizes
    ++	    both versions 1 and 2.

      	1-byte Object Id Version
      	    We infer the length of object IDs (OIDs) from this value:
11:  c0c1769464b <  -:  ----------- git-compat-util.h: introduce `u32_add()`
12:  c11214a51f0 = 11:  06536b2081a midx-write.c: introduce `midx_pack_perm()` helper
13:  b9244a04297 = 12:  ee92de1795e midx-write.c: extract `fill_pack_from_midx()`
14:  c6f8d323477 = 13:  217cd5799a5 midx-write.c: enumerate `pack_int_id` values directly
15:  e71aa575463 = 14:  1193e9b3c49 midx-write.c: factor fanout layering from `compute_sorted_entries()`
16:  dbbcb494563 = 15:  73f2151609b t/helper/test-read-midx.c: plug memory leak when selecting layer
17:  13336e864f4 ! 16:  7344ce8e533 midx: implement MIDX compaction
    @@ Commit message
            `pack_int_id`'s (as opposed to the index at which each pack appears
            in `ctx.info`).

    +       Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
    +       here, as it unconditionally recurs through the `->base_midx`. Factor
    +       out a `_1()` variant that operates on a single layer, reimplement
    +       the existing function in terms of it, and use the new variant from
    +       `midx_fanout_add_compact()`.
    +
    +       Since we are sorting the list of objects ourselves, the order we add
    +       them in does not matter.
    +
          - When writing out the new 'multi-pack-index-chain' file, discard any
            layers in the compaction range, replacing them with the newly written
            layer, instead of keeping them and placing the new layer at the end
    @@ midx-write.c: struct write_midx_context {
      	return ctx->pack_perm[orig_pack_int_id];
      }

    +@@ midx-write.c: static void midx_fanout_sort(struct midx_fanout *fanout)
    + 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
    + }
    +
    +-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    +-					struct multi_pack_index *m,
    +-					uint32_t cur_fanout,
    +-					uint32_t preferred_pack)
    ++static void midx_fanout_add_midx_fanout_1(struct midx_fanout *fanout,
    ++					  struct multi_pack_index *m,
    ++					  uint32_t cur_fanout,
    ++					  uint32_t preferred_pack)
    + {
    + 	uint32_t start = m->num_objects_in_base, end;
    + 	uint32_t cur_object;
    +
    +-	if (m->base_midx)
    +-		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    +-					    preferred_pack);
    +-
    + 	if (cur_fanout)
    + 		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
    + 	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
    +@@ midx-write.c: static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    + 	}
    + }
    +
    ++static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    ++					struct multi_pack_index *m,
    ++					uint32_t cur_fanout,
    ++					uint32_t preferred_pack)
    ++{
    ++	if (m->base_midx)
    ++		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    ++					    preferred_pack);
    ++	midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout, preferred_pack);
    ++}
    ++
    + static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
    + 					struct pack_info *info,
    + 					uint32_t cur_pack,
     @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
      					    cur_fanout);
      }
    @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
     +	ASSERT(ctx->compact);
     +
     +	while (m && m != ctx->compact_from->base_midx) {
    -+		midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
    -+					    NO_PREFERRED_PACK);
    ++		midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout,
    ++					      NO_PREFERRED_PACK);
     +		m = m->base_midx;
     +	}
     +}
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx)
     +
     +	ASSERT(from && to);
     +
    -+	nr = u32_add(to->num_packs, to->num_packs_in_base);
    ++	if (unsigned_add_overflows(to->num_packs, to->num_packs_in_base))
    ++		die(_("too many packs, unable to compact"));
    ++
    ++	nr = to->num_packs + to->num_packs_in_base;
     +	if (nr < from->num_packs_in_base)
     +		BUG("unexpected number of packs in base during compaction: "
     +		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
18:  b599f1ad4b0 = 17:  b9c1b3992c5 midx: enable reachability bitmaps during MIDX compaction
--- >8 ---

Thanks,
Taylor

[^1]: Worth noting that I am at an offsite for work this week, and then
  taking a planned vacation for the following two, so I may be a little
  less available than usual.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order
  2026-01-27  7:34     ` Patrick Steinhardt
@ 2026-02-24 18:47       ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:47 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Tue, Jan 27, 2026 at 08:34:59AM +0100, Patrick Steinhardt wrote:
> On Wed, Jan 14, 2026 at 02:54:45PM -0500, Taylor Blau wrote:
> [snip]
> > Because this change produces MIDXs which may not be correctly read with
> > external tools or older versions of Git. Though older versions of Git
> > know how to gracefully degrade and ignore any MIDX(s) they consider
> > corrupt, external tools may not be as robust. To avoid unintentionally
> > breaking any such tools, guard this change behind a version bump in the
> > MIDX's on-disk format.
>
> s/Because t/T/?

Nice spotting, thank you.

> > @@ -105,6 +108,8 @@ struct write_midx_context {
> >
> >  	uint32_t preferred_pack_idx;
> >
> > +	int version; /* must be MIDX_VERSION_V1 or _V2 */
>
> Tiny nit: this could be converted into an `enum` for implicit
> documentation.

I wrote it this way initially, but ended up dropping the type down to
"int" because the enum values named "_V1" and "_V2" would have integral
values of "1", and "2", respectively.

> > diff --git a/midx.c b/midx.c
> > index 19ef230d3fd..1327d0a3695 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -656,17 +658,40 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
> >  	return strcmp(idx_or_pack_name, idx_name);
> >  }
> >
> > +
> > +static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
> > +{
> > +	struct multi_pack_index *m = m_;
> > +	return strcmp(m->pack_names[*(const size_t *)a],
> > +		      m->pack_names[*(const size_t *)b]);
> > +}
>
> Okay, this took a second to figure out. The `pack_names_sorted` is an
> array of `size_t` indexes into `m->pack_names`. So what we get here are
> these indices, and we can compare by using those indices via
> `m->pack_names`. Makes sense.
>
> I was wondering whether this would be easier to follow if
> `pack_names_sorted` was a simple array of unowned pointers. So it would
> contain the same pointers as `pack_names`, but properly sorted. It would
> have the downside of more confusing ownership semantics though.

Exactly. For that reason, I generally prefer holding onto a permutation
array (of indices) rather than a permuted form of the existing array.
The ownership of the former is unambiguous, but the latter is IMHO not.

> I assume we cannot live with a simple `bool sorted` field and then sort
> `pack_names` lazily?

Unfortunately not, because the pack_names array is indexed by the
pack_int_id, which we have to preserve.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()`
  2026-02-23 13:49               ` Jeff King
@ 2026-02-24 18:53                 ` Taylor Blau
  0 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, Junio C Hamano, git, Elijah Newren

On Mon, Feb 23, 2026 at 08:49:35AM -0500, Jeff King wrote:
> I am to blame for the crappy interface of the st_add() etc functions. It
> did make conversion _much_ easier, because you can do stuff like:
>
>   -foo = malloc(nr * size);
>   +foo = malloc(st_mult(nr, size));
>
> as opposed to:
>
>   size_t total;
>   ...
>   st_mult(&total, nr, size));
>   foo = malloc(total);

Yeah, I agree that the latter is far less readable and more verbose than
the former. I think that your reasoning about why it's OK to just die()
here

> My rationale was that size_t computations like this are OK to die() with
> very little useful error reporting up the chain because:

All very sensible.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 00/17] midx: incremental MIDX/bitmap layer compaction
  2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
                   ` (17 preceding siblings ...)
  2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
@ 2026-02-24 18:59 ` Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
                     ` (16 more replies)
  18 siblings, 17 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

[Note to the maintainer: this is based on 'master', which is 7c02d39fc2e
(The 6th batch, 2026-02-20) at the time of writing.]

This is a tiny reroll of my series to implement MIDX layer compaction
adjusted in response to reviewer feedback.

As usual, a range-diff is included below for convenience. The main
changes since last time are as follows:

 - Some typo and documentation fixes around the new "midx.version"
   setting.

 - Bugfix when adding objects from MIDXs in the compaction range by
   fanout layer where we recur beyond the lower end of the compaction
   boundary.

 - Drop u32_add().

I think that this should be fairly close to done, at which point I can
send the next series (tb/incremental-midx-part-3.3) which introduces the
new compaction-based repacking strategy on top of the tools introduced
in this series.

Like Peff wrote in his review of the last round, the proof here will be
in the pudding, and I am sure that as we continue to test and use these
patches we'll find and fix interesting bugs. But these changes are all
fairly well isolated to the compaction logic itself, so the risk of
regression among any existing use-cases should be low.

The original cover letter may be found here[1].

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1765053054.git.me@ttaylorr.com/

Taylor Blau (17):
  midx: mark `get_midx_checksum()` arguments as const
  midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  midx: introduce `midx_get_checksum_hex()`
  builtin/multi-pack-index.c: make '--progress' a common option
  git-multi-pack-index(1): remove non-existent incompatibility
  git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  midx-write.c: introduce `struct write_midx_opts`
  midx: do not require packs to be sorted in lexicographic order
  midx-write.c: introduce `midx_pack_perm()` helper
  midx-write.c: extract `fill_pack_from_midx()`
  midx-write.c: enumerate `pack_int_id` values directly
  midx-write.c: factor fanout layering from `compute_sorted_entries()`
  t/helper/test-read-midx.c: plug memory leak when selecting layer
  midx: implement MIDX compaction
  midx: enable reachability bitmaps during MIDX compaction

 Documentation/git-multi-pack-index.adoc |  27 +-
 Documentation/gitformat-pack.adoc       |   8 +-
 builtin/multi-pack-index.c              |  91 +++-
 midx-write.c                            | 525 ++++++++++++++++++------
 midx.c                                  |  39 +-
 midx.h                                  |  12 +-
 pack-bitmap.c                           |   9 +-
 pack-revindex.c                         |   4 +-
 t/helper/test-read-midx.c               |  21 +-
 t/meson.build                           |   1 +
 t/t0450/adoc-help-mismatches            |   1 -
 t/t5319-multi-pack-index.sh             |  18 +-
 t/t5335-compact-multi-pack-index.sh     | 293 +++++++++++++
 13 files changed, 891 insertions(+), 158 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh

Range-diff against v2:
 1:  2e549ea6443 =  1:  61045c604c0 midx: mark `get_midx_checksum()` arguments as const
 2:  7255adafe70 =  2:  61dd8e65d1a midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
 3:  25b628fda97 =  3:  cc5c7783062 midx: introduce `midx_get_checksum_hex()`
 4:  2aedd72db8c =  4:  016420264d2 builtin/multi-pack-index.c: make '--progress' a common option
 5:  a00598a36a3 =  5:  feb9ca5538e git-multi-pack-index(1): remove non-existent incompatibility
 6:  92e6d868a45 =  6:  1e86068046d git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
 7:  ff599c11f68 =  7:  06a96e16c37 t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
 8:  315a0ea2985 =  8:  84d9d1ef7ba midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
 9:  af174e22e1e =  9:  132e823e758 midx-write.c: introduce `struct write_midx_opts`
10:  72bcd4ed6c7 ! 10:  3682bfd0e08 midx: do not require packs to be sorted in lexicographic order
    @@ Commit message
         lazily instantiate a `pack_names_sorted` array on the MIDX, which will
         be used to implement the binary search over pack names.
     
    -    Because this change produces MIDXs which may not be correctly read with
    -    external tools or older versions of Git. Though older versions of Git
    -    know how to gracefully degrade and ignore any MIDX(s) they consider
    -    corrupt, external tools may not be as robust. To avoid unintentionally
    -    breaking any such tools, guard this change behind a version bump in the
    -    MIDX's on-disk format.
    +    This change produces MIDXs which may not be correctly read with external
    +    tools or older versions of Git. Though older versions of Git know how to
    +    gracefully degrade and ignore any MIDX(s) they consider corrupt,
    +    external tools may not be as robust. To avoid unintentionally breaking
    +    any such tools, guard this change behind a version bump in the MIDX's
    +    on-disk format.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ Documentation/gitformat-pack.adoc: HEADER:
      
      	1-byte version number:
     -	    Git only writes or recognizes version 1.
    -+	    Git only writes version 2, but recognizes versions 1 and 2.
    ++	    Git writes the version specified by the "midx.version"
    ++	    configuration option, which defaults to 2. It recognizes
    ++	    both versions 1 and 2.
      
      	1-byte Object Id Version
      	    We infer the length of object IDs (OIDs) from this value:
11:  c0c1769464b <  -:  ----------- git-compat-util.h: introduce `u32_add()`
12:  c11214a51f0 = 11:  7f99d3d728a midx-write.c: introduce `midx_pack_perm()` helper
13:  b9244a04297 = 12:  f3952f7db36 midx-write.c: extract `fill_pack_from_midx()`
14:  c6f8d323477 = 13:  238bd203eaa midx-write.c: enumerate `pack_int_id` values directly
15:  e71aa575463 = 14:  3a139575b15 midx-write.c: factor fanout layering from `compute_sorted_entries()`
16:  dbbcb494563 = 15:  505c8d72aa1 t/helper/test-read-midx.c: plug memory leak when selecting layer
17:  13336e864f4 ! 16:  19c983138c7 midx: implement MIDX compaction
    @@ Commit message
            `pack_int_id`'s (as opposed to the index at which each pack appears
            in `ctx.info`).
     
    +       Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
    +       here, as it unconditionally recurs through the `->base_midx`. Factor
    +       out a `_1()` variant that operates on a single layer, reimplement
    +       the existing function in terms of it, and use the new variant from
    +       `midx_fanout_add_compact()`.
    +
    +       Since we are sorting the list of objects ourselves, the order we add
    +       them in does not matter.
    +
          - When writing out the new 'multi-pack-index-chain' file, discard any
            layers in the compaction range, replacing them with the newly written
            layer, instead of keeping them and placing the new layer at the end
    @@ midx-write.c: struct write_midx_context {
      	return ctx->pack_perm[orig_pack_int_id];
      }
      
    +@@ midx-write.c: static void midx_fanout_sort(struct midx_fanout *fanout)
    + 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
    + }
    + 
    +-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    +-					struct multi_pack_index *m,
    +-					uint32_t cur_fanout,
    +-					uint32_t preferred_pack)
    ++static void midx_fanout_add_midx_fanout_1(struct midx_fanout *fanout,
    ++					  struct multi_pack_index *m,
    ++					  uint32_t cur_fanout,
    ++					  uint32_t preferred_pack)
    + {
    + 	uint32_t start = m->num_objects_in_base, end;
    + 	uint32_t cur_object;
    + 
    +-	if (m->base_midx)
    +-		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    +-					    preferred_pack);
    +-
    + 	if (cur_fanout)
    + 		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
    + 	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
    +@@ midx-write.c: static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    + 	}
    + }
    + 
    ++static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    ++					struct multi_pack_index *m,
    ++					uint32_t cur_fanout,
    ++					uint32_t preferred_pack)
    ++{
    ++	if (m->base_midx)
    ++		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    ++					    preferred_pack);
    ++	midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout, preferred_pack);
    ++}
    ++
    + static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
    + 					struct pack_info *info,
    + 					uint32_t cur_pack,
     @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
      					    cur_fanout);
      }
    @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
     +	ASSERT(ctx->compact);
     +
     +	while (m && m != ctx->compact_from->base_midx) {
    -+		midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
    -+					    NO_PREFERRED_PACK);
    ++		midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout,
    ++					      NO_PREFERRED_PACK);
     +		m = m->base_midx;
     +	}
     +}
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx)
     +
     +	ASSERT(from && to);
     +
    -+	nr = u32_add(to->num_packs, to->num_packs_in_base);
    ++	if (unsigned_add_overflows(to->num_packs, to->num_packs_in_base))
    ++		die(_("too many packs, unable to compact"));
    ++
    ++	nr = to->num_packs + to->num_packs_in_base;
     +	if (nr < from->num_packs_in_base)
     +		BUG("unexpected number of packs in base during compaction: "
     +		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
18:  b599f1ad4b0 = 17:  ac6221686db midx: enable reachability bitmaps during MIDX compaction

base-commit: 7c02d39fc2ed2702223c7674f73150d9a7e61ba4
-- 
2.53.0.171.gde83996e422

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
@ 2026-02-24 18:59   ` Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

To make clear that the function `get_midx_checksum()` does not do
anything to modify its argument, mark the MIDX pointer as const.

The following commit will rename this function altogether to make clear
that it returns the raw bytes of the checksum, not a hex-encoded copy of
it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 +-
 midx.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index a75ea99a0d4..2a6b18954c5 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,7 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
-const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
 }
diff --git a/midx.h b/midx.h
index 6e54d73503d..7c7e0b59121 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
-const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
 			   const unsigned char *hash, const char *ext);
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
@ 2026-02-24 18:59   ` Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Since 541204aabea (Documentation: document naming schema for structs and
their functions, 2024-07-30), we have adopted a naming convention for
functions that would prefer a name like, say, `midx_get_checksum()` over
`get_midx_checksum()`.

Adopt this convention throughout the midx.h API. Since this function
returns a raw (that is, non-hex encoded) hash, let's suffix the function
with "_hash()" to make this clear. As a side effect, this prepares us
for the subsequent change which will introduce a "_hex()" variant that
encodes the checksum itself.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c              | 6 +++---
 midx.c                    | 2 +-
 midx.h                    | 2 +-
 pack-bitmap.c             | 8 ++++----
 pack-revindex.c           | 4 ++--
 t/helper/test-read-midx.c | 4 ++--
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 6485cb67068..73d33752ef1 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -946,7 +946,7 @@ static int link_midx_to_chain(struct multi_pack_index *m)
 	}
 
 	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
-		const unsigned char *hash = get_midx_checksum(m);
+		const unsigned char *hash = midx_get_checksum_hash(m);
 
 		get_midx_filename_ext(m->source, &from,
 				      hash, midx_exts[i].non_split);
@@ -1151,7 +1151,7 @@ static int write_midx_internal(struct odb_source *source,
 		while (m) {
 			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
-				      hash_to_hex_algop(get_midx_checksum(m),
+				      hash_to_hex_algop(midx_get_checksum_hash(m),
 							m->source->odb->repo->hash_algo));
 				goto cleanup;
 			}
@@ -1520,7 +1520,7 @@ static int write_midx_internal(struct odb_source *source,
 		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
 			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
 
-			keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
+			keep_hashes[j] = xstrdup(hash_to_hex_algop(midx_get_checksum_hash(m),
 								   r->hash_algo));
 			m = m->base_midx;
 		}
diff --git a/midx.c b/midx.c
index 2a6b18954c5..1d072bd9931 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,7 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
+const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
 }
diff --git a/midx.h b/midx.h
index 7c7e0b59121..62d6105195f 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
+const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
 			   const unsigned char *hash, const char *ext);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 972203f12b6..6307bbdf1e1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -441,11 +441,11 @@ char *midx_bitmap_filename(struct multi_pack_index *midx)
 	struct strbuf buf = STRBUF_INIT;
 	if (midx->has_chain)
 		get_split_midx_filename_ext(midx->source, &buf,
-					    get_midx_checksum(midx),
+					    midx_get_checksum_hash(midx),
 					    MIDX_EXT_BITMAP);
 	else
 		get_midx_filename_ext(midx->source, &buf,
-				      get_midx_checksum(midx),
+				      midx_get_checksum_hash(midx),
 				      MIDX_EXT_BITMAP);
 
 	return strbuf_detach(&buf, NULL);
@@ -502,7 +502,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	if (load_bitmap_header(bitmap_git) < 0)
 		goto cleanup;
 
-	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum,
+	if (!hasheq(midx_get_checksum_hash(bitmap_git->midx), bitmap_git->checksum,
 		    bitmap_repo(bitmap_git)->hash_algo)) {
 		error(_("checksum doesn't match in MIDX and bitmap"));
 		goto cleanup;
@@ -2819,7 +2819,7 @@ void test_bitmap_walk(struct rev_info *revs)
 
 		if (bitmap_is_midx(found))
 			fprintf_ln(stderr, "Located via MIDX '%s'.",
-				   hash_to_hex_algop(get_midx_checksum(found->midx),
+				   hash_to_hex_algop(midx_get_checksum_hash(found->midx),
 						     revs->repo->hash_algo));
 		else
 			fprintf_ln(stderr, "Located via pack '%s'.",
diff --git a/pack-revindex.c b/pack-revindex.c
index 56cd803a679..294b802d402 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -390,11 +390,11 @@ int load_midx_revindex(struct multi_pack_index *m)
 
 	if (m->has_chain)
 		get_split_midx_filename_ext(m->source, &revindex_name,
-					    get_midx_checksum(m),
+					    midx_get_checksum_hash(m),
 					    MIDX_EXT_REV);
 	else
 		get_midx_filename_ext(m->source, &revindex_name,
-				      get_midx_checksum(m),
+				      midx_get_checksum_hash(m),
 				      MIDX_EXT_REV);
 
 	ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 6de5d1665af..b8fefb1a124 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -34,7 +34,7 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		return 1;
 
 	if (checksum) {
-		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+		while (m && strcmp(hash_to_hex(midx_get_checksum_hash(m)), checksum))
 			m = m->base_midx;
 		if (!m)
 			return 1;
@@ -94,7 +94,7 @@ static int read_midx_checksum(const char *object_dir)
 	m = setup_midx(object_dir);
 	if (!m)
 		return 1;
-	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	printf("%s\n", hash_to_hex(midx_get_checksum_hash(m)));
 
 	close_midx(m);
 	return 0;
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()`
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
@ 2026-02-24 18:59   ` Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When trying to print out, say, the hexadecimal representation of a
MIDX's hash, our code will do something like:

    hash_to_hex_algop(midx_get_checksum_hash(m),
                      m->source->odb->repo->hash_algo);

, which is both cumbersome and repetitive. In fact, all but a handful of
callers to `midx_get_checksum_hash()` do exactly the above. Reduce the
repetitive nature of calling `midx_get_checksum_hash()` by having it
return a pointer into a static buffer containing the above result.

For the handful of callers that do need to compare the raw bytes and
don't want to deal with an encoded copy (e.g., because they are passing
it to hasheq() or similar), they may still rely on
`midx_get_checksum_hash()` which returns the raw bytes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c              | 6 ++----
 midx.c                    | 6 ++++++
 midx.h                    | 1 +
 pack-bitmap.c             | 3 +--
 t/helper/test-read-midx.c | 4 ++--
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 73d33752ef1..13171d7e9c4 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1151,8 +1151,7 @@ static int write_midx_internal(struct odb_source *source,
 		while (m) {
 			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
-				      hash_to_hex_algop(midx_get_checksum_hash(m),
-							m->source->odb->repo->hash_algo));
+				      midx_get_checksum_hex(m));
 				goto cleanup;
 			}
 			ctx.num_multi_pack_indexes_before++;
@@ -1520,8 +1519,7 @@ static int write_midx_internal(struct odb_source *source,
 		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
 			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
 
-			keep_hashes[j] = xstrdup(hash_to_hex_algop(midx_get_checksum_hash(m),
-								   r->hash_algo));
+			keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
 			m = m->base_midx;
 		}
 
diff --git a/midx.c b/midx.c
index 1d072bd9931..bae45892323 100644
--- a/midx.c
+++ b/midx.c
@@ -24,6 +24,12 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
+const char *midx_get_checksum_hex(const struct multi_pack_index *m)
+{
+	return hash_to_hex_algop(midx_get_checksum_hash(m),
+				 m->source->odb->repo->hash_algo);
+}
+
 const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m)
 {
 	return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
diff --git a/midx.h b/midx.h
index 62d6105195f..a39bcc9d03f 100644
--- a/midx.h
+++ b/midx.h
@@ -85,6 +85,7 @@ struct multi_pack_index {
 #define MIDX_EXT_BITMAP "bitmap"
 #define MIDX_EXT_MIDX "midx"
 
+const char *midx_get_checksum_hex(const struct multi_pack_index *m) /* static buffer */;
 const unsigned char *midx_get_checksum_hash(const struct multi_pack_index *m);
 void get_midx_filename(struct odb_source *source, struct strbuf *out);
 void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6307bbdf1e1..afc7fba0197 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2819,8 +2819,7 @@ void test_bitmap_walk(struct rev_info *revs)
 
 		if (bitmap_is_midx(found))
 			fprintf_ln(stderr, "Located via MIDX '%s'.",
-				   hash_to_hex_algop(midx_get_checksum_hash(found->midx),
-						     revs->repo->hash_algo));
+				   midx_get_checksum_hex(found->midx));
 		else
 			fprintf_ln(stderr, "Located via pack '%s'.",
 				   hash_to_hex_algop(found->pack->hash,
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index b8fefb1a124..9d42c587564 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -34,7 +34,7 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		return 1;
 
 	if (checksum) {
-		while (m && strcmp(hash_to_hex(midx_get_checksum_hash(m)), checksum))
+		while (m && strcmp(midx_get_checksum_hex(m), checksum))
 			m = m->base_midx;
 		if (!m)
 			return 1;
@@ -94,7 +94,7 @@ static int read_midx_checksum(const char *object_dir)
 	m = setup_midx(object_dir);
 	if (!m)
 		return 1;
-	printf("%s\n", hash_to_hex(midx_get_checksum_hash(m)));
+	printf("%s\n", midx_get_checksum_hex(m));
 
 	close_midx(m);
 	return 0;
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (2 preceding siblings ...)
  2026-02-24 18:59   ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
@ 2026-02-24 18:59   ` Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

All multi-pack-index sub-commands (write, verify, repack, and expire)
support a '--progress' command-line option, despite not listing it as
one of the common options in `common_opts`.

As a result each sub-command declares its own `OPT_BIT()` for a
"--progress" command-line option. Centralize this within the
`common_opts` to avoid re-declaring it in each sub-command.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |  2 ++
 builtin/multi-pack-index.c              | 10 ++--------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 2f642697e9e..a4550e28bed 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -18,6 +18,8 @@ Write or verify a multi-pack-index (MIDX) file.
 OPTIONS
 -------
 
+The following command-line options are applicable to all sub-commands:
+
 --object-dir=<dir>::
 	Use given directory for the location of Git objects. We check
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5f364aa816b..ca98d4c3ba3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -84,6 +84,8 @@ static struct option common_opts[] = {
 	  N_("directory"),
 	  N_("object directory containing set of packfile and pack-index pairs"),
 	  parse_object_dir),
+	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"),
+		MIDX_PROGRESS),
 	OPT_END(),
 };
 
@@ -138,8 +140,6 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			   N_("pack for reuse when computing a multi-pack bitmap")),
 		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_BIT(0, "incremental", &opts.flags,
 			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
@@ -200,8 +200,6 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv,
 {
 	struct option *options;
 	static struct option builtin_multi_pack_index_verify_options[] = {
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
@@ -231,8 +229,6 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv,
 {
 	struct option *options;
 	static struct option builtin_multi_pack_index_expire_options[] = {
-		OPT_BIT(0, "progress", &opts.flags,
-			N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
@@ -264,8 +260,6 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_repack_options[] = {
 		OPT_UNSIGNED(0, "batch-size", &opts.batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
-		OPT_BIT(0, "progress", &opts.flags,
-		  N_("force progress reporting"), MIDX_PROGRESS),
 		OPT_END(),
 	};
 	struct odb_source *source;
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (3 preceding siblings ...)
  2026-02-24 18:59   ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
@ 2026-02-24 18:59   ` Taylor Blau
  2026-02-24 18:59   ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Since fcb2205b774 (midx: implement support for writing incremental MIDX
chains, 2024-08-06), the command-line options '--incremental' and
'--bitmap' were declared to be incompatible with one another when
running 'git multi-pack-index write'.

However, since 27afc272c49 (midx: implement writing incremental MIDX
bitmaps, 2025-03-20), that incompatibility no longer exists, despite the
documentation saying so. Correct this by removing the stale reference to
their incompatibility.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a4550e28bed..a502819fc38 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -75,7 +75,7 @@ marker).
 		Write an incremental MIDX file containing only objects
 		and packs not present in an existing MIDX layer.
 		Migrates non-incremental MIDXs to incremental ones when
-		necessary. Incompatible with `--bitmap`.
+		necessary.
 --
 
 verify::
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (4 preceding siblings ...)
  2026-02-24 18:59   ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
@ 2026-02-24 18:59   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 18:59 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Since c39fffc1c90 (tests: start asserting that *.txt SYNOPSIS matches -h
output, 2022-10-13), the manual page for 'git multi-pack-index' has a
SYNOPSIS section which differs from 'git multi-pack-index -h'.

Correct this while also documenting additional options accepted by the
'write' sub-command.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc | 7 ++++++-
 builtin/multi-pack-index.c              | 5 +++--
 t/t0450/adoc-help-mismatches            | 1 -
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a502819fc38..164cf1f2291 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -9,7 +9,12 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]bitmap] <sub-command>
+'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
+		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
+		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] verify
+'git multi-pack-index' [<options>] expire
+'git multi-pack-index' [<options>] repack [--batch-size=<size>]
 
 DESCRIPTION
 -----------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index ca98d4c3ba3..c0c6c1760c0 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,8 +13,9 @@
 #include "repository.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \
-	   "[--refs-snapshot=<path>]")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]\n" \
+	   "  [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
+	   "  [--refs-snapshot=<path>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
diff --git a/t/t0450/adoc-help-mismatches b/t/t0450/adoc-help-mismatches
index 8ee2d3f7c81..e8d6c13ccd0 100644
--- a/t/t0450/adoc-help-mismatches
+++ b/t/t0450/adoc-help-mismatches
@@ -33,7 +33,6 @@ merge
 merge-file
 merge-index
 merge-one-file
-multi-pack-index
 name-rev
 notes
 push
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (5 preceding siblings ...)
  2026-02-24 18:59   ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Commit d4bf1d88b90 (multi-pack-index: verify missing pack, 2018-09-13)
adds a new test to the MIDX test script to test how we handle missing
packs.

While the commit itself describes the test as "verify missing pack[s]",
the test itself is actually called "verify packnames out of order",
despite that not being what it tests.

Likely this was a copy-and-paste of the test immediately above it of the
same name. Correct this by renaming the test to match the commit
message.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index faae98c7e76..efeab4d22b7 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -517,7 +517,7 @@ test_expect_success 'verify packnames out of order' '
 		"pack names out of order"
 '
 
-test_expect_success 'verify packnames out of order' '
+test_expect_success 'verify missing pack' '
 	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
 		"failed to load pack"
 '
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (6 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In midx_pack_order(), we compute for each bitmapped pack the first bit
to correspond to an object in that pack, along with how many bits were
assigned to object(s) in that pack.

Initially, each bitmap_nr value is set to zero, and each bitmap_pos
value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
that there are no packs who have an unknown bit position but a somehow
non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
midx-write.c).

Once the pack order is fully determined, midx_pack_order() sets the
bitmap_pos field for any bitmapped packs to zero if they are still
listed as BITMAP_POS_UNKNOWN.

However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
This is fine for existing cases, since the only time the
`ctx->pack_perm` array holds a value outside of the addressable range of
`ctx->info` is when there are expired packs, which only occurs via 'git
multi-pack-index expire', which does not support writing MIDX bitmaps.
As a result, the range of ctx->pack_perm covers all values in [0,
`ctx->nr`), so enumerating in this order isn't an issue.

A future change necessary for compaction will complicate this further by
introducing a wrapper around the `ctx->pack_perm` array, which turns the
given `pack_int_id` into one that is relative to the lower end of the
compaction range. As a result, indexing into `ctx->pack_perm` through
this helper, say, with "0" will produce a crash when the lower end of
the compaction range has >0 pack(s) in its base layer, since the
subtraction will wrap around the 32-bit unsigned range, resulting in an
uninitialized read.

But the process is completely unnecessary in the first place: we are
enumerating all values of `ctx->info`, and there is no reason to process
them in a different order than they appear in memory. Index `ctx->info`
directly to reflect that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx-write.c b/midx-write.c
index 13171d7e9c4..da9c5a7c295 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 		pack_order[i] = data[i].nr;
 	}
 	for (i = 0; i < ctx->nr; i++) {
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+		struct pack_info *pack = &ctx->info[i];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
 			pack->bitmap_pos = 0;
 	}
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts`
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (7 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In the MIDX writing code, there are four functions which perform some
sort of MIDX write operation. They are:

 - write_midx_file()
 - write_midx_file_only()
 - expire_midx_packs()
 - midx_repack()

All of these functions are thin wrappers over `write_midx_internal()`,
which implements the bulk of these routines. As a result, the
`write_midx_internal()` function takes six arguments.

Future commits in this series will want to add additional arguments, and
in general this function's signature will be the union of parameters
among *all* possible ways to write a MIDX.

Instead of adding yet more arguments to this function to support MIDX
compaction, introduce a `struct write_midx_opts`, which has the same
struct members as `write_midx_internal()`'s arguments.

Adding additional fields to the `write_midx_opts` struct is preferable
to adding additional arguments to `write_midx_internal()`. This is
because the callers below all zero-initialize the struct, so each time
we add a new piece of information, we do not have to pass the zero value
for it in all other call-sites that do not care about it.

For now, no functional changes are included in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 135 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 81 insertions(+), 54 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index da9c5a7c295..8a54644e427 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1078,14 +1078,20 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	return needed;
 }
 
-static int write_midx_internal(struct odb_source *source,
-			       struct string_list *packs_to_include,
-			       struct string_list *packs_to_drop,
-			       const char *preferred_pack_name,
-			       const char *refs_snapshot,
-			       unsigned flags)
+struct write_midx_opts {
+	struct odb_source *source; /* non-optional */
+
+	struct string_list *packs_to_include;
+	struct string_list *packs_to_drop;
+
+	const char *preferred_pack_name;
+	const char *refs_snapshot;
+	unsigned flags;
+};
+
+static int write_midx_internal(struct write_midx_opts *opts)
 {
-	struct repository *r = source->odb->repo;
+	struct repository *r = opts->source->odb->repo;
 	struct strbuf midx_name = STRBUF_INIT;
 	unsigned char midx_hash[GIT_MAX_RAWSZ];
 	uint32_t start_pack;
@@ -1106,22 +1112,22 @@ static int write_midx_internal(struct odb_source *source,
 	trace2_region_enter("midx", "write_midx_internal", r);
 
 	ctx.repo = r;
-	ctx.source = source;
+	ctx.source = opts->source;
 
-	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
 			    "%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
-			    source->path);
+			    opts->source->path);
 	else
-		get_midx_filename(source, &midx_name);
+		get_midx_filename(opts->source, &midx_name);
 	if (safe_create_leading_directories(r, midx_name.buf))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name.buf);
 
-	if (!packs_to_include || ctx.incremental) {
-		struct multi_pack_index *m = get_multi_pack_index(source);
+	if (!opts->packs_to_include || ctx.incremental) {
+		struct multi_pack_index *m = get_multi_pack_index(opts->source);
 		if (m && !midx_checksum_valid(m)) {
 			warning(_("ignoring existing multi-pack-index; checksum mismatch"));
 			m = NULL;
@@ -1136,7 +1142,7 @@ static int write_midx_internal(struct odb_source *source,
 			 */
 			if (ctx.incremental)
 				ctx.base_midx = m;
-			else if (!packs_to_include)
+			else if (!opts->packs_to_include)
 				ctx.m = m;
 		}
 	}
@@ -1149,7 +1155,7 @@ static int write_midx_internal(struct odb_source *source,
 	if (ctx.incremental) {
 		struct multi_pack_index *m = ctx.base_midx;
 		while (m) {
-			if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+			if (opts->flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
 				error(_("could not load reverse index for MIDX %s"),
 				      midx_get_checksum_hex(m));
 				goto cleanup;
@@ -1164,18 +1170,18 @@ static int write_midx_internal(struct odb_source *source,
 	start_pack = ctx.nr;
 
 	ctx.pack_paths_checked = 0;
-	if (flags & MIDX_PROGRESS)
+	if (opts->flags & MIDX_PROGRESS)
 		ctx.progress = start_delayed_progress(r,
 						      _("Adding packfiles to multi-pack-index"), 0);
 	else
 		ctx.progress = NULL;
 
-	ctx.to_include = packs_to_include;
+	ctx.to_include = opts->packs_to_include;
 
-	for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
+	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (!packs_to_drop) {
+	if (!opts->packs_to_drop) {
 		/*
 		 * If there is no MIDX then either it doesn't exist, or we're
 		 * doing a geometric repack. Try to load it from the source to
@@ -1188,7 +1194,7 @@ static int write_midx_internal(struct odb_source *source,
 		if (midx && !midx_needs_update(midx, &ctx)) {
 			struct bitmap_index *bitmap_git;
 			int bitmap_exists;
-			int want_bitmap = flags & MIDX_WRITE_BITMAP;
+			int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
 
 			bitmap_git = prepare_midx_bitmap_git(midx);
 			bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
@@ -1200,7 +1206,7 @@ static int write_midx_internal(struct odb_source *source,
 				 * corresponding bitmap (or one wasn't requested).
 				 */
 				if (!want_bitmap)
-					clear_midx_files_ext(source, "bitmap", NULL);
+					clear_midx_files_ext(ctx.source, "bitmap", NULL);
 				result = 0;
 				goto cleanup;
 			}
@@ -1215,11 +1221,11 @@ static int write_midx_internal(struct odb_source *source,
 		goto cleanup; /* nothing to do */
 	}
 
-	if (preferred_pack_name) {
+	if (opts->preferred_pack_name) {
 		ctx.preferred_pack_idx = NO_PREFERRED_PACK;
 
 		for (size_t i = 0; i < ctx.nr; i++) {
-			if (!cmp_idx_or_pack_name(preferred_pack_name,
+			if (!cmp_idx_or_pack_name(opts->preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
 				break;
@@ -1228,9 +1234,9 @@ static int write_midx_internal(struct odb_source *source,
 
 		if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
 			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
+				opts->preferred_pack_name);
 	} else if (ctx.nr &&
-		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
+		   (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		struct packed_git *oldest = ctx.info[0].p;
 		ctx.preferred_pack_idx = 0;
 
@@ -1241,7 +1247,7 @@ static int write_midx_internal(struct odb_source *source,
 		 */
 		open_pack_index(oldest);
 
-		if (packs_to_drop && packs_to_drop->nr)
+		if (opts->packs_to_drop && opts->packs_to_drop->nr)
 			BUG("cannot write a MIDX bitmap during expiration");
 
 		/*
@@ -1303,20 +1309,21 @@ static int write_midx_internal(struct odb_source *source,
 
 	QSORT(ctx.info, ctx.nr, pack_info_compare);
 
-	if (packs_to_drop && packs_to_drop->nr) {
+	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
 		size_t drop_index = 0;
 		int missing_drops = 0;
 
-		for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+		for (size_t i = 0;
+		     i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
 			int cmp = strcmp(ctx.info[i].pack_name,
-					 packs_to_drop->items[drop_index].string);
+					 opts->packs_to_drop->items[drop_index].string);
 
 			if (!cmp) {
 				drop_index++;
 				ctx.info[i].expired = 1;
 			} else if (cmp > 0) {
 				error(_("did not see pack-file %s to drop"),
-				      packs_to_drop->items[drop_index].string);
+				      opts->packs_to_drop->items[drop_index].string);
 				drop_index++;
 				missing_drops++;
 				i--;
@@ -1353,8 +1360,8 @@ static int write_midx_internal(struct odb_source *source,
 	}
 
 	/* Check that the preferred pack wasn't expired (if given). */
-	if (preferred_pack_name) {
-		struct pack_info *preferred = bsearch(preferred_pack_name,
+	if (opts->preferred_pack_name) {
+		struct pack_info *preferred = bsearch(opts->preferred_pack_name,
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
@@ -1362,7 +1369,7 @@ static int write_midx_internal(struct odb_source *source,
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-					preferred_pack_name);
+					opts->preferred_pack_name);
 		}
 	}
 
@@ -1376,15 +1383,15 @@ static int write_midx_internal(struct odb_source *source,
 	}
 
 	if (!ctx.entries_nr) {
-		if (flags & MIDX_WRITE_BITMAP)
+		if (opts->flags & MIDX_WRITE_BITMAP)
 			warning(_("refusing to write multi-pack .bitmap without any objects"));
-		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
+		opts->flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
 	}
 
 	if (ctx.incremental) {
 		struct strbuf lock_name = STRBUF_INIT;
 
-		get_midx_chain_filename(source, &lock_name);
+		get_midx_chain_filename(opts->source, &lock_name);
 		hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
 		strbuf_release(&lock_name);
 
@@ -1427,7 +1434,7 @@ static int write_midx_internal(struct odb_source *source,
 				MIDX_CHUNK_LARGE_OFFSET_WIDTH),
 			write_midx_large_offsets);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+	if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
 		ctx.pack_order = midx_pack_order(&ctx);
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
@@ -1445,11 +1452,11 @@ static int write_midx_internal(struct odb_source *source,
 			  CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX &&
+	if (opts->flags & MIDX_WRITE_REV_INDEX &&
 	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(&ctx, midx_hash);
 
-	if (flags & MIDX_WRITE_BITMAP) {
+	if (opts->flags & MIDX_WRITE_BITMAP) {
 		struct packing_data pdata;
 		struct commit_stack commits = COMMIT_STACK_INIT;
 
@@ -1458,7 +1465,7 @@ static int write_midx_internal(struct odb_source *source,
 
 		prepare_midx_packing_data(&pdata, &ctx);
 
-		find_commits_for_midx_bitmap(&commits, refs_snapshot, &ctx);
+		find_commits_for_midx_bitmap(&commits, opts->refs_snapshot, &ctx);
 
 		/*
 		 * The previous steps translated the information from
@@ -1469,8 +1476,8 @@ static int write_midx_internal(struct odb_source *source,
 		FREE_AND_NULL(ctx.entries);
 		ctx.entries_nr = 0;
 
-		if (write_midx_bitmap(&ctx, midx_hash, &pdata,
-				      commits.items, commits.nr, flags) < 0) {
+		if (write_midx_bitmap(&ctx, midx_hash, &pdata, commits.items,
+				      commits.nr, opts->flags) < 0) {
 			error(_("could not write multi-pack bitmap"));
 			clear_packing_data(&pdata);
 			commit_stack_clear(&commits);
@@ -1503,7 +1510,7 @@ static int write_midx_internal(struct odb_source *source,
 		if (link_midx_to_chain(ctx.base_midx) < 0)
 			goto cleanup;
 
-		get_split_midx_filename_ext(source, &final_midx_name,
+		get_split_midx_filename_ext(opts->source, &final_midx_name,
 					    midx_hash, MIDX_EXT_MIDX);
 
 		if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
@@ -1536,7 +1543,7 @@ static int write_midx_internal(struct odb_source *source,
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files(source, keep_hashes,
+	clear_midx_files(opts->source, keep_hashes,
 			 ctx.num_multi_pack_indexes_before + 1,
 			 ctx.incremental);
 	result = 0;
@@ -1571,9 +1578,14 @@ int write_midx_file(struct odb_source *source,
 		    const char *preferred_pack_name,
 		    const char *refs_snapshot, unsigned flags)
 {
-	return write_midx_internal(source, NULL, NULL,
-				   preferred_pack_name, refs_snapshot,
-				   flags);
+	struct write_midx_opts opts = {
+		.source = source,
+		.preferred_pack_name = preferred_pack_name,
+		.refs_snapshot = refs_snapshot,
+		.flags = flags,
+	};
+
+	return write_midx_internal(&opts);
 }
 
 int write_midx_file_only(struct odb_source *source,
@@ -1581,8 +1593,15 @@ int write_midx_file_only(struct odb_source *source,
 			 const char *preferred_pack_name,
 			 const char *refs_snapshot, unsigned flags)
 {
-	return write_midx_internal(source, packs_to_include, NULL,
-				   preferred_pack_name, refs_snapshot, flags);
+	struct write_midx_opts opts = {
+		.source = source,
+		.packs_to_include = packs_to_include,
+		.preferred_pack_name = preferred_pack_name,
+		.refs_snapshot = refs_snapshot,
+		.flags = flags,
+	};
+
+	return write_midx_internal(&opts);
 }
 
 int expire_midx_packs(struct odb_source *source, unsigned flags)
@@ -1641,9 +1660,14 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
 
 	free(count);
 
-	if (packs_to_drop.nr)
-		result = write_midx_internal(source, NULL,
-					     &packs_to_drop, NULL, NULL, flags);
+	if (packs_to_drop.nr) {
+		struct write_midx_opts opts = {
+			.source = source,
+			.packs_to_drop = &packs_to_drop,
+			.flags = flags & MIDX_PROGRESS,
+		};
+		result = write_midx_internal(&opts);
+	}
 
 	string_list_clear(&packs_to_drop, 0);
 
@@ -1776,6 +1800,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	FILE *cmd_in;
 	struct multi_pack_index *m = get_multi_pack_index(source);
+	struct write_midx_opts opts = {
+		.source = source,
+		.flags = flags,
+	};
 
 	/*
 	 * When updating the default for these configuration
@@ -1850,8 +1878,7 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
 		goto cleanup;
 	}
 
-	result = write_midx_internal(source, NULL, NULL, NULL, NULL,
-				     flags);
+	result = write_midx_internal(&opts);
 
 cleanup:
 	free(include_pack);
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (8 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").

As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).

To see why, consider the following example:

          | packs       | preferred pack
  --------+-------------+---------------
  MIDX #0 | { X, Y, Z } | Y
  MIDX #1 | { A, B, C } | B
  MIDX #2 | { D, E, F } | D

, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.

But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.

Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.

One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.

However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.

Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.

This change produces MIDXs which may not be correctly read with external
tools or older versions of Git. Though older versions of Git know how to
gracefully degrade and ignore any MIDX(s) they consider corrupt,
external tools may not be as robust. To avoid unintentionally breaking
any such tools, guard this change behind a version bump in the MIDX's
on-disk format.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-pack.adoc |  8 ++++++--
 midx-write.c                      | 26 ++++++++++++++++++++++----
 midx.c                            | 31 ++++++++++++++++++++++++++++---
 midx.h                            |  4 +++-
 t/t5319-multi-pack-index.sh       | 16 ++++++++++------
 5 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index 1b4db4aa611..3416edceab8 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -374,7 +374,9 @@ HEADER:
 	    The signature is: {'M', 'I', 'D', 'X'}
 
 	1-byte version number:
-	    Git only writes or recognizes version 1.
+	    Git writes the version specified by the "midx.version"
+	    configuration option, which defaults to 2. It recognizes
+	    both versions 1 and 2.
 
 	1-byte Object Id Version
 	    We infer the length of object IDs (OIDs) from this value:
@@ -413,7 +415,9 @@ CHUNK DATA:
 	    strings. There is no extra padding between the filenames,
 	    and they are listed in lexicographic order. The chunk itself
 	    is padded at the end with between 0 and 3 NUL bytes to make the
-	    chunk size a multiple of 4 bytes.
+	    chunk size a multiple of 4 bytes. Version 1 MIDXs are required to
+	    list their packs in lexicographic order, but version 2 MIDXs may
+	    list their packs in any arbitrary order.
 
 	Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
 	    Stores a table of two 4-byte unsigned integers in network order.
diff --git a/midx-write.c b/midx-write.c
index 8a54644e427..5c8700065a1 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -36,10 +36,13 @@ extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 
 static size_t write_midx_header(const struct git_hash_algo *hash_algo,
 				struct hashfile *f, unsigned char num_chunks,
-				uint32_t num_packs)
+				uint32_t num_packs, int version)
 {
+	if (version != MIDX_VERSION_V1 && version != MIDX_VERSION_V2)
+		BUG("unexpected MIDX version: %d", version);
+
 	hashwrite_be32(f, MIDX_SIGNATURE);
-	hashwrite_u8(f, MIDX_VERSION);
+	hashwrite_u8(f, version);
 	hashwrite_u8(f, oid_version(hash_algo));
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, 0); /* unused */
@@ -105,6 +108,8 @@ struct write_midx_context {
 
 	uint32_t preferred_pack_idx;
 
+	int version; /* must be MIDX_VERSION_V1 or _V2 */
+
 	int incremental;
 	uint32_t num_multi_pack_indexes_before;
 
@@ -410,7 +415,9 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
 		if (ctx->info[i].expired)
 			continue;
 
-		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
+		if (ctx->version == MIDX_VERSION_V1 &&
+		    i && strcmp(ctx->info[i].pack_name,
+				ctx->info[i - 1].pack_name) <= 0)
 			BUG("incorrect pack-file order: %s before %s",
 			    ctx->info[i - 1].pack_name,
 			    ctx->info[i].pack_name);
@@ -1025,6 +1032,12 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	if (!midx_checksum_valid(midx))
 		goto out;
 
+	/*
+	 * If the version differs, we need to update.
+	 */
+	if (midx->version != ctx->version)
+		goto out;
+
 	/*
 	 * Ignore incremental updates for now. The assumption is that any
 	 * incremental update would be either empty (in which case we will bail
@@ -1100,6 +1113,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	struct tempfile *incr;
 	struct write_midx_context ctx = {
 		.preferred_pack_idx = NO_PREFERRED_PACK,
+		.version = MIDX_VERSION_V2,
 	 };
 	struct multi_pack_index *midx_to_free = NULL;
 	int bitmapped_packs_concat_len = 0;
@@ -1114,6 +1128,10 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	ctx.repo = r;
 	ctx.source = opts->source;
 
+	repo_config_get_int(ctx.repo, "midx.version", &ctx.version);
+	if (ctx.version != MIDX_VERSION_V1 && ctx.version != MIDX_VERSION_V2)
+		die(_("unknown MIDX version: %d"), ctx.version);
+
 	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
 
 	if (ctx.incremental)
@@ -1445,7 +1463,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	}
 
 	write_midx_header(r->hash_algo, f, get_num_chunks(cf),
-			  ctx.nr - dropped_packs);
+			  ctx.nr - dropped_packs, ctx.version);
 	write_chunkfile(cf, &ctx);
 
 	finalize_hashfile(f, midx_hash, FSYNC_COMPONENT_PACK_METADATA,
diff --git a/midx.c b/midx.c
index bae45892323..c1b9658240d 100644
--- a/midx.c
+++ b/midx.c
@@ -149,7 +149,7 @@ static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
 		      m->signature, MIDX_SIGNATURE);
 
 	m->version = m->data[MIDX_BYTE_FILE_VERSION];
-	if (m->version != MIDX_VERSION)
+	if (m->version != MIDX_VERSION_V1 && m->version != MIDX_VERSION_V2)
 		die(_("multi-pack-index version %d not recognized"),
 		      m->version);
 
@@ -210,7 +210,8 @@ static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
 			die(_("multi-pack-index pack-name chunk is too short"));
 		cur_pack_name = end + 1;
 
-		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
+		if (m->version == MIDX_VERSION_V1 &&
+		    i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
 			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
 			      m->pack_names[i - 1],
 			      m->pack_names[i]);
@@ -411,6 +412,7 @@ void close_midx(struct multi_pack_index *m)
 	}
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
+	FREE_AND_NULL(m->pack_names_sorted);
 	free(m);
 }
 
@@ -655,17 +657,40 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
+
+static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
+{
+	struct multi_pack_index *m = m_;
+	return strcmp(m->pack_names[*(const size_t *)a],
+		      m->pack_names[*(const size_t *)b]);
+}
+
 static int midx_contains_pack_1(struct multi_pack_index *m,
 				const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
+	if (m->version == MIDX_VERSION_V2 && !m->pack_names_sorted) {
+		uint32_t i;
+
+		ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
+
+		for (i = 0; i < m->num_packs; i++)
+			m->pack_names_sorted[i] = i;
+
+		QSORT_S(m->pack_names_sorted, m->num_packs, midx_pack_names_cmp,
+			m);
+	}
+
 	while (first < last) {
 		uint32_t mid = first + (last - first) / 2;
 		const char *current;
 		int cmp;
 
-		current = m->pack_names[mid];
+		if (m->pack_names_sorted)
+			current = m->pack_names[m->pack_names_sorted[mid]];
+		else
+			current = m->pack_names[mid];
 		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
 		if (!cmp)
 			return 1;
diff --git a/midx.h b/midx.h
index a39bcc9d03f..aa99a6cb215 100644
--- a/midx.h
+++ b/midx.h
@@ -11,7 +11,8 @@ struct git_hash_algo;
 struct odb_source;
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
-#define MIDX_VERSION 1
+#define MIDX_VERSION_V1 1
+#define MIDX_VERSION_V2 2
 #define MIDX_BYTE_FILE_VERSION 4
 #define MIDX_BYTE_HASH_VERSION 5
 #define MIDX_BYTE_NUM_CHUNKS 6
@@ -71,6 +72,7 @@ struct multi_pack_index {
 	uint32_t num_packs_in_base;
 
 	const char **pack_names;
+	size_t *pack_names_sorted;
 	struct packed_git **packs;
 };
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index efeab4d22b7..250d21dbd67 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -21,7 +21,7 @@ midx_read_expect () {
 	EXTRA_CHUNKS="$5"
 	{
 		cat <<-EOF &&
-		header: 4d494458 1 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
+		header: 4d494458 2 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
 		chunks: pack-names oid-fanout oid-lookup object-offsets$EXTRA_CHUNKS
 		num_objects: $NUM_OBJECTS
 		packs:
@@ -512,11 +512,6 @@ test_expect_success 'verify invalid chunk offset' '
 		"improper chunk offset(s)"
 '
 
-test_expect_success 'verify packnames out of order' '
-	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
-		"pack names out of order"
-'
-
 test_expect_success 'verify missing pack' '
 	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
 		"failed to load pack"
@@ -578,6 +573,15 @@ test_expect_success 'verify incorrect checksum' '
 		$objdir "incorrect checksum"
 '
 
+test_expect_success 'setup for v1-specific fsck tests' '
+	git -c midx.version=1 multi-pack-index write
+'
+
+test_expect_success 'verify packnames out of order (v1)' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
+		"pack names out of order"
+'
+
 test_expect_success 'repack progress off for redirected stderr' '
 	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&
 	test_line_count = 0 err
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (9 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The `ctx->pack_perm` array can be considered as a permutation between
the original `pack_int_id` of some given pack to its position in the
`ctx->info` array containing all packs.

Today we can always index into this array with any known `pack_int_id`,
since there is never a `pack_int_id` which is greater than or equal to
the value `ctx->nr`.

That is not necessarily the case with MIDX compaction. For example,
suppose we have a MIDX chain with three layers, each containing three
packs. The base of the MIDX chain will have packs with IDs 0, 1, and 2,
the next layer 3, 4, and 5, and so on. If we are compacting the topmost
two layers, we'll have input `pack_int_id` values between [3, 8], but
`ctx->nr` will only be 6.

In that example, if we want to know where the pack whose original
`pack_int_id` value was, say, 7, we would compute `ctx->pack_perm[7]`,
leading to an uninitialized read, since there are only 6 entries
allocated in that array.

To address this, there are a couple of options:

 - We could allocate enough entries in `ctx->pack_perm` to accommodate
   the largest `orig_pack_int_id` value.

 - Or, we could internally shift the input values by the number of packs
   in the base layer of the lower end of the MIDX compaction range.

This patch prepare us to take the latter approach, since it does not
allocate more memory than strictly necessary. (In our above example, the
base of the lower end of the compaction range is the first MIDX layer
(having three packs), so we would end up indexing `ctx->pack_perm[7-3]`,
which is a valid read.)

Note that this patch does not actually implement that approach yet, but
merely performs a behavior-preserving refactoring which will make the
change easier to carry out in the future.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 5c8700065a1..9d345fb4737 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -119,6 +119,12 @@ struct write_midx_context {
 	struct odb_source *source;
 };
 
+static uint32_t midx_pack_perm(struct write_midx_context *ctx,
+			       uint32_t orig_pack_int_id)
+{
+	return ctx->pack_perm[orig_pack_int_id];
+}
+
 static int should_include_pack(const struct write_midx_context *ctx,
 			       const char *file_name)
 {
@@ -521,12 +527,12 @@ static int write_midx_object_offsets(struct hashfile *f,
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
 
-		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
+		if (midx_pack_perm(ctx, obj->pack_int_id) == PACK_EXPIRED)
 			BUG("object %s is in an expired pack with int-id %d",
 			    oid_to_hex(&obj->oid),
 			    obj->pack_int_id);
 
-		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
+		hashwrite_be32(f, midx_pack_perm(ctx, obj->pack_int_id));
 
 		if (ctx->large_offsets_needed && obj->offset >> 31)
 			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
@@ -627,7 +633,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
-		data[i].pack = ctx->pack_perm[e->pack_int_id];
+		data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
 		if (!e->preferred)
 			data[i].pack |= (1U << 31);
 		data[i].offset = e->offset;
@@ -637,7 +643,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
-		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+		struct pack_info *pack = &ctx->info[midx_pack_perm(ctx, e->pack_int_id)];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
 			pack->bitmap_pos = i + base_objects;
 		pack->bitmap_nr++;
@@ -698,7 +704,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
 		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+			       ctx->info[midx_pack_perm(ctx, from->pack_int_id)].p);
 	}
 
 	trace2_region_leave("midx", "prepare_midx_packing_data", ctx->repo);
@@ -1384,7 +1390,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
 		if (preferred) {
-			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+			uint32_t perm = midx_pack_perm(&ctx, preferred->orig_pack_int_id);
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
 					opts->preferred_pack_name);
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()`
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (10 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When filling packs from an existing MIDX, `fill_packs_from_midx()`
handles preparing a MIDX'd pack, and reading out its pack name from the
existing MIDX.

MIDX compaction will want to perform an identical operation, though the
caller will look quite different than `fill_packs_from_midx()`. To
reduce any future code duplication, extract `fill_pack_from_midx()`
from `fill_packs_from_midx()` to prepare to call our new helper function
in a future change.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 9d345fb4737..c54113cdc84 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -913,6 +913,21 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
 	return ret;
 }
 
+static int fill_pack_from_midx(struct pack_info *info,
+			       struct multi_pack_index *m,
+			       uint32_t pack_int_id)
+{
+	if (prepare_midx_pack(m, pack_int_id))
+		return error(_("could not load pack %d"), pack_int_id);
+
+	fill_pack_info(info,
+		       m->packs[pack_int_id - m->num_packs_in_base],
+		       m->pack_names[pack_int_id - m->num_packs_in_base],
+		       pack_int_id);
+
+	return 0;
+}
+
 static int fill_packs_from_midx(struct write_midx_context *ctx)
 {
 	struct multi_pack_index *m;
@@ -921,13 +936,13 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 		uint32_t i;
 
 		for (i = 0; i < m->num_packs; i++) {
-			if (prepare_midx_pack(m, m->num_packs_in_base + i))
-				return error(_("could not load pack"));
-
 			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
-				       m->pack_names[i],
-				       m->num_packs_in_base + i);
+
+			if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
+						m->num_packs_in_base + i) < 0)
+				return -1;
+
+			ctx->nr++;
 		}
 	}
 	return 0;
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (11 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
the range [0, m->num_packs), and then shifts its index variable up by
`m->num_packs_in_base` to produce a valid `pack_int_id`.

Instead, directly enumerate the range:

    [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)

, which are the original pack_int_ids themselves as opposed to the
indexes of those packs relative to the MIDX layer they are contained
within.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index c54113cdc84..80334914d3e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -935,11 +935,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 	for (m = ctx->m; m; m = m->base_midx) {
 		uint32_t i;
 
-		for (i = 0; i < m->num_packs; i++) {
+		for (i = m->num_packs_in_base;
+		     i < m->num_packs_in_base + m->num_packs; i++) {
 			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-			if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
-						m->num_packs_in_base + i) < 0)
+			if (fill_pack_from_midx(&ctx->info[ctx->nr], m, i) < 0)
 				return -1;
 
 			ctx->nr++;
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()`
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (12 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When computing the set of objects to appear in a MIDX, we use
compute_sorted_entries(), which handles objects from various existing
sources one fanout layer at a time.

The process for computing this set is slightly different during MIDX
compaction, so factor out the existing functionality into its own
routine to prevent `compute_sorted_entries()` from becoming too
difficult to read.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 80334914d3e..ca2469213e6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -328,6 +328,30 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
 	}
 }
 
+static void midx_fanout_add(struct midx_fanout *fanout,
+			    struct write_midx_context *ctx,
+			    uint32_t start_pack,
+			    uint32_t cur_fanout)
+{
+	uint32_t cur_pack;
+
+	if (ctx->m && !ctx->incremental)
+		midx_fanout_add_midx_fanout(fanout, ctx->m, cur_fanout,
+					    ctx->preferred_pack_idx);
+
+	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
+		int preferred = cur_pack == ctx->preferred_pack_idx;
+		midx_fanout_add_pack_fanout(fanout, ctx->info, cur_pack,
+					    preferred, cur_fanout);
+	}
+
+	if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+	    ctx->preferred_pack_idx < start_pack)
+		midx_fanout_add_pack_fanout(fanout, ctx->info,
+					    ctx->preferred_pack_idx, 1,
+					    cur_fanout);
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -364,23 +388,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (ctx->m && !ctx->incremental)
-			midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
-						    ctx->preferred_pack_idx);
-
-		for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
-			int preferred = cur_pack == ctx->preferred_pack_idx;
-			midx_fanout_add_pack_fanout(&fanout,
-						    ctx->info, cur_pack,
-						    preferred, cur_fanout);
-		}
-
-		if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
-		    ctx->preferred_pack_idx < start_pack)
-			midx_fanout_add_pack_fanout(&fanout, ctx->info,
-						    ctx->preferred_pack_idx, 1,
-						    cur_fanout);
-
+		midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
 		midx_fanout_sort(&fanout);
 
 		/*
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (13 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Though our 'read-midx' test tool is capable of printing information
about a single MIDX layer identified by its checksum, no caller in our
test suite exercises this path.

Unfortunately, there is a memory leak lurking in this (currently) unused
path that would otherwise be exposed by the following commit.

This occurs when providing a MIDX layer checksum other than the tip. As
we walk over the MIDX chain trying to find the matching layer, we drop
our reference to the top-most MIDX layer. Thus, our call to
'close_midx()' later on leaks memory between the top-most MIDX layer and
the MIDX layer immediately following the specified one.

Plug this leak by holding a reference to the tip of the MIDX chain, and
ensure that we call `close_midx()` before terminating the test tool.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 9d42c587564..388d29e2b53 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -26,9 +26,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 			  int show_objects)
 {
 	uint32_t i;
-	struct multi_pack_index *m;
+	struct multi_pack_index *m, *tip;
+	int ret = 0;
 
-	m = setup_midx(object_dir);
+	m = tip = setup_midx(object_dir);
 
 	if (!m)
 		return 1;
@@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 	if (checksum) {
 		while (m && strcmp(midx_get_checksum_hex(m), checksum))
 			m = m->base_midx;
-		if (!m)
-			return 1;
+		if (!m) {
+			ret = error(_("could not find MIDX with checksum %s"),
+				    checksum);
+			goto out;
+		}
 	}
 
 	printf("header: %08x %d %d %d %d\n",
@@ -82,9 +86,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 		}
 	}
 
-	close_midx(m);
+out:
+	close_midx(tip);
 
-	return 0;
+	return ret;
 }
 
 static int read_midx_checksum(const char *object_dir)
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 16/17] midx: implement MIDX compaction
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (14 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  2026-02-24 19:00   ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

When managing a MIDX chain with many layers, it is convenient to combine
a sequence of adjacent layers into a single layer to prevent the chain
from growing too long.

While it is conceptually possible to "compact" a sequence of MIDX layers
together by running "git multi-pack-index write --stdin-packs", there
are a few drawbacks that make this less than desirable:

 - Preserving the MIDX chain is impossible, since there is no way to
   write a MIDX layer that contains objects or packs found in an earlier
   MIDX layer already part of the chain. So callers would have to write
   an entirely new (non-incremental) MIDX containing only the compacted
   layers, discarding all other objects/packs from the MIDX.

 - There is (currently) no way to write a MIDX layer outside of the MIDX
   chain to work around the above, such that the MIDX chain could be
   reassembled substituting the compacted layers with the MIDX that was
   written.

 - The `--stdin-packs` command-line option does not allow us to specify
   the order of packs as they appear in the MIDX. Therefore, even if
   there were workarounds for the previous two challenges, any bitmaps
   belonging to layers which come after the compacted layer(s) would no
   longer be valid.

This commit introduces a way to compact a sequence of adjacent MIDX
layers into a single layer while preserving the MIDX chain, as well as
any bitmap(s) in layers which are newer than the compacted ones.

Implementing MIDX compaction does not require a significant number of
changes to how MIDX layers are written. The main changes are as follows:

 - Instead of calling `fill_packs_from_midx()`, we call a new function
   `fill_packs_from_midx_range()`, which walks backwards along the
   portion of the MIDX chain which we are compacting, and adds packs one
   layer a time.

   In order to preserve the pseudo-pack order, the concatenated pack
   order is preserved, with the exception of preferred packs which are
   always added first.

 - After adding entries from the set of packs in the compaction range,
   `compute_sorted_entries()` must adjust the `pack_int_id`'s for all
   objects added in each fanout layer to match their original
   `pack_int_id`'s (as opposed to the index at which each pack appears
   in `ctx.info`).

   Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
   here, as it unconditionally recurs through the `->base_midx`. Factor
   out a `_1()` variant that operates on a single layer, reimplement
   the existing function in terms of it, and use the new variant from
   `midx_fanout_add_compact()`.

   Since we are sorting the list of objects ourselves, the order we add
   them in does not matter.

 - When writing out the new 'multi-pack-index-chain' file, discard any
   layers in the compaction range, replacing them with the newly written
   layer, instead of keeping them and placing the new layer at the end
   of the chain.

This ends up being sufficient to implement MIDX compaction in such a way
that preserves bitmaps corresponding to more recent layers in the MIDX
chain.

The tests for MIDX compaction are so far fairly spartan, since the main
interesting behavior here is ensuring that the right packs/objects are
selected from each layer, and that the pack order is preserved despite
whether or not they are sorted in lexicographic order in the original
MIDX chain.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |  13 ++
 builtin/multi-pack-index.c              |  74 +++++++
 midx-write.c                            | 277 +++++++++++++++++++++---
 midx.h                                  |   5 +
 t/meson.build                           |   1 +
 t/t5335-compact-multi-pack-index.sh     | 175 +++++++++++++++
 6 files changed, 518 insertions(+), 27 deletions(-)
 create mode 100755 t/t5335-compact-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 164cf1f2291..883a0529741 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -12,6 +12,8 @@ SYNOPSIS
 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] compact [--[no-]incremental]
+		         <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -83,6 +85,17 @@ marker).
 		necessary.
 --
 
+compact::
+	Write a new MIDX layer containing only objects and packs present
+	in the range `<from>` to `<to>`, where both arguments are
+	checksums of existing layers in the MIDX chain.
++
+--
+	--incremental::
+		Write the result to a MIDX chain instead of writing a
+		stand-alone MIDX.
+--
+
 verify::
 	Verify the contents of the MIDX file.
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c0c6c1760c0..043ee8c478a 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -17,6 +17,10 @@
 	   "  [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
 	   "  [--refs-snapshot=<path>]")
 
+#define BUILTIN_MIDX_COMPACT_USAGE \
+	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
+	   "  <from> <to>")
+
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
 
@@ -30,6 +34,10 @@ static char const * const builtin_multi_pack_index_write_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
 	NULL
 };
+static char const * const builtin_multi_pack_index_compact_usage[] = {
+	BUILTIN_MIDX_COMPACT_USAGE,
+	NULL
+};
 static char const * const builtin_multi_pack_index_verify_usage[] = {
 	BUILTIN_MIDX_VERIFY_USAGE,
 	NULL
@@ -44,6 +52,7 @@ static char const * const builtin_multi_pack_index_repack_usage[] = {
 };
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_COMPACT_USAGE,
 	BUILTIN_MIDX_VERIFY_USAGE,
 	BUILTIN_MIDX_EXPIRE_USAGE,
 	BUILTIN_MIDX_REPACK_USAGE,
@@ -195,6 +204,70 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	return ret;
 }
 
+static int cmd_multi_pack_index_compact(int argc, const char **argv,
+					const char *prefix,
+					struct repository *repo)
+{
+	struct multi_pack_index *m, *cur;
+	struct multi_pack_index *from_midx = NULL;
+	struct multi_pack_index *to_midx = NULL;
+	struct odb_source *source;
+	int ret;
+
+	struct option *options;
+	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+		OPT_END(),
+	};
+
+	repo_config(repo, git_multi_pack_index_write_config, NULL);
+
+	options = add_common_options(builtin_multi_pack_index_compact_options);
+
+	trace2_cmd_mode(argv[0]);
+
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
+	argc = parse_options(argc, argv, prefix,
+			     options, builtin_multi_pack_index_compact_usage,
+			     0);
+
+	if (argc != 2)
+		usage_with_options(builtin_multi_pack_index_compact_usage,
+				   options);
+	source = handle_object_dir_option(the_repository);
+
+	FREE_AND_NULL(options);
+
+	m = get_multi_pack_index(source);
+
+	for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
+		const char *midx_csum = midx_get_checksum_hex(cur);
+
+		if (!from_midx && !strcmp(midx_csum, argv[0]))
+			from_midx = cur;
+		if (!to_midx && !strcmp(midx_csum, argv[1]))
+			to_midx = cur;
+	}
+
+	if (!from_midx)
+		die(_("could not find MIDX: %s"), argv[0]);
+	if (!to_midx)
+		die(_("could not find MIDX: %s"), argv[1]);
+	if (from_midx == to_midx)
+		die(_("MIDX compaction endpoints must be unique"));
+
+	for (m = from_midx; m; m = m->base_midx) {
+		if (m == to_midx)
+			die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]);
+	}
+
+	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
+
+	return ret;
+}
+
 static int cmd_multi_pack_index_verify(int argc, const char **argv,
 				       const char *prefix,
 				       struct repository *repo UNUSED)
@@ -295,6 +368,7 @@ int cmd_multi_pack_index(int argc,
 	struct option builtin_multi_pack_index_options[] = {
 		OPT_SUBCOMMAND("repack", &fn, cmd_multi_pack_index_repack),
 		OPT_SUBCOMMAND("write", &fn, cmd_multi_pack_index_write),
+		OPT_SUBCOMMAND("compact", &fn, cmd_multi_pack_index_compact),
 		OPT_SUBCOMMAND("verify", &fn, cmd_multi_pack_index_verify),
 		OPT_SUBCOMMAND("expire", &fn, cmd_multi_pack_index_expire),
 		OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index ca2469213e6..bf53ad1c4b7 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -113,6 +113,10 @@ struct write_midx_context {
 	int incremental;
 	uint32_t num_multi_pack_indexes_before;
 
+	struct multi_pack_index *compact_from;
+	struct multi_pack_index *compact_to;
+	int compact;
+
 	struct string_list *to_include;
 
 	struct repository *repo;
@@ -122,6 +126,8 @@ struct write_midx_context {
 static uint32_t midx_pack_perm(struct write_midx_context *ctx,
 			       uint32_t orig_pack_int_id)
 {
+	if (ctx->compact)
+		orig_pack_int_id -= ctx->compact_from->num_packs_in_base;
 	return ctx->pack_perm[orig_pack_int_id];
 }
 
@@ -268,18 +274,14 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
 }
 
-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
-					struct multi_pack_index *m,
-					uint32_t cur_fanout,
-					uint32_t preferred_pack)
+static void midx_fanout_add_midx_fanout_1(struct midx_fanout *fanout,
+					  struct multi_pack_index *m,
+					  uint32_t cur_fanout,
+					  uint32_t preferred_pack)
 {
 	uint32_t start = m->num_objects_in_base, end;
 	uint32_t cur_object;
 
-	if (m->base_midx)
-		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
-					    preferred_pack);
-
 	if (cur_fanout)
 		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
 	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
@@ -303,6 +305,17 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 	}
 }
 
+static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
+					struct multi_pack_index *m,
+					uint32_t cur_fanout,
+					uint32_t preferred_pack)
+{
+	if (m->base_midx)
+		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
+					    preferred_pack);
+	midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout, preferred_pack);
+}
+
 static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
 					struct pack_info *info,
 					uint32_t cur_pack,
@@ -352,6 +365,21 @@ static void midx_fanout_add(struct midx_fanout *fanout,
 					    cur_fanout);
 }
 
+static void midx_fanout_add_compact(struct midx_fanout *fanout,
+				    struct write_midx_context *ctx,
+				    uint32_t cur_fanout)
+{
+	struct multi_pack_index *m = ctx->compact_to;
+
+	ASSERT(ctx->compact);
+
+	while (m && m != ctx->compact_from->base_midx) {
+		midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout,
+					      NO_PREFERRED_PACK);
+		m = m->base_midx;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -370,6 +398,9 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	size_t alloc_objects, total_objects = 0;
 	struct midx_fanout fanout = { 0 };
 
+	if (ctx->compact)
+		ASSERT(!start_pack);
+
 	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
 		total_objects = st_add(total_objects,
 				       ctx->info[cur_pack].p->num_objects);
@@ -388,7 +419,10 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
+		if (ctx->compact)
+			midx_fanout_add_compact(&fanout, ctx, cur_fanout);
+		else
+			midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
 		midx_fanout_sort(&fanout);
 
 		/*
@@ -956,6 +990,75 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
 	return 0;
 }
 
+static uint32_t compactible_packs_between(const struct multi_pack_index *from,
+					  const struct multi_pack_index *to)
+{
+	uint32_t nr;
+
+	ASSERT(from && to);
+
+	if (unsigned_add_overflows(to->num_packs, to->num_packs_in_base))
+		die(_("too many packs, unable to compact"));
+
+	nr = to->num_packs + to->num_packs_in_base;
+	if (nr < from->num_packs_in_base)
+		BUG("unexpected number of packs in base during compaction: "
+		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
+
+	return nr - from->num_packs_in_base;
+}
+
+static int fill_packs_from_midx_range(struct write_midx_context *ctx,
+				      int bitmap_order)
+{
+	struct multi_pack_index *m = ctx->compact_to;
+	uint32_t packs_nr;
+
+	ASSERT(ctx->compact && !ctx->nr);
+	ASSERT(ctx->compact_from);
+	ASSERT(ctx->compact_to);
+
+	packs_nr = compactible_packs_between(ctx->compact_from,
+					     ctx->compact_to);
+
+	ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
+
+	while (m != ctx->compact_from->base_midx) {
+		uint32_t pack_int_id, preferred_pack_id;
+		uint32_t i;
+
+		if (bitmap_order) {
+			if (midx_preferred_pack(m, &preferred_pack_id) < 0)
+				die(_("could not determine preferred pack"));
+		} else {
+			preferred_pack_id = m->num_packs_in_base;
+		}
+
+		pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
+
+		if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+					preferred_pack_id) < 0)
+			return -1;
+
+		for (i = m->num_packs_in_base;
+		     i < m->num_packs_in_base + m->num_packs; i++) {
+			if (preferred_pack_id == i)
+				continue;
+
+			if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+						i) < 0)
+				return -1;
+		}
+
+		ctx->nr += m->num_packs;
+		m = m->base_midx;
+	}
+
+	ASSERT(ctx->nr == packs_nr);
+
+	return 0;
+}
+
 static struct {
 	const char *non_split;
 	const char *split;
@@ -1075,6 +1178,9 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	if (ctx->incremental)
 		goto out;
 
+	if (ctx->compact)
+		goto out; /* Compaction always requires an update. */
+
 	/*
 	 * Otherwise, we need to verify that the packs covered by the existing
 	 * MIDX match the packs that we already have. The logic to do so is way
@@ -1120,12 +1226,23 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
 	return needed;
 }
 
+static int midx_hashcmp(const struct multi_pack_index *a,
+			const struct multi_pack_index *b,
+			const struct git_hash_algo *algop)
+{
+	return hashcmp(midx_get_checksum_hash(a), midx_get_checksum_hash(b),
+		       algop);
+}
+
 struct write_midx_opts {
 	struct odb_source *source; /* non-optional */
 
 	struct string_list *packs_to_include;
 	struct string_list *packs_to_drop;
 
+	struct multi_pack_index *compact_from;
+	struct multi_pack_index *compact_to;
+
 	const char *preferred_pack_name;
 	const char *refs_snapshot;
 	unsigned flags;
@@ -1150,6 +1267,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	int dropped_packs = 0;
 	int result = -1;
 	const char **keep_hashes = NULL;
+	size_t keep_hashes_nr = 0;
 	struct chunkfile *cf;
 
 	trace2_region_enter("midx", "write_midx_internal", r);
@@ -1162,6 +1280,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
 		die(_("unknown MIDX version: %d"), ctx.version);
 
 	ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
+	ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
+
+	if (ctx.compact) {
+		if (ctx.version != MIDX_VERSION_V2)
+			die(_("cannot perform MIDX compaction with v1 format"));
+		if (!opts->compact_from)
+			BUG("expected non-NULL 'from' MIDX during compaction");
+		if (!opts->compact_to)
+			BUG("expected non-NULL 'to' MIDX during compaction");
+
+		ctx.compact_from = opts->compact_from;
+		ctx.compact_to = opts->compact_to;
+	}
 
 	if (ctx.incremental)
 		strbuf_addf(&midx_name,
@@ -1189,11 +1320,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			 */
 			if (ctx.incremental)
 				ctx.base_midx = m;
-			else if (!opts->packs_to_include)
+			if (!opts->packs_to_include)
 				ctx.m = m;
 		}
 	}
 
+	/*
+	 * If compacting MIDX layer(s) in the range [from, to], then the
+	 * compacted MIDX will share the same base MIDX as 'from'.
+	 */
+	if (ctx.compact)
+		ctx.base_midx = ctx.compact_from->base_midx;
+
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
 	ctx.info = NULL;
@@ -1210,7 +1348,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			ctx.num_multi_pack_indexes_before++;
 			m = m->base_midx;
 		}
-	} else if (ctx.m && fill_packs_from_midx(&ctx)) {
+	} else if (ctx.m && !ctx.compact && fill_packs_from_midx(&ctx)) {
 		goto cleanup;
 	}
 
@@ -1223,9 +1361,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	else
 		ctx.progress = NULL;
 
-	ctx.to_include = opts->packs_to_include;
+	if (ctx.compact) {
+		int bitmap_order = 0;
+		if (opts->preferred_pack_name)
+			bitmap_order |= 1;
+		else if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
+			bitmap_order |= 1;
 
-	for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+		fill_packs_from_midx_range(&ctx, bitmap_order);
+	} else {
+		ctx.to_include = opts->packs_to_include;
+		for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+	}
 	stop_progress(&ctx.progress);
 
 	if (!opts->packs_to_drop) {
@@ -1354,12 +1501,19 @@ static int write_midx_internal(struct write_midx_opts *opts)
 			ctx.large_offsets_needed = 1;
 	}
 
-	QSORT(ctx.info, ctx.nr, pack_info_compare);
+	if (ctx.compact) {
+		if (ctx.version != MIDX_VERSION_V2)
+			BUG("performing MIDX compaction with v1 MIDX");
+	} else {
+		QSORT(ctx.info, ctx.nr, pack_info_compare);
+	}
 
 	if (opts->packs_to_drop && opts->packs_to_drop->nr) {
 		size_t drop_index = 0;
 		int missing_drops = 0;
 
+		ASSERT(!ctx.compact);
+
 		for (size_t i = 0;
 		     i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
 			int cmp = strcmp(ctx.info[i].pack_name,
@@ -1391,12 +1545,20 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	 */
 	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
 	for (size_t i = 0; i < ctx.nr; i++) {
+		uint32_t from = ctx.info[i].orig_pack_int_id;
+		uint32_t to;
+
 		if (ctx.info[i].expired) {
+			to = PACK_EXPIRED;
 			dropped_packs++;
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
 		} else {
-			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+			to = i - dropped_packs;
 		}
+
+		if (ctx.compact)
+			from -= ctx.compact_from->num_packs_in_base;
+
+		ctx.pack_perm[from] = to;
 	}
 
 	for (size_t i = 0; i < ctx.nr; i++) {
@@ -1542,7 +1704,24 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
 		die(_("too many multi-pack-indexes"));
 
-	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+	if (ctx.compact) {
+		struct multi_pack_index *m;
+
+		/*
+		 * Keep all MIDX layers excluding those in the range [from, to].
+		 */
+		for (m = ctx.base_midx; m; m = m->base_midx)
+			keep_hashes_nr++;
+		for (m = ctx.m;
+		     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+		     m = m->base_midx)
+			keep_hashes_nr++;
+
+		keep_hashes_nr++; /* include the compacted layer */
+	} else {
+		keep_hashes_nr = ctx.num_multi_pack_indexes_before + 1;
+	}
+	CALLOC_ARRAY(keep_hashes, keep_hashes_nr);
 
 	if (ctx.incremental) {
 		FILE *chainf = fdopen_lock_file(&lk, "w");
@@ -1567,17 +1746,47 @@ static int write_midx_internal(struct write_midx_opts *opts)
 
 		strbuf_release(&final_midx_name);
 
-		keep_hashes[ctx.num_multi_pack_indexes_before] =
-			xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
+		if (ctx.compact) {
+			struct multi_pack_index *m;
+			uint32_t num_layers_before_from = 0;
+			uint32_t i;
 
-		for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
-			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+			for (m = ctx.base_midx; m; m = m->base_midx)
+				num_layers_before_from++;
 
-			keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
-			m = m->base_midx;
+			m = ctx.base_midx;
+			for (i = 0; i < num_layers_before_from; i++) {
+				uint32_t j = num_layers_before_from - i - 1;
+
+				keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
+				m = m->base_midx;
+			}
+
+			keep_hashes[i] = xstrdup(hash_to_hex_algop(midx_hash,
+								   r->hash_algo));
+
+			i = 0;
+			for (m = ctx.m;
+			     m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+			     m = m->base_midx) {
+				keep_hashes[keep_hashes_nr - i - 1] =
+					xstrdup(midx_get_checksum_hex(m));
+				i++;
+			}
+		} else {
+			keep_hashes[ctx.num_multi_pack_indexes_before] =
+				xstrdup(hash_to_hex_algop(midx_hash,
+							  r->hash_algo));
+
+			for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+				uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+				keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
+				m = m->base_midx;
+			}
 		}
 
-		for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+		for (uint32_t i = 0; i < keep_hashes_nr; i++)
 			fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
 	} else {
 		keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1590,8 +1799,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files(opts->source, keep_hashes,
-			 ctx.num_multi_pack_indexes_before + 1,
+	clear_midx_files(opts->source, keep_hashes, keep_hashes_nr,
 			 ctx.incremental);
 	result = 0;
 
@@ -1609,7 +1817,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	if (keep_hashes) {
-		for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+		for (uint32_t i = 0; i < keep_hashes_nr; i++)
 			free((char *)keep_hashes[i]);
 		free(keep_hashes);
 	}
@@ -1651,6 +1859,21 @@ int write_midx_file_only(struct odb_source *source,
 	return write_midx_internal(&opts);
 }
 
+int write_midx_file_compact(struct odb_source *source,
+			    struct multi_pack_index *from,
+			    struct multi_pack_index *to,
+			    unsigned flags)
+{
+	struct write_midx_opts opts = {
+		.source = source,
+		.compact_from = from,
+		.compact_to = to,
+		.flags = flags | MIDX_WRITE_COMPACT,
+	};
+
+	return write_midx_internal(&opts);
+}
+
 int expire_midx_packs(struct odb_source *source, unsigned flags)
 {
 	uint32_t i, *count, result = 0;
diff --git a/midx.h b/midx.h
index aa99a6cb215..08f3728e520 100644
--- a/midx.h
+++ b/midx.h
@@ -82,6 +82,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 #define MIDX_WRITE_INCREMENTAL (1 << 5)
+#define MIDX_WRITE_COMPACT (1 << 6)
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
@@ -131,6 +132,10 @@ int write_midx_file_only(struct odb_source *source,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
 			 const char *refs_snapshot, unsigned flags);
+int write_midx_file_compact(struct odb_source *source,
+			    struct multi_pack_index *from,
+			    struct multi_pack_index *to,
+			    unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct odb_source *source, unsigned flags);
 int expire_midx_packs(struct odb_source *source, unsigned flags);
diff --git a/t/meson.build b/t/meson.build
index f80e366cff7..2421220917a 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -618,6 +618,7 @@ integration_tests = [
   't5332-multi-pack-reuse.sh',
   't5333-pseudo-merge-bitmaps.sh',
   't5334-incremental-multi-pack-index.sh',
+  't5335-compact-multi-pack-index.sh',
   't5351-unpack-large-objects.sh',
   't5400-send-pack.sh',
   't5401-update-hooks.sh',
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
new file mode 100755
index 00000000000..797ae05c3bd
--- /dev/null
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -0,0 +1,175 @@
+#!/bin/sh
+
+test_description='multi-pack-index compaction'
+
+. ./test-lib.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+nth_line() {
+	local n="$1"
+	shift
+	awk "NR==$n" "$@"
+}
+
+write_packs () {
+	for c in "$@"
+	do
+		test_commit "$c" &&
+
+		git pack-objects --all --unpacked $packdir/pack-$c &&
+		git prune-packed &&
+
+		git multi-pack-index write --incremental --bitmap || return 1
+	done
+}
+
+test_midx_layer_packs () {
+	local checksum="$1" &&
+	shift &&
+
+	test-tool read-midx $objdir "$checksum" >out &&
+
+	printf "%s\n" "$@" >expect &&
+	# NOTE: do *not* pipe through sort here, we want to ensure the
+	# order of packs is preserved during compaction.
+	grep "^pack-" out | cut -d"-" -f2 >actual &&
+
+	test_cmp expect actual
+}
+
+test_midx_layer_object_uniqueness () {
+	: >objs.all
+	while read layer
+	do
+		test-tool read-midx --show-objects $objdir "$layer" >out &&
+		grep "\.pack$" out | cut -d" " -f1 | sort >objs.layer &&
+		test_stdout_line_count = 0 comm -12 objs.all objs.layer &&
+		cat objs.all objs.layer | sort >objs.tmp &&
+		mv objs.tmp objs.all || return 1
+	done <$midx_chain
+}
+
+test_expect_success 'MIDX compaction with lex-ordered pack names' '
+	git init midx-compact-lex-order &&
+	(
+		cd midx-compact-lex-order &&
+
+		git config maintenance.auto false &&
+
+		write_packs A B C D E &&
+		test_line_count = 5 $midx_chain &&
+
+		git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test_line_count = 3 $midx_chain &&
+
+		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
+		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
+		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+		test_midx_layer_object_uniqueness
+	)
+'
+
+test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
+	git init midx-compact-non-lex-order &&
+	(
+		cd midx-compact-non-lex-order &&
+
+		git config maintenance.auto false &&
+
+		write_packs D C A B E &&
+		test_line_count = 5 $midx_chain &&
+
+		git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test_line_count = 3 $midx_chain &&
+
+		test_midx_layer_packs "$(nth_line 1 "$midx_chain")" D &&
+		test_midx_layer_packs "$(nth_line 2 "$midx_chain")" C A B &&
+		test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+		test_midx_layer_object_uniqueness
+	)
+'
+
+test_expect_success 'setup for bogus MIDX compaction scenarios' '
+	git init midx-compact-bogus &&
+	(
+		cd midx-compact-bogus &&
+
+		git config maintenance.auto false &&
+
+		write_packs A B C
+	)
+'
+
+test_expect_success 'MIDX compaction with missing endpoints' '
+	(
+		cd midx-compact-bogus &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"<missing>" "<missing>" 2>err &&
+		test_grep "could not find MIDX: <missing>" err &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"<missing>" "$(nth_line 2 "$midx_chain")" 2>err &&
+		test_grep "could not find MIDX: <missing>" err &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"$(nth_line 2 "$midx_chain")" "<missing>" 2>err &&
+		test_grep "could not find MIDX: <missing>" err
+	)
+'
+
+test_expect_success 'MIDX compaction with reversed endpoints' '
+	(
+		cd midx-compact-bogus &&
+
+		from="$(nth_line 3 "$midx_chain")" &&
+		to="$(nth_line 1 "$midx_chain")" &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"$from" "$to" 2>err &&
+
+		test_grep "MIDX $from must be an ancestor of $to" err
+	)
+'
+
+test_expect_success 'MIDX compaction with identical endpoints' '
+	(
+		cd midx-compact-bogus &&
+
+		from="$(nth_line 3 "$midx_chain")" &&
+		to="$(nth_line 3 "$midx_chain")" &&
+
+		test_must_fail git multi-pack-index compact --incremental \
+			"$from" "$to" 2>err &&
+
+		test_grep "MIDX compaction endpoints must be unique" err
+	)
+'
+
+test_expect_success 'MIDX compaction with midx.version=1' '
+	(
+		cd midx-compact-bogus &&
+
+		test_must_fail git -c midx.version=1 multi-pack-index compact \
+			"$(nth_line 1 "$midx_chain")" \
+			"$(nth_line 2 "$midx_chain")" 2>err &&
+
+		test_grep "fatal: cannot perform MIDX compaction with v1 format" err
+	)
+'
+
+test_done
-- 
2.53.0.171.gde83996e422


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 17/17] midx: enable reachability bitmaps during MIDX compaction
  2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
                     ` (15 preceding siblings ...)
  2026-02-24 19:00   ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
@ 2026-02-24 19:00   ` Taylor Blau
  16 siblings, 0 replies; 99+ messages in thread
From: Taylor Blau @ 2026-02-24 19:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Enable callers to generate reachability bitmaps when performing MIDX
layer compaction by combining all existing bitmaps from the compacted
layers.

Note that because of the object/pack ordering described by the previous
commit, the pseudo-pack order for the compacted MIDX is the same as
concatenating the individual pseudo-pack orderings for each layer in the
compaction range.

As a result, the only non-test or documentation change necessary is to
treat all objects as non-preferred during compaction so as not to
disturb the object ordering.

In the future, we may want to adjust which commit(s) receive
reachability bitmaps when compacting multiple .bitmap files into one, or
even generate new bitmaps (e.g., if the references have moved
significantly since the .bitmap was generated). This commit only
implements combining all existing bitmaps in range together in order to
demonstrate and lay the groundwork for more exotic strategies.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.adoc |   5 +-
 builtin/multi-pack-index.c              |   4 +-
 midx-write.c                            |   2 +-
 t/t5335-compact-multi-pack-index.sh     | 122 +++++++++++++++++++++++-
 4 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 883a0529741..61256830141 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -13,7 +13,7 @@ SYNOPSIS
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
 'git multi-pack-index' [<options>] compact [--[no-]incremental]
-		         <from> <to>
+		         [--[no-]bitmap] <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -94,6 +94,9 @@ compact::
 	--incremental::
 		Write the result to a MIDX chain instead of writing a
 		stand-alone MIDX.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 043ee8c478a..2f24c113c8f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -19,7 +19,7 @@
 
 #define BUILTIN_MIDX_COMPACT_USAGE \
 	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
-	   "  <from> <to>")
+	   "  [--[no-]bitmap] <from> <to>")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
 
 	struct option *options;
 	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "incremental", &opts.flags,
 			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index bf53ad1c4b7..0ff2e45aa7a 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -676,7 +676,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
 		data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
-		if (!e->preferred)
+		if (!e->preferred || ctx->compact)
 			data[i].pack |= (1U << 31);
 		data[i].offset = e->offset;
 	}
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
index 797ae05c3bd..40f3844282f 100755
--- a/t/t5335-compact-multi-pack-index.sh
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -67,7 +67,7 @@ test_expect_success 'MIDX compaction with lex-ordered pack names' '
 		write_packs A B C D E &&
 		test_line_count = 5 $midx_chain &&
 
-		git multi-pack-index compact --incremental \
+		git multi-pack-index compact --incremental --bitmap \
 			"$(nth_line 2 "$midx_chain")" \
 			"$(nth_line 4 "$midx_chain")" &&
 		test_line_count = 3 $midx_chain &&
@@ -90,7 +90,7 @@ test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
 		write_packs D C A B E &&
 		test_line_count = 5 $midx_chain &&
 
-		git multi-pack-index compact --incremental \
+		git multi-pack-index compact --incremental --bitmap \
 			"$(nth_line 2 "$midx_chain")" \
 			"$(nth_line 4 "$midx_chain")" &&
 		test_line_count = 3 $midx_chain &&
@@ -172,4 +172,122 @@ test_expect_success 'MIDX compaction with midx.version=1' '
 	)
 '
 
+midx_objs_by_pack () {
+	awk '/\.pack$/ { split($3, a, "-"); print a[2], $1 }' | sort
+}
+
+tag_objs_from_pack () {
+	objs="$(git rev-list --objects --no-object-names "$2")" &&
+	printf "$1 %s\n" $objs | sort
+}
+
+test_expect_success 'MIDX compaction preserves pack object selection' '
+	git init midx-compact-preserve-selection &&
+	(
+		cd midx-compact-preserve-selection &&
+
+		git config maintenance.auto false &&
+
+		test_commit A &&
+		test_commit B &&
+
+		# Create two packs, one containing just the objects from
+		# A, and another containing all objects from the
+		# repository.
+		p1="$(echo A | git pack-objects --revs --delta-base-offset \
+			$packdir/pack-1)" &&
+		p0="$(echo B | git pack-objects --revs --delta-base-offset \
+			$packdir/pack-0)" &&
+
+		echo "pack-1-$p1.idx" | git multi-pack-index write \
+			--incremental --bitmap --stdin-packs &&
+		echo "pack-0-$p0.idx" | git multi-pack-index write \
+			--incremental --bitmap --stdin-packs &&
+
+		write_packs C &&
+
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 1 "$midx_chain")" \
+			"$(nth_line 2 "$midx_chain")" &&
+
+
+		test-tool read-midx --show-objects $objdir \
+			"$(nth_line 1 "$midx_chain")" >AB.info &&
+		test-tool read-midx --show-objects $objdir \
+			"$(nth_line 2 "$midx_chain")" >C.info &&
+
+		midx_objs_by_pack <AB.info >AB.actual &&
+		midx_objs_by_pack <C.info >C.actual &&
+
+		{
+			tag_objs_from_pack 1 A &&
+			tag_objs_from_pack 0 A..B
+		} | sort >AB.expect &&
+		tag_objs_from_pack C B..C >C.expect &&
+
+		test_cmp AB.expect AB.actual &&
+		test_cmp C.expect C.actual
+	)
+'
+
+test_expect_success 'MIDX compaction with bitmaps' '
+	git init midx-compact-with-bitmaps &&
+	(
+		cd midx-compact-with-bitmaps &&
+
+		git config maintenance.auto false &&
+
+		write_packs foo bar baz quux woot &&
+
+		test-tool read-midx --bitmap $objdir >bitmap.expect &&
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+		test-tool read-midx --bitmap $objdir >bitmap.actual &&
+
+		test_cmp bitmap.expect bitmap.actual &&
+
+		true
+	)
+'
+
+test_expect_success 'MIDX compaction with bitmaps (non-trivial)' '
+	git init midx-compact-with-bitmaps-non-trivial &&
+	(
+		cd midx-compact-with-bitmaps-non-trivial &&
+
+		git config maintenance.auto false &&
+
+		git branch -m main &&
+
+		#               D(4)
+		#              /
+		# A(1) --- B(2) --- C(3) --- G(7)
+		#              \
+		#               E(5) --- F(6)
+		write_packs A B C &&
+		git checkout -b side &&
+		write_packs D &&
+		git checkout -b other B &&
+		write_packs E F &&
+		git checkout main &&
+		write_packs G &&
+
+		# Compact layers 2-4, leaving us with:
+		#
+		#  [A, [B, C, D], E, F, G]
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 2 "$midx_chain")" \
+			"$(nth_line 4 "$midx_chain")" &&
+
+		# Then compact the top two layers, condensing the above
+		# such that the new 4th layer contains F and G.
+		#
+		#  [A, [B, C, D], E, [F, G]]
+		git multi-pack-index compact --incremental --bitmap \
+			"$(nth_line 4 "$midx_chain")" \
+			"$(nth_line 5 "$midx_chain")"
+	)
+'
+
 test_done
-- 
2.53.0.171.gde83996e422

^ permalink raw reply related	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2026-02-24 19:08 UTC | newest]

Thread overview: 99+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  1:41     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
2025-12-08 18:25   ` Patrick Steinhardt
2025-12-09  1:42     ` Taylor Blau
2025-12-09  1:50       ` Taylor Blau
2025-12-09  6:27         ` Patrick Steinhardt
2026-01-13 22:46           ` Taylor Blau
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  1:59     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  2:04     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  2:07     ` Taylor Blau
2025-12-09  2:11       ` Taylor Blau
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:13     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:14     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:16     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-09  7:21   ` Patrick Steinhardt
2026-01-13 23:32     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
2025-12-09  7:21   ` Patrick Steinhardt
2026-01-13 23:47     ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2026-01-14 19:54   ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-01-14 19:54   ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-01-14 19:54   ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-01-14 19:54   ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-01-14 19:54   ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-01-14 19:54   ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-01-14 21:13     ` Junio C Hamano
2026-01-14 21:40       ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-01-14 21:28     ` Junio C Hamano
2026-01-14 21:44       ` Taylor Blau
2026-01-27  7:34     ` Patrick Steinhardt
2026-02-24 18:47       ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
2026-01-14 21:49     ` Junio C Hamano
2026-01-14 22:03       ` Taylor Blau
2026-01-15  0:11         ` Taylor Blau
2026-01-21  8:51           ` Patrick Steinhardt
2026-01-21 23:55             ` Taylor Blau
2026-01-22  2:26               ` rsbecker
2026-01-22 17:07                 ` Junio C Hamano
2026-02-23 13:49               ` Jeff King
2026-02-24 18:53                 ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-01-14 19:54   ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-01-14 19:55   ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-01-14 19:55   ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-01-14 19:55   ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
2026-01-27  7:35     ` Patrick Steinhardt
2026-01-27 22:13       ` Taylor Blau
2026-01-14 19:55   ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
2026-02-20 22:24   ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
2026-02-23 14:08     ` Jeff King
2026-02-24  5:25       ` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
2026-02-24 18:59   ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-02-24 18:59   ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-02-24 18:59   ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-02-24 18:59   ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-02-24 18:59   ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-02-24 18:59   ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-02-24 19:00   ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-02-24 19:00   ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-02-24 19:00   ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-02-24 19:00   ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-02-24 19:00   ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-02-24 19:00   ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
2026-02-24 19:00   ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox