git.vger.kernel.org archive mirror
* [PATCH 00/19] midx: incremental multi-pack indexes, part one
@ 2024-06-06 23:04 Taylor Blau
  2024-06-06 23:04 ` [PATCH 01/19] Documentation: describe incremental MIDX format Taylor Blau
                   ` (23 more replies)
  0 siblings, 24 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

This series implements incremental MIDXs, which allow for storing
a MIDX across multiple layers, each with their own distinct set of
packs.

MOTIVATION
==========

Incremental MIDXs allow large repositories to make use of the MIDX
feature without having to rewrite the entire MIDX every time they want
to update the set of packs contained in the MIDX. For extremely large
repositories, such a full rewrite is often infeasible.

OVERVIEW
========

This series implements the first component of incremental MIDXs, meaning
by the end of it you can run:

    $ git multi-pack-index write --incremental

a couple of times, and produce a directory structure like:

    $ tree .git/objects/pack/multi-pack-index.d
    .git/objects/pack/multi-pack-index.d
    ├── multi-pack-index-chain
    ├── multi-pack-index-baa53bc5092bed50378fe9232ae7878828df2890.midx
    └── multi-pack-index-f60023a8a104be94eab96dd7c42a6a5db67c82ba.midx

where each *.midx file behaves the same way as the existing
non-incremental MIDX does today, but the files are stitched together as
multiple MIDX "layers", so you do not have to rewrite the whole MIDX
anytime you want to modify it.

This is "part one" of a multi-part series. The overview of how all of
these series fit together is as follows:

  - "Part zero": preparatory work like 'tb/midx-write-cleanup' and my
    series to clean up temporary file handling [1, 2].

  - "Part one": this series, which enables reading and writing
    incremental MIDXs, but does not have support for more advanced
    features like bitmap support or rewriting parts of the MIDX chain.

  - "Part two": the next series, which adds support for multi-pack
    reachability bitmaps in an incremental MIDX world, meaning that each
    `*.midx` layer can have its own `*.bitmap`, and the bitmaps at each
    layer can be used together.

  - "Part three": which supports more advanced management of the MIDX
    chain, like compressing intermediate layers to avoid the chain
    growing too long.

Parts zero, one, and two all exist, and the first two have been shared
with the list. Part two exists in ttaylorr/git [3], but is excluded from
this series to keep the length manageable. I avoided sending this series
until I was confident that bitmaps worked on top of incremental MIDXs to
avoid designing ourselves into a corner.

Part three doesn't exist yet, but is straightforward to do on top. None
of the design decisions made in this series inhibit my goals for part
three.

[1]: https://lore.kernel.org/git/cover.1717023301.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1717712358.git.me@ttaylorr.com/
[3]: https://github.com/ttaylorr/git/compare/tb/incremental-midx...ttaylorr:git:tb/incremental-midx-bitmaps

Taylor Blau (19):
  Documentation: describe incremental MIDX format
  midx: add new fields for incremental MIDX chains
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: implement verification support for incremental MIDXs
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  midx: implement support for writing incremental MIDX chains

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt | 100 +++++
 builtin/multi-pack-index.c                   |   2 +
 builtin/repack.c                             |   8 +-
 ci/run-build-and-tests.sh                    |   2 +-
 midx-write.c                                 | 293 +++++++++++--
 midx.c                                       | 410 ++++++++++++++++---
 midx.h                                       |  26 +-
 object-name.c                                |  99 ++---
 packfile.c                                   |  21 +-
 packfile.h                                   |   4 +
 t/README                                     |   6 +-
 t/helper/test-read-midx.c                    |  24 +-
 t/lib-bitmap.sh                              |   6 +-
 t/lib-midx.sh                                |  28 ++
 t/t0410-partial-clone.sh                     |   2 -
 t/t5310-pack-bitmaps.sh                      |   4 -
 t/t5313-pack-bounds-checks.sh                |   8 +-
 t/t5319-multi-pack-index.sh                  |  30 +-
 t/t5326-multi-pack-bitmaps.sh                |   4 +-
 t/t5327-multi-pack-bitmaps-rev.sh            |   6 +-
 t/t5332-multi-pack-reuse.sh                  |   2 +
 t/t5334-incremental-multi-pack-index.sh      |  46 +++
 t/t7700-repack.sh                            |  48 +--
 24 files changed, 935 insertions(+), 255 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh


base-commit: 680474691b4639280a73baa0bb8792634f99f611
-- 
2.45.2.437.gecb9450a0e


* [PATCH 01/19] Documentation: describe incremental MIDX format
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement incremental multi-pack indexes (MIDXs) over the
next several commits by first describing the relevant prerequisites
(like a new chunk in the MIDX format, the directory structure for
incremental MIDXs, etc.)

The format is described in detail in the patch contents below, but the
high-level description is as follows.

Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and
each `*.midx` within that directory has a single "parent" MIDX, which is
the MIDX layer immediately before it in the MIDX chain. The chain order
resides in a file 'multi-pack-index-chain' in the same directory.
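
As a rough illustration of that layout (not code from this series), the
sketch below picks the n-th hash out of the chain file's contents and
builds the matching layer path. The helper names `chain_nth_hash()` and
`midx_layer_path()` are hypothetical, and the chain file is assumed to
hold one hash per line, as described in the patch contents.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the path of one MIDX layer from a hash
 * taken out of the chain file. Returns what snprintf() returns, so
 * callers can detect truncation.
 */
static int midx_layer_path(char *out, size_t out_len,
			   const char *object_dir, const char *hash)
{
	return snprintf(out, out_len,
			"%s/pack/multi-pack-index.d/multi-pack-index-%s.midx",
			object_dir, hash);
}

/*
 * Hypothetical helper: copy the n-th (zero-based) line of the chain
 * file contents into `hash`. Returns 0 on success, or -1 if there is
 * no such line or the buffer is too small.
 */
static int chain_nth_hash(const char *chain, size_t n,
			  char *hash, size_t hash_len)
{
	while (*chain) {
		size_t len = strcspn(chain, "\n");

		if (!n) {
			if (len + 1 > hash_len)
				return -1;
			memcpy(hash, chain, len);
			hash[len] = '\0';
			return 0;
		}
		n--;
		chain += len;
		if (*chain == '\n')
			chain++;
	}
	return -1;
}
```

Line 0 of the chain file names the first (oldest) layer, and the last
line names the tip of the chain.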

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt | 100 +++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index f2221d2b44..d05e3d6dd9 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -61,6 +61,106 @@ Design Details
 - The MIDX file format uses a chunk-based approach (similar to the
   commit-graph file) that allows optional data to be added.
 
+Incremental multi-pack indexes
+------------------------------
+
+As repositories grow in size, it becomes more expensive to write a
+multi-pack index (MIDX) that includes all packfiles. To accommodate
+this, the "incremental multi-pack indexes" feature allows for combining
+a "chain" of multi-pack indexes.
+
+Each individual component of the chain need only contain a small number
+of packfiles. Appending to the chain does not invalidate earlier parts
+of the chain, so repositories can control how much time is spent
+updating the MIDX chain by determining the number of packs in each layer
+of the MIDX chain.
+
+=== Design state
+
+At present, the incremental multi-pack indexes feature is missing two
+important components:
+
+  - The ability to rewrite earlier portions of the MIDX chain (i.e., to
+    "compact" some collection of adjacent MIDX layers into a single
+    MIDX). At present the only supported way of shrinking a MIDX chain
+    is to rewrite the entire chain from scratch without the
+    `--incremental` flag.
++
+There are no fundamental limitations that stand in the way of being able
+to implement this feature. It is omitted from the initial implementation
+in order to reduce the complexity, but will be added later.
+
+  - Support for reachability bitmaps. The classic single MIDX
+    implementation does support reachability bitmaps (see the section
+    titled "multi-pack-index reverse indexes" in
+    linkgit:gitformat-pack[5] for more details).
++
+As above, there are no fundamental limitations that stand in the way of
+extending the incremental MIDX format to support reachability bitmaps.
+The design below specifically takes this into account, and support for
+reachability bitmaps will be added in a future patch series. It is
+omitted from this series for the same reason as above.
++
+In brief, to support reachability bitmaps with the incremental MIDX
+feature, the concept of the pseudo-pack order is extended across each
+layer of the incremental MIDX chain to form a concatenated pseudo-pack
+order. This concatenation takes place in the same order as the chain
+itself (in other words, the concatenated pseudo-pack order for a chain
+`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
+the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
+`$H3`).
++
+The layout will then be extended so that each layer of the incremental
+MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap
+are offset by the number of objects in the previous layers of the chain.
+
+=== File layout
+
+Instead of storing a single `multi-pack-index` file (with an optional
+`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental
+MIDXs are stored in the following layout:
+
+----
+$GIT_DIR/objects/pack/multi-pack-index.d/
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
+----
+
+The `multi-pack-index-chain` file contains a list of the incremental
+MIDX files in the chain, in order. The above example shows a chain whose
+`multi-pack-index-chain` file would contain the following lines:
+
+----
+$H1
+$H2
+$H3
+----
+
+The `multi-pack-index-$H1.midx` file contains the first layer of the
+multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
+the second layer of the chain, and so on.
+
+=== Object positions for incremental MIDXs
+
+In the original multi-pack-index design, we refer to objects via their
+lexicographic position (by object IDs) within the repository's singular
+multi-pack-index. In the incremental multi-pack-index design, we refer
+to objects via their index into a concatenated lexicographic ordering
+among each component in the MIDX chain.
+
+If `objects_nr()` is a function that returns the number of objects in a
+given MIDX layer, then the index of an object at lexicographic position
+`i` within, say, $H3 is defined as:
+
+----
+objects_nr($H2) + objects_nr($H1) + i
+----
+
+(in the C implementation, this is often computed as `i +
+m->num_objects_in_base`).
+
 Future Work
 -----------
 
-- 
2.45.2.437.gecb9450a0e



* [PATCH 02/19] midx: add new fields for incremental MIDX chains
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
  2024-06-06 23:04 ` [PATCH 01/19] Documentation: describe incremental MIDX format Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The incremental MIDX chain feature is designed around the idea of
indexing into a concatenated lexicographic ordering of object IDs
present in the MIDX.

When given an object position, the MIDX machinery needs to be able to
locate both (a) which MIDX layer contains the given object, and (b) at
what position *within that MIDX layer* that object appears.

To do this, three new fields are added to the `struct multi_pack_index`:

  - struct multi_pack_index *base_midx;
  - uint32_t num_objects_in_base;
  - uint32_t num_packs_in_base;

These three fields store the pieces of information suggested by their
respective field names. In turn, the `num_objects_in_base` and
`num_packs_in_base` fields are used to crawl backwards along the
`base_midx` pointer to locate the appropriate position for a given
object within the MIDX that contains it.
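
A minimal sketch of how the two counters relate to the `base_midx`
pointer, assuming each layer's "in base" count is the sum over all
earlier layers (as the format documentation's `objects_nr()` formula
suggests). `struct layer` and `layer_set_base()` are stand-ins invented
for this example, not part of midx.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Pared-down stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;	/* previous layer, NULL for the first */
	uint32_t num_objects;		/* objects in this layer only */
	uint32_t num_packs;		/* packs in this layer only */
	uint32_t num_objects_in_base;	/* objects in all earlier layers */
	uint32_t num_packs_in_base;	/* packs in all earlier layers */
};

/* Hypothetical helper: stack `m` on top of `base` and fill in the counters. */
static void layer_set_base(struct layer *m, struct layer *base)
{
	m->base_midx = base;
	m->num_objects_in_base = base ?
		base->num_objects + base->num_objects_in_base : 0;
	m->num_packs_in_base = base ?
		base->num_packs + base->num_packs_in_base : 0;
}
```

With layers of 4, 3, and 5 objects stacked in that order, the third
layer ends up with `num_objects_in_base == 7`.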

The following commits will update various parts of the MIDX machinery
(as well as their callers from outside of midx.c and midx-write.c) to be
aware of and make use of these fields when performing object lookups.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/midx.h b/midx.h
index 8554f2d616..020e49f77c 100644
--- a/midx.h
+++ b/midx.h
@@ -63,6 +63,10 @@ struct multi_pack_index {
 	const unsigned char *chunk_revindex;
 	size_t chunk_revindex_len;
 
+	struct multi_pack_index *base_midx;
+	uint32_t num_objects_in_base;
+	uint32_t num_packs_in_base;
+
 	const char **pack_names;
 	struct packed_git **packs;
 	char object_dir[FLEX_ARRAY];
-- 
2.45.2.437.gecb9450a0e



* [PATCH 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
  2024-06-06 23:04 ` [PATCH 01/19] Documentation: describe incremental MIDX format Taylor Blau
  2024-06-06 23:04 ` [PATCH 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `nth_midxed_pack_int_id()` takes in an object position in
MIDX lexicographic order and returns an identifier of the pack from
which that object was selected in the MIDX.

Currently, the given object position is an index into the lexicographic
order of objects in a single MIDX. Change this position to instead refer
into the concatenated lexicographic order of all MIDXs in a MIDX chain.

This has two visible effects within the implementation of
`nth_midxed_pack_int_id()`:

  - First, the given position is now an index into the concatenated
    lexicographic order of all MIDXs in the order in which they appear
    in the MIDX chain.

  - Second, the pack ID returned from this function is now also in the
    concatenated order of packs among all layers of the MIDX chain in
    the same order that they appear in the MIDX chain.

To do this, introduce the first of two general purpose helpers, this one
being `midx_for_object()`. `midx_for_object()` takes a double pointer to
a `struct multi_pack_index` as well as an object `pos` in terms of the
entire MIDX chain[^1].

The function chases down the '->base_midx' field until it finds the MIDX
layer within the chain that contains the given object. It then:

  - modifies the double pointer to point to the containing MIDX, instead
    of the tip of the chain, and

  - returns the MIDX-local position[^2] at which the given object can be
    found.

Use this function within `nth_midxed_pack_int_id()` so that the `pos` it
expects is now relative to the entire MIDX chain, and that it returns
the appropriate pack position for that object.

[^1]: As a reminder, this means that the object is identified among the
  objects contained in all layers of the incremental MIDX chain, not any
  particular layer. For example, consider a MIDX chain with two
  individual MIDXs, one with 4 objects and another with 3 objects. If
  the MIDX with 4 objects appears earlier in the chain, then asking for
  object "6" would return the second object in the MIDX with 3 objects.

[^2]: Building on the previous example, asking for object 6 in a MIDX
  chain with (4, 3) objects, respectively, would set the double pointer
  to point at the MIDX containing three objects, and would return an
  index to the second object within that MIDX.
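
To make the (4, 3) example concrete, here is a small stand-alone sketch
that follows the shape of `midx_for_object()` in the diff below.
`struct layer` and `layer_for_object()` are stand-ins for this example
only, and out-of-range positions are reported through a return value
rather than `die()`. Note that positions in the code are zero-based, so
the sixth object in the chain is position 5.

```c
#include <stddef.h>
#include <stdint.h>

/* Pared-down stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * Walk towards the base of the chain until reaching the layer that
 * contains `pos`. On success, returns 0, points *m at that layer, and
 * stores the layer-local position in *local. Returns -1 if `pos` is
 * out of range for the chain.
 */
static int layer_for_object(struct layer **m, uint32_t pos, uint32_t *local)
{
	struct layer *cur = *m;

	while (cur && pos < cur->num_objects_in_base)
		cur = cur->base_midx;
	if (!cur || pos >= cur->num_objects + cur->num_objects_in_base)
		return -1;

	*m = cur;
	*local = pos - cur->num_objects_in_base;
	return 0;
}
```

Starting from the tip of a (4, 3) chain, position 5 resolves to local
position 1 (the second object) in the 3-object layer, while position 3
resolves to the last object of the 4-object layer.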

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index bc4797196f..d5828b48fd 100644
--- a/midx.c
+++ b/midx.c
@@ -240,6 +240,23 @@ void close_midx(struct multi_pack_index *m)
 	free(m);
 }
 
+static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
+{
+	struct multi_pack_index *m = *_m;
+	while (m && pos < m->num_objects_in_base)
+		m = m->base_midx;
+
+	if (!m)
+		BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
+
+	if (pos >= m->num_objects + m->num_objects_in_base)
+		die(_("invalid MIDX object position, MIDX is likely corrupt"));
+
+	*_m = m;
+
+	return pos - m->num_objects_in_base;
+}
+
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
 {
 	struct strbuf pack_name = STRBUF_INIT;
@@ -331,8 +348,10 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
-	return get_be32(m->chunk_object_offsets +
-			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
+	pos = midx_for_object(&m, pos);
+
+	return m->num_packs_in_base + get_be32(m->chunk_object_offsets +
+					       (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
 }
 
 int fill_midx_entry(struct repository *r,
-- 
2.45.2.437.gecb9450a0e



* [PATCH 04/19] midx: teach `prepare_midx_pack()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (2 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `prepare_midx_pack()` is part of the midx.h API and
loads the pack identified by the MIDX-local 'pack_int_id'. This patch
prepares that function to be aware of an incremental MIDX world.

To do this, introduce the second of the two general purpose helpers
mentioned in the previous commit. This commit introduces
`midx_for_pack()`, which is the pack-specific analog of
`midx_for_object()`, and works in the same fashion.

Like `midx_for_object()`, this function chases down the '->base_midx'
field until it finds the MIDX layer within the chain that contains the
given pack.

Use this function within `prepare_midx_pack()` so that the `pack_int_id`
it expects is now relative to the entire MIDX chain, and that it
prepares the given pack in the appropriate MIDX.
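
The pack-side translation runs in both directions: chain-wide to
layer-local (what `midx_for_pack()` does) and layer-local back to
chain-wide (what `nth_midxed_pack_int_id()` does by adding
`num_packs_in_base`). The following sketch shows the round trip;
`struct layer`, `layer_for_pack()`, and `chain_pack_id()` are
hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Pared-down stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	uint32_t num_packs;
	uint32_t num_packs_in_base;
};

/*
 * Chain-wide pack ID to (layer, layer-local pack ID). Returns 0 on
 * success, or -1 if the ID is out of range for the chain.
 */
static int layer_for_pack(struct layer **m, uint32_t pack_int_id,
			  uint32_t *local)
{
	struct layer *cur = *m;

	while (cur && pack_int_id < cur->num_packs_in_base)
		cur = cur->base_midx;
	if (!cur || pack_int_id >= cur->num_packs + cur->num_packs_in_base)
		return -1;

	*m = cur;
	*local = pack_int_id - cur->num_packs_in_base;
	return 0;
}

/* (layer, layer-local pack ID) back to a chain-wide pack ID. */
static uint32_t chain_pack_id(const struct layer *m, uint32_t local)
{
	return m->num_packs_in_base + local;
}
```

Arrays that belong to a single layer, such as `m->packs` and
`m->pack_names`, are indexed with the local ID; anything handed back to
callers outside the layer uses the chain-wide ID.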

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/midx.c b/midx.c
index d5828b48fd..7fa3a1a7f8 100644
--- a/midx.c
+++ b/midx.c
@@ -257,20 +257,37 @@ static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
 	return pos - m->num_objects_in_base;
 }
 
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
+static uint32_t midx_for_pack(struct multi_pack_index **_m,
+			      uint32_t pack_int_id)
 {
-	struct strbuf pack_name = STRBUF_INIT;
-	struct packed_git *p;
+	struct multi_pack_index *m = *_m;
+	while (m && pack_int_id < m->num_packs_in_base)
+		m = m->base_midx;
 
-	if (pack_int_id >= m->num_packs)
+	if (!m)
+		BUG("NULL multi-pack-index for pack ID: %"PRIu32, pack_int_id);
+
+	if (pack_int_id >= m->num_packs + m->num_packs_in_base)
 		die(_("bad pack-int-id: %u (%u total packs)"),
-		    pack_int_id, m->num_packs);
+		    pack_int_id, m->num_packs + m->num_packs_in_base);
 
-	if (m->packs[pack_int_id])
+	*_m = m;
+
+	return pack_int_id - m->num_packs_in_base;
+}
+
+int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
+		      uint32_t pack_int_id)
+{
+	struct strbuf pack_name = STRBUF_INIT;
+	struct packed_git *p;
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+
+	if (m->packs[local_pack_int_id])
 		return 0;
 
 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
-		    m->pack_names[pack_int_id]);
+		    m->pack_names[local_pack_int_id]);
 
 	p = add_packed_git(pack_name.buf, pack_name.len, m->local);
 	strbuf_release(&pack_name);
@@ -279,7 +296,7 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 		return 1;
 
 	p->multi_pack_index = 1;
-	m->packs[pack_int_id] = p;
+	m->packs[local_pack_int_id] = p;
 	install_packed_git(r, p);
 	list_add_tail(&p->mru, &r->objects->packed_git_mru);
 
-- 
2.45.2.437.gecb9450a0e



* [PATCH 05/19] midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (3 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `nth_midxed_object_oid()` returns the object ID for a given
object position in the MIDX lexicographic order.

Teach this function to instead operate over the concatenated
lexicographic order defined in an earlier step so that it is able to be
used with incremental MIDXs.

To do this, we need to both (a) adjust the bounds check for the given
'n', and (b) record the MIDX-local position after chasing the
`->base_midx` pointer to find the MIDX which contains that object.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 7fa3a1a7f8..cfab7f8113 100644
--- a/midx.c
+++ b/midx.c
@@ -335,9 +335,11 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
 {
-	if (n >= m->num_objects)
+	if (n >= m->num_objects + m->num_objects_in_base)
 		return NULL;
 
+	n = midx_for_object(&m, n);
+
 	oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n));
 	return oid;
 }
-- 
2.45.2.437.gecb9450a0e



* [PATCH 06/19] midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (4 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as in previous commits, teach the function
`nth_bitmapped_pack()` about incremental MIDXs by translating the given
`pack_int_id` from the concatenated lexical order to a MIDX-local
lexical position.

When accessing the containing MIDX's array of packs, use the local pack
ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
when accessing the data within that chunk.
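
A sketch of the chunk access, for illustration only. It assumes each
'BTMP' entry is two big-endian 32-bit integers (position, then count),
which matches the two reads in the diff below; the exact value of
`MIDX_CHUNK_BITMAPPED_PACKS_WIDTH` is an assumption here.
`BTMP_ENTRY_WIDTH`, `read_be32()`, and `read_btmp_entry()` are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed entry width: two 32-bit fields per pack. */
#define BTMP_ENTRY_WIDTH (2 * sizeof(uint32_t))

struct btmp_entry {
	uint32_t bitmap_pos;
	uint32_t bitmap_nr;
};

/* Read a big-endian 32-bit integer from an unaligned buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * The chunk belongs to a single layer, so it is indexed by the
 * layer-local pack ID, not the chain-wide one.
 */
static struct btmp_entry read_btmp_entry(const unsigned char *chunk,
					 uint32_t local_pack_int_id)
{
	const unsigned char *p = chunk + BTMP_ENTRY_WIDTH * local_pack_int_id;
	struct btmp_entry e;

	e.bitmap_pos = read_be32(p);
	e.bitmap_nr = read_be32(p + sizeof(uint32_t));
	return e;
}
```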

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/midx.c b/midx.c
index cfab7f8113..cdc754af97 100644
--- a/midx.c
+++ b/midx.c
@@ -308,17 +308,19 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id)
 {
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+
 	if (!m->chunk_bitmapped_packs)
 		return error(_("MIDX does not contain the BTMP chunk"));
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return error(_("could not load bitmapped pack %"PRIu32), pack_int_id);
 
-	bp->p = m->packs[pack_int_id];
+	bp->p = m->packs[local_pack_int_id];
 	bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs +
-				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id);
+				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id);
 	bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs +
-				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id +
+				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
 				 sizeof(uint32_t));
 	bp->pack_int_id = pack_int_id;
 
-- 
2.45.2.437.gecb9450a0e



* [PATCH 07/19] midx: introduce `bsearch_one_midx()`
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (5 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).

While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.

One example is the object abbreviation code, which cares about object
IDs near to where we'd expect to find a match in a MIDX. In that case,
we want to look at the nearby matches in each layer of the MIDX chain,
not just a single one.

Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.

At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.
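
The single-layer search can be sketched as below. This is a
simplification for illustration: object IDs are modeled as sorted
`uint32_t` keys and the fanout table is skipped, so it is not how
`bsearch_hash()` works internally. What it does share with
`bsearch_one_midx()` is the final step of offsetting the result by
`num_objects_in_base`. `struct layer` and `search_one_layer()` are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Pared-down stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	const uint32_t *oids;		/* sorted; stand-in for the OID lookup chunk */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * Search one layer only. Returns 1 on an exact match and 0 otherwise.
 * Either way, *result is a chain-wide position: that of the match, or
 * of the slot in this layer where `oid` would be inserted.
 */
static int search_one_layer(uint32_t oid, const struct layer *m,
			    uint32_t *result)
{
	uint32_t lo = 0, hi = m->num_objects;
	int found = 0;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (m->oids[mid] == oid) {
			lo = mid;
			found = 1;
			break;
		}
		if (m->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}

	*result = lo + m->num_objects_in_base;
	return found;
}
```

A caller that needs the neighbors in every layer, as the abbreviation
code does, would loop with `for (; m; m = m->base_midx)` and run this
search once per layer.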

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c        | 17 +++++++--
 midx.h        |  5 ++-
 object-name.c | 99 +++++++++++++++++++++++++++------------------------
 3 files changed, 71 insertions(+), 50 deletions(-)

diff --git a/midx.c b/midx.c
index cdc754af97..1b4a9d5d00 100644
--- a/midx.c
+++ b/midx.c
@@ -327,10 +327,21 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 	return 0;
 }
 
-int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result)
+int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
+		     uint32_t *result)
 {
-	return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
-			    the_hash_algo->rawsz, result);
+	int ret = bsearch_hash(oid->hash, m->chunk_oid_fanout,
+			       m->chunk_oid_lookup, the_hash_algo->rawsz,
+			       result);
+	if (result)
+		*result += m->num_objects_in_base;
+	return ret;
+}
+
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
+		 uint32_t *result)
+{
+	return bsearch_one_midx(oid, m, result);
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/midx.h b/midx.h
index 020e49f77c..46c53d69ff 100644
--- a/midx.h
+++ b/midx.h
@@ -90,7 +90,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
-int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
+		     uint32_t *result);
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
+		 uint32_t *result);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/object-name.c b/object-name.c
index 523af6f64f..3307d5200c 100644
--- a/object-name.c
+++ b/object-name.c
@@ -132,28 +132,32 @@ static int match_hash(unsigned len, const unsigned char *a, const unsigned char
 static void unique_in_midx(struct multi_pack_index *m,
 			   struct disambiguate_state *ds)
 {
-	uint32_t num, i, first = 0;
-	const struct object_id *current = NULL;
-	int len = ds->len > ds->repo->hash_algo->hexsz ?
-		ds->repo->hash_algo->hexsz : ds->len;
-	num = m->num_objects;
+	for (; m; m = m->base_midx) {
+		uint32_t num, i, first = 0;
+		const struct object_id *current = NULL;
+		int len = ds->len > ds->repo->hash_algo->hexsz ?
+			ds->repo->hash_algo->hexsz : ds->len;
 
-	if (!num)
-		return;
+		num = m->num_objects + m->num_objects_in_base;
 
-	bsearch_midx(&ds->bin_pfx, m, &first);
+		if (!num)
+			continue;
 
-	/*
-	 * At this point, "first" is the location of the lowest object
-	 * with an object name that could match "bin_pfx".  See if we have
-	 * 0, 1 or more objects that actually match(es).
-	 */
-	for (i = first; i < num && !ds->ambiguous; i++) {
-		struct object_id oid;
-		current = nth_midxed_object_oid(&oid, m, i);
-		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
-			break;
-		update_candidates(ds, current);
+		bsearch_one_midx(&ds->bin_pfx, m, &first);
+
+		/*
+		 * At this point, "first" is the location of the lowest
+		 * object with an object name that could match
+		 * "bin_pfx".  See if we have 0, 1 or more objects that
+		 * actually match(es).
+		 */
+		for (i = first; i < num && !ds->ambiguous; i++) {
+			struct object_id oid;
+			current = nth_midxed_object_oid(&oid, m, i);
+			if (!match_hash(len, ds->bin_pfx.hash, current->hash))
+				break;
+			update_candidates(ds, current);
+		}
 	}
 }
 
@@ -706,37 +710,40 @@ static int repo_extend_abbrev_len(struct repository *r UNUSED,
 static void find_abbrev_len_for_midx(struct multi_pack_index *m,
 				     struct min_abbrev_data *mad)
 {
-	int match = 0;
-	uint32_t num, first = 0;
-	struct object_id oid;
-	const struct object_id *mad_oid;
+	for (; m; m = m->base_midx) {
+		int match = 0;
+		uint32_t num, first = 0;
+		struct object_id oid;
+		const struct object_id *mad_oid;
 
-	if (!m->num_objects)
-		return;
+		if (!m->num_objects)
+			continue;
 
-	num = m->num_objects;
-	mad_oid = mad->oid;
-	match = bsearch_midx(mad_oid, m, &first);
+		num = m->num_objects + m->num_objects_in_base;
+		mad_oid = mad->oid;
+		match = bsearch_one_midx(mad_oid, m, &first);
 
-	/*
-	 * first is now the position in the packfile where we would insert
-	 * mad->hash if it does not exist (or the position of mad->hash if
-	 * it does exist). Hence, we consider a maximum of two objects
-	 * nearby for the abbreviation length.
-	 */
-	mad->init_len = 0;
-	if (!match) {
-		if (nth_midxed_object_oid(&oid, m, first))
-			extend_abbrev_len(&oid, mad);
-	} else if (first < num - 1) {
-		if (nth_midxed_object_oid(&oid, m, first + 1))
-			extend_abbrev_len(&oid, mad);
+		/*
+		 * first is now the position in the packfile where we
+		 * would insert mad->hash if it does not exist (or the
+		 * position of mad->hash if it does exist). Hence, we
+		 * consider a maximum of two objects nearby for the
+		 * abbreviation length.
+		 */
+		mad->init_len = 0;
+		if (!match) {
+			if (nth_midxed_object_oid(&oid, m, first))
+				extend_abbrev_len(&oid, mad);
+		} else if (first < num - 1) {
+			if (nth_midxed_object_oid(&oid, m, first + 1))
+				extend_abbrev_len(&oid, mad);
+		}
+		if (first > 0) {
+			if (nth_midxed_object_oid(&oid, m, first - 1))
+				extend_abbrev_len(&oid, mad);
+		}
+		mad->init_len = mad->cur_len;
 	}
-	if (first > 0) {
-		if (nth_midxed_object_oid(&oid, m, first - 1))
-			extend_abbrev_len(&oid, mad);
-	}
-	mad->init_len = mad->cur_len;
 }
 
 static void find_abbrev_len_for_pack(struct packed_git *p,
-- 
2.45.2.437.gecb9450a0e


* [PATCH 08/19] midx: teach `bsearch_midx()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (6 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the special-case callers of `bsearch_midx()` have been dealt
with, teach `bsearch_midx()` to handle incremental MIDX chains.

The incremental MIDX-aware version of `bsearch_midx()` works by
repeatedly searching for a given OID in each layer along the
`->base_midx` pointer, stopping either when an exact match is found, or
the end of the chain is reached.
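
As a rough illustration only (not the code in this patch), the walk can
be modeled with a toy layer type. Here `struct toy_midx`,
`toy_bsearch_one()` and `toy_bsearch_chain()` are hypothetical names,
32-bit integers stand in for object IDs, and the assumption that the
per-layer search reports a chain-wide position is inferred from how the
callers in this series use the result:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct multi_pack_index'. */
struct toy_midx {
	const uint32_t *oids;          /* sorted keys local to this layer */
	uint32_t num_objects;          /* objects in this layer only */
	uint32_t num_objects_in_base;  /* objects in all earlier layers */
	struct toy_midx *base_midx;    /* previous layer, or NULL */
};

/*
 * Binary search within a single layer. The position written to
 * 'result' is chain-wide (assumption: the layer-local index plus
 * num_objects_in_base).
 */
static int toy_bsearch_one(uint32_t key, const struct toy_midx *m,
			   uint32_t *result)
{
	uint32_t lo = 0, hi = m->num_objects;

	assert(result);
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (m->oids[mid] == key) {
			*result = mid + m->num_objects_in_base;
			return 1;
		}
		if (m->oids[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	*result = lo + m->num_objects_in_base;
	return 0;
}

/* Walk from the tip towards the base; stop at the first exact match. */
static int toy_bsearch_chain(uint32_t key, const struct toy_midx *m,
			     uint32_t *result)
{
	for (; m; m = m->base_midx)
		if (toy_bsearch_one(key, m, result))
			return 1;
	return 0;
}
```

Note that when no layer contains the key, `*result` holds whatever the
last layer searched left there, so in this sketch it is only meaningful
on a successful lookup.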

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 1b4a9d5d00..7c4f58f7f1 100644
--- a/midx.c
+++ b/midx.c
@@ -341,7 +341,10 @@ int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 		 uint32_t *result)
 {
-		return bsearch_one_midx(oid, m, result);
+	for (; m; m = m->base_midx)
+		if (bsearch_one_midx(oid, m, result))
+			return 1;
+	return 0;
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
-- 
2.45.2.437.gecb9450a0e


* [PATCH 09/19] midx: teach `nth_midxed_offset()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (7 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as in previous commits, teach the function
`nth_midxed_offset()` about incremental MIDXs.

The given object `pos` is used to find the containing MIDX, and
translated back into a MIDX-local position by assigning the return value
of `midx_for_object()` to it.
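
The translation can be pictured with the following sketch, which is a
simplified model rather than the helper's actual body. The names
`struct toy_midx`, `toy_midx_for_object()` and `toy_nth_offset()` are
hypothetical, offsets are stored as plain 64-bit integers, and
out-of-range positions are handled with assert() purely for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct multi_pack_index'. */
struct toy_midx {
	const uint64_t *offsets;       /* one offset per object in this layer */
	uint32_t num_objects;          /* objects in this layer only */
	uint32_t num_objects_in_base;  /* objects in all earlier layers */
	struct toy_midx *base_midx;    /* previous layer, or NULL */
};

/*
 * Point '*m_p' at the layer containing chain-wide position 'pos' and
 * return the matching layer-local position.
 */
static uint32_t toy_midx_for_object(struct toy_midx **m_p, uint32_t pos)
{
	struct toy_midx *m = *m_p;

	/* Positions below num_objects_in_base live in an earlier layer. */
	while (m && pos < m->num_objects_in_base)
		m = m->base_midx;

	/* The sketch asserts; it does not model real error handling. */
	assert(m && pos - m->num_objects_in_base < m->num_objects);

	*m_p = m;
	return pos - m->num_objects_in_base;
}

/* Mirror of the pattern in this patch: reassign 'pos', then index. */
static uint64_t toy_nth_offset(struct toy_midx *m, uint32_t pos)
{
	pos = toy_midx_for_object(&m, pos);
	return m->offsets[pos];
}
```

With a three-object base layer and a two-object tip, chain-wide
position 4 resolves to the tip's local position 1, while position 1
resolves to the base layer unchanged.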

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/midx.c b/midx.c
index 7c4f58f7f1..d351dbb7e0 100644
--- a/midx.c
+++ b/midx.c
@@ -365,6 +365,8 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	const unsigned char *offset_data;
 	uint32_t offset32;
 
+	pos = midx_for_object(&m, pos);
+
 	offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH;
 	offset32 = get_be32(offset_data + sizeof(uint32_t));
 
-- 
2.45.2.437.gecb9450a0e


* [PATCH 10/19] midx: teach `fill_midx_entry()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (8 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:04 ` [PATCH 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as previous commits, teach the `fill_midx_entry()`
function to work in an incremental MIDX-aware fashion.

This function, unlike others which accept an index into either the
lexical order of objects or packs, takes in an object_id, and attempts
to fill a caller-provided 'struct pack_entry' with the remaining pieces
of information about that object from the MIDX.

The function uses `bsearch_midx()` which fills out the frame-local 'pos'
variable, recording the given object_id's lexical position within the
MIDX chain, if found (if no matching object ID was found, we'll return
immediately without filling out the `pack_entry` structure).

Once given that position, we jump back through the `->base_midx` pointer
to ensure that our `m` points at the MIDX layer which contains the given
object_id (and not an ancestor or descendant of it in the chain). Note
that we can drop the bounds check "if (pos >= m->num_objects)" because
`midx_for_object()` performs this check for us.

After that point, we only need to make two special considerations within
this function:

  - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
    is a position in the concatenated lexical order of packs, so we must
    ensure that we subtract `m->num_packs_in_base` before accessing the
    MIDX-local `packs` array.

  - Second, we must avoid translating the `pos` back to a MIDX-local
    index, since we use it as an argument to `nth_midxed_offset()` which
    expects a position relative to the concatenated lexical order of
    objects.
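
To make the two considerations concrete, here is a small model of the
lookup. It is a sketch under stated assumptions, not the function
itself: `struct toy_midx` and the `toy_*()` helpers are hypothetical,
packs are represented by their names, and storing a layer-local pack ID
per object (with the accessor adding `num_packs_in_base`) is an
assumption made for the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct multi_pack_index'. */
struct toy_midx {
	const char *const *packs;      /* packs in this layer only */
	const uint32_t *obj_pack_ids;  /* layer-local pack ID per object */
	uint32_t num_objects, num_objects_in_base;
	uint32_t num_packs, num_packs_in_base;
	struct toy_midx *base_midx;    /* previous layer, or NULL */
};

/* Select the layer owning chain-wide 'pos'; return the local position. */
static uint32_t toy_midx_for_object(struct toy_midx **m_p, uint32_t pos)
{
	struct toy_midx *m = *m_p;

	while (m && pos < m->num_objects_in_base)
		m = m->base_midx;
	assert(m && pos - m->num_objects_in_base < m->num_objects);
	*m_p = m;
	return pos - m->num_objects_in_base;
}

/* Takes a chain-wide object position, returns a chain-wide pack ID. */
static uint32_t toy_nth_pack_int_id(struct toy_midx *m, uint32_t pos)
{
	pos = toy_midx_for_object(&m, pos);
	return m->num_packs_in_base + m->obj_pack_ids[pos];
}

static const char *toy_pack_for_object(struct toy_midx *m, uint32_t pos)
{
	uint32_t pack_int_id;

	/* Move 'm' to the owning layer, but keep 'pos' chain-wide. */
	toy_midx_for_object(&m, pos);
	pack_int_id = toy_nth_pack_int_id(m, pos);

	/* The ID counts packs across the chain; the array is local. */
	assert(pack_int_id >= m->num_packs_in_base);
	return m->packs[pack_int_id - m->num_packs_in_base];
}
```

Because `pos` is left untouched in toy_pack_for_object(), it can still
be handed to any other accessor that expects a chain-wide position.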

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index d351dbb7e0..c3802354e3 100644
--- a/midx.c
+++ b/midx.c
@@ -403,14 +403,12 @@ int fill_midx_entry(struct repository *r,
 	if (!bsearch_midx(oid, m, &pos))
 		return 0;
 
-	if (pos >= m->num_objects)
-		return 0;
-
+	midx_for_object(&m, pos);
 	pack_int_id = nth_midxed_pack_int_id(m, pos);
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
-	p = m->packs[pack_int_id];
+	p = m->packs[pack_int_id - m->num_packs_in_base];
 
 	/*
 	* We are about to tell the caller where they can locate the
-- 
2.45.2.437.gecb9450a0e


* [PATCH 11/19] midx: remove unused `midx_locate_pack()`
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (9 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
@ 2024-06-06 23:04 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:04 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Commit 307d75bbe6 (midx: implement `midx_locate_pack()`, 2023-12-14)
introduced `midx_locate_pack()`, which was described at the time as a
complement to the function `midx_contains_pack()` which allowed
callers to determine where in the MIDX lexical order a pack appeared, as
opposed to whether or not it was simply contained.

307d75bbe6 suggests that future patches would be added which would
introduce callers for this new function, but none ever were, meaning the
function has gone unused since its introduction.

Clean this up by in effect reverting 307d75bbe6, which removes the
unused function and inlines its definition back into
`midx_contains_pack()`.

(Looking back through the list archives when 307d75bbe6 was written,
this was in preparation for this[1] patch from back when we had the
concept of "disjoint" packs while developing multi-pack verbatim reuse.
That concept was abandoned before the series was merged, but I never
dropped what would become 307d75bbe6 from the series, leading to the
state prior to this commit).

[1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 13 ++-----------
 midx.h |  2 --
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/midx.c b/midx.c
index c3802354e3..186d8344dc 100644
--- a/midx.c
+++ b/midx.c
@@ -462,8 +462,7 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
-int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
-		     uint32_t *pos)
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
@@ -474,11 +473,8 @@ int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
 
 		current = m->pack_names[mid];
 		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
-		if (!cmp) {
-			if (pos)
-				*pos = mid;
+		if (!cmp)
 			return 1;
-		}
 		if (cmp > 0) {
 			first = mid + 1;
 			continue;
@@ -489,11 +485,6 @@ int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
 	return 0;
 }
 
-int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
-{
-	return midx_locate_pack(m, idx_or_pack_name, NULL);
-}
-
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
diff --git a/midx.h b/midx.h
index 46c53d69ff..86af7dfc5e 100644
--- a/midx.h
+++ b/midx.h
@@ -102,8 +102,6 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m);
 int midx_contains_pack(struct multi_pack_index *m,
 		       const char *idx_or_pack_name);
-int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
-		     uint32_t *pos);
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-- 
2.45.2.437.gecb9450a0e


* [PATCH 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (10 preceding siblings ...)
  2024-06-06 23:04 ` [PATCH 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle
has been cleaned up, teach the former about how to operate in an
incremental MIDX-aware world in a similar fashion as in previous
commits.

Instead of using either of the two `midx_for_object()` or
`midx_for_pack()` helpers, this function is split into two: one that
determines whether a pack is contained in a single MIDX, and another
which calls the former in a loop over all MIDXs.

This approach does not require that we change any of the implementation
in what is now `midx_contains_pack_1()` as it still operates over a
single MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 186d8344dc..564e922533 100644
--- a/midx.c
+++ b/midx.c
@@ -462,7 +462,8 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
-int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
+static int midx_contains_pack_1(struct multi_pack_index *m,
+				const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
@@ -485,6 +486,14 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 	return 0;
 }
 
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
+{
+	for (; m; m = m->base_midx)
+		if (midx_contains_pack_1(m, idx_or_pack_name))
+			return 1;
+	return 0;
+}
+
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
-- 
2.45.2.437.gecb9450a0e


* [PATCH 13/19] midx: teach `midx_preferred_pack()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (11 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `midx_preferred_pack()` is used to determine the identity
of the preferred pack, a unique pack within the MIDX which is used as a
tie-breaker when choosing which pack represents an object that appears
in multiple packs within the MIDX.

Historically we have said that the MIDX's preferred pack has the unique
property that all objects from that pack are represented in the MIDX.
But that isn't quite true: a more precise statement would be that all
objects from that pack *which appear in the MIDX* are selected from that
pack.

This helps us extend the concept of preferred packs across a MIDX chain,
where some object(s) in the preferred pack may appear in other packs
in an earlier MIDX layer, in which case those object(s) will not appear
in a subsequent MIDX layer from either the preferred pack or any other
pack.

Extend the concept of preferred packs by using the pack which represents
the object at the first position in MIDX pseudo-pack order belonging to
the current MIDX layer (i.e., at position 'm->num_objects_in_base').

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 564e922533..cb7b623b5d 100644
--- a/midx.c
+++ b/midx.c
@@ -497,13 +497,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
+		uint32_t midx_pos;
 		if (load_midx_revindex(m) < 0) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
 
-		m->preferred_pack_idx =
-			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
+
+		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
+
 	} else if (m->preferred_pack_idx == -2)
 		return -1; /* no revindex */
 
-- 
2.45.2.437.gecb9450a0e


* [PATCH 14/19] midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (12 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 15/19] midx: support reading incremental MIDX chains Taylor Blau
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `midx_fanout_add_midx_fanout()` is used to help construct
the fanout table when generating a MIDX by reusing data from an existing
MIDX.

Prepare this function to work with incremental MIDXs by making a few
changes:

  - The bounds checks need to be adjusted to start object lookups taking
    into account the number of objects in the previous MIDX layer (i.e.,
    by starting the lookups at position `m->num_objects_in_base` instead
    of position 0).

  - Likewise, the upper bound of those lookups needs to be shifted by
    `m->num_objects_in_base`, so that lookups end at position
    `m->num_objects + m->num_objects_in_base`.

  - Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier
    MIDX layers when dealing with an incremental MIDX chain by calling
    itself when given a MIDX with a non-NULL `base_midx`.
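
The adjusted bounds and the recursion can be sketched as follows. This
is an illustrative model only: `struct toy_midx`, `toy_fanout_range()`
and `toy_count_in_bucket()` are hypothetical names, the fanout table is
kept in host byte order (so no ntohl() is needed), and the sketch merely
counts objects per fanout bucket instead of collecting entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct multi_pack_index'. */
struct toy_midx {
	const uint32_t *fanout;        /* cumulative counts, layer-local */
	uint32_t num_objects_in_base;  /* objects in all earlier layers */
	struct toy_midx *base_midx;    /* previous layer, or NULL */
};

/*
 * Compute the half-open range [*start, *end) of chain-wide object
 * positions that fall into fanout bucket 'cur_fanout' of layer 'm'.
 */
static void toy_fanout_range(const struct toy_midx *m, uint32_t cur_fanout,
			     uint32_t *start, uint32_t *end)
{
	assert(start && end);

	/* Both bounds are shifted by the objects in earlier layers. */
	*start = m->num_objects_in_base;
	if (cur_fanout)
		*start += m->fanout[cur_fanout - 1];
	*end = m->num_objects_in_base + m->fanout[cur_fanout];
}

/* Visit earlier layers first, then this layer's slice of the bucket. */
static uint32_t toy_count_in_bucket(const struct toy_midx *m,
				    uint32_t cur_fanout)
{
	uint32_t start, end, n = 0;

	if (m->base_midx)
		n += toy_count_in_bucket(m->base_midx, cur_fanout);

	toy_fanout_range(m, cur_fanout, &start, &end);
	return n + (end - start);
}
```

For a layer with no base, `num_objects_in_base` is zero and the range
collapses to the familiar non-incremental bounds.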

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 55a6b63bac..b148ee443a 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -180,7 +180,7 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 				      struct pack_midx_entry *e,
 				      uint32_t pos)
 {
-	if (pos >= m->num_objects)
+	if (pos >= m->num_objects + m->num_objects_in_base)
 		return 1;
 
 	nth_midxed_object_oid(&e->oid, m, pos);
@@ -231,12 +231,16 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					uint32_t cur_fanout,
 					int preferred_pack)
 {
-	uint32_t start = 0, end;
+	uint32_t start = m->num_objects_in_base, end;
 	uint32_t cur_object;
 
+	if (m->base_midx)
+		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
+					    preferred_pack);
+
 	if (cur_fanout)
-		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
 
 	for (cur_object = start; cur_object < end; cur_object++) {
 		if ((preferred_pack > -1) &&
-- 
2.45.2.437.gecb9450a0e


* [PATCH 15/19] midx: support reading incremental MIDX chains
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (13 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the MIDX machinery's internals have been taught to understand
incremental MIDXs over the previous handful of commits, the MIDX
machinery itself can begin reading incremental MIDXs.

(Note that while the on-disk format for incremental MIDXs has been
defined, the writing end has not been implemented. This will take place
in the commit after next.)

The core of this change involves following the order specified in the
MIDX chain and opening up MIDXs in the chain one-by-one, adding them to
the previous layer's `->base_midx` pointer at each step.

In order to implement this, the `load_multi_pack_index()` function is
taught to call a new `load_multi_pack_index_chain()` function if loading
a non-incremental MIDX failed via `load_multi_pack_index_one()`.

When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in
the file one-by-one and dispatches calls to
`load_multi_pack_index_one()` to read each layer of the MIDX chain. When
a layer is successfully read, it is added to the MIDX chain by calling
`add_midx_to_chain()` which validates the contents of the `BASE` chunk,
performs some bounds checks on the number of combined packs and objects,
and attaches the new MIDX by assigning its `base_midx` pointer to the
existing part of the chain.
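
The bookkeeping done when attaching a layer can be modeled roughly as
below. This is a sketch and not the function added by this patch:
`struct toy_midx`, `toy_add_overflows()` and `toy_add_to_chain()` are
hypothetical names, counts are assumed to be 32-bit, and validation of
the `BASE` chunk is left out entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct multi_pack_index'. */
struct toy_midx {
	uint32_t num_packs, num_packs_in_base;
	uint32_t num_objects, num_objects_in_base;
	struct toy_midx *base_midx;    /* previous layer, or NULL */
};

/* Would 'a + b' wrap around in 32-bit unsigned arithmetic? */
static int toy_add_overflows(uint32_t a, uint32_t b)
{
	return a > UINT32_MAX - b;
}

/*
 * Attach 'midx' on top of 'chain' (the current tip, or NULL for the
 * first layer). Returns 1 on success, 0 if the combined counts would
 * overflow; on failure 'midx' is left unmodified.
 */
static int toy_add_to_chain(struct toy_midx *midx, struct toy_midx *chain)
{
	assert(midx);

	if (chain) {
		if (toy_add_overflows(chain->num_packs,
				      chain->num_packs_in_base))
			return 0;
		if (toy_add_overflows(chain->num_objects,
				      chain->num_objects_in_base))
			return 0;

		/* Everything below the new layer counts as its base. */
		midx->num_packs_in_base = chain->num_packs +
			chain->num_packs_in_base;
		midx->num_objects_in_base = chain->num_objects +
			chain->num_objects_in_base;
	}

	midx->base_midx = chain;
	return 1;
}
```

Layers are attached in the order they are listed in the chain file, so
each new layer becomes the tip and its `*_in_base` counts accumulate
everything read before it.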

As a supplement to this, introduce a new mode in the test-read-midx
test-tool which allows us to read the information for a specific MIDX in
the chain by specifying its trailing checksum via the command-line
arguments like so:

    $ test-tool read-midx .git/objects [checksum]

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                    | 184 +++++++++++++++++++++++++++++++++++---
 midx.h                    |   7 ++
 packfile.c                |   5 +-
 t/helper/test-read-midx.c |  24 +++--
 4 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index cb7b623b5d..ac44fcefc2 100644
--- a/midx.c
+++ b/midx.c
@@ -89,7 +89,9 @@ static int midx_read_object_offsets(const unsigned char *chunk_start,
 
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
+static struct multi_pack_index *load_multi_pack_index_one(const char *object_dir,
+							  const char *midx_name,
+							  int local)
 {
 	struct multi_pack_index *m = NULL;
 	int fd;
@@ -97,31 +99,26 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	size_t midx_size;
 	void *midx_map = NULL;
 	uint32_t hash_version;
-	struct strbuf midx_name = STRBUF_INIT;
 	uint32_t i;
 	const char *cur_pack_name;
 	struct chunkfile *cf = NULL;
 
-	get_midx_filename(&midx_name, object_dir);
-
-	fd = git_open(midx_name.buf);
+	fd = git_open(midx_name);
 
 	if (fd < 0)
 		goto cleanup_fail;
 	if (fstat(fd, &st)) {
-		error_errno(_("failed to read %s"), midx_name.buf);
+		error_errno(_("failed to read %s"), midx_name);
 		goto cleanup_fail;
 	}
 
 	midx_size = xsize_t(st.st_size);
 
 	if (midx_size < MIDX_MIN_SIZE) {
-		error(_("multi-pack-index file %s is too small"), midx_name.buf);
+		error(_("multi-pack-index file %s is too small"), midx_name);
 		goto cleanup_fail;
 	}
 
-	strbuf_release(&midx_name);
-
 	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	close(fd);
 
@@ -211,7 +208,6 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 cleanup_fail:
 	free(m);
-	strbuf_release(&midx_name);
 	free_chunkfile(cf);
 	if (midx_map)
 		munmap(midx_map, midx_size);
@@ -220,6 +216,173 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	return NULL;
 }
 
+void get_midx_chain_dirname(struct strbuf *buf, const char *object_dir)
+{
+	strbuf_addf(buf, "%s/pack/multi-pack-index.d", object_dir);
+}
+
+void get_midx_chain_filename(struct strbuf *buf, const char *object_dir)
+{
+	get_midx_chain_dirname(buf, object_dir);
+	strbuf_addstr(buf, "/multi-pack-index-chain");
+}
+
+void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
+				 const unsigned char *hash, const char *ext)
+{
+	get_midx_chain_dirname(buf, object_dir);
+	strbuf_addf(buf, "/multi-pack-index-%s.%s", hash_to_hex(hash), ext);
+}
+
+static int open_multi_pack_index_chain(const char *chain_file,
+				       int *fd, struct stat *st)
+{
+	*fd = git_open(chain_file);
+	if (*fd < 0)
+		return 0;
+	if (fstat(*fd, st)) {
+		close(*fd);
+		return 0;
+	}
+	if (st->st_size < the_hash_algo->hexsz) {
+		close(*fd);
+		if (!st->st_size) {
+			/* treat empty files the same as missing */
+			errno = ENOENT;
+		} else {
+			warning(_("multi-pack-index chain file too small"));
+			errno = EINVAL;
+		}
+		return 0;
+	}
+	return 1;
+}
+
+static int add_midx_to_chain(struct multi_pack_index *midx,
+			     struct multi_pack_index *midx_chain,
+			     struct object_id *oids,
+			     int n)
+{
+	if (midx_chain) {
+		if (unsigned_add_overflows(midx_chain->num_packs,
+					   midx_chain->num_packs_in_base)) {
+			warning(_("pack count in base MIDX too high: %"PRIuMAX),
+				(uintmax_t)midx_chain->num_packs_in_base);
+			return 0;
+		}
+		if (unsigned_add_overflows(midx_chain->num_objects,
+					   midx_chain->num_objects_in_base)) {
+			warning(_("object count in base MIDX too high: %"PRIuMAX),
+				(uintmax_t)midx_chain->num_objects_in_base);
+			return 0;
+		}
+		midx->num_packs_in_base = midx_chain->num_packs +
+			midx_chain->num_packs_in_base;
+		midx->num_objects_in_base = midx_chain->num_objects +
+			midx_chain->num_objects_in_base;
+	}
+
+	midx->base_midx = midx_chain;
+	midx->has_chain = 1;
+
+	return 1;
+}
+
+static struct multi_pack_index *load_midx_chain_fd_st(const char *object_dir,
+						      int local,
+						      int fd, struct stat *st,
+						      int *incomplete_chain)
+{
+	struct multi_pack_index *midx_chain = NULL;
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id *layers = NULL;
+	int valid = 1;
+	uint32_t i, count;
+	FILE *fp = xfdopen(fd, "r");
+
+	count = st->st_size / (the_hash_algo->hexsz + 1);
+	CALLOC_ARRAY(layers, count);
+
+	for (i = 0; i < count; i++) {
+		struct multi_pack_index *m;
+
+		if (strbuf_getline_lf(&buf, fp) == EOF)
+			break;
+
+		if (get_oid_hex(buf.buf, &layers[i])) {
+			warning(_("invalid multi-pack-index chain: line '%s' "
+				  "not a hash"),
+				buf.buf);
+			valid = 0;
+			break;
+		}
+
+		valid = 0;
+
+		strbuf_reset(&buf);
+		get_split_midx_filename_ext(&buf, object_dir, layers[i].hash,
+					    MIDX_EXT_MIDX);
+		m = load_multi_pack_index_one(object_dir, buf.buf, local);
+
+		if (m) {
+			if (add_midx_to_chain(m, midx_chain, layers, i)) {
+				midx_chain = m;
+				valid = 1;
+			} else {
+				close_midx(m);
+			}
+		}
+		if (!valid) {
+			warning(_("unable to find all multi-pack index files"));
+			break;
+		}
+	}
+
+	free(layers);
+	fclose(fp);
+	strbuf_release(&buf);
+
+	*incomplete_chain = !valid;
+	return midx_chain;
+}
+
+static struct multi_pack_index *load_multi_pack_index_chain(const char *object_dir,
+							    int local)
+{
+	struct strbuf chain_file = STRBUF_INIT;
+	struct stat st;
+	int fd;
+	struct multi_pack_index *m = NULL;
+
+	get_midx_chain_filename(&chain_file, object_dir);
+	if (open_multi_pack_index_chain(chain_file.buf, &fd, &st)) {
+		int incomplete;
+		/* ownership of fd is taken over by load function */
+		m = load_midx_chain_fd_st(object_dir, local, fd, &st,
+					  &incomplete);
+	}
+
+	strbuf_release(&chain_file);
+	return m;
+}
+
+struct multi_pack_index *load_multi_pack_index(const char *object_dir,
+					       int local)
+{
+	struct strbuf midx_name = STRBUF_INIT;
+	struct multi_pack_index *m;
+
+	get_midx_filename(&midx_name, object_dir);
+
+	m = load_multi_pack_index_one(object_dir, midx_name.buf, local);
+	if (!m)
+		m = load_multi_pack_index_chain(object_dir, local);
+
+	strbuf_release(&midx_name);
+
+	return m;
+}
+
 void close_midx(struct multi_pack_index *m)
 {
 	uint32_t i;
@@ -228,6 +391,7 @@ void close_midx(struct multi_pack_index *m)
 		return;
 
 	close_midx(m->next);
+	close_midx(m->base_midx);
 
 	munmap((unsigned char *)m->data, m->data_len);
 
diff --git a/midx.h b/midx.h
index 86af7dfc5e..94de16a8c4 100644
--- a/midx.h
+++ b/midx.h
@@ -24,6 +24,7 @@ struct bitmapped_pack;
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
+#define MIDX_CHUNKID_BASE 0x42415345 /* "BASE" */
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
@@ -50,6 +51,7 @@ struct multi_pack_index {
 	int preferred_pack_idx;
 
 	int local;
+	int has_chain;
 
 	const unsigned char *chunk_pack_names;
 	size_t chunk_pack_names_len;
@@ -80,11 +82,16 @@ struct multi_pack_index {
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
+#define MIDX_EXT_MIDX "midx"
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
 void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
 			   const unsigned char *hash, const char *ext);
+void get_midx_chain_dirname(struct strbuf *buf, const char *object_dir);
+void get_midx_chain_filename(struct strbuf *buf, const char *object_dir);
+void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
+				 const unsigned char *hash, const char *ext);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
diff --git a/packfile.c b/packfile.c
index d4df7fdeea..85f0345435 100644
--- a/packfile.c
+++ b/packfile.c
@@ -878,7 +878,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!report_garbage)
 		return;
 
-	if (!strcmp(file_name, "multi-pack-index"))
+	if (!strcmp(file_name, "multi-pack-index") ||
+	    !strcmp(file_name, "multi-pack-index.d"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
 	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
@@ -1062,7 +1063,7 @@ struct packed_git *get_all_packs(struct repository *r)
 	prepare_packed_git(r);
 	for (m = r->objects->multi_pack_index; m; m = m->next) {
 		uint32_t i;
-		for (i = 0; i < m->num_packs; i++)
+		for (i = 0; i < m->num_packs + m->num_packs_in_base; i++)
 			prepare_midx_pack(r, m, i);
 	}
 
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 4acae41bb9..f9148328e3 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -7,8 +7,10 @@
 #include "packfile.h"
 #include "setup.h"
 #include "gettext.h"
+#include "pack-revindex.h"
 
-static int read_midx_file(const char *object_dir, int show_objects)
+static int read_midx_file(const char *object_dir, const char *checksum,
+			  int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -19,6 +21,13 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	if (!m)
 		return 1;
 
+	if (checksum) {
+		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+			m = m->base_midx;
+		if (!m)
+			return 1;
+	}
+
 	printf("header: %08x %d %d %d %d\n",
 	       m->signature,
 	       m->version,
@@ -52,7 +61,8 @@ static int read_midx_file(const char *object_dir, int show_objects)
 		struct pack_entry e;
 
 		for (i = 0; i < m->num_objects; i++) {
-			nth_midxed_object_oid(&oid, m, i);
+			nth_midxed_object_oid(&oid, m,
+					      i + m->num_objects_in_base);
 			fill_midx_entry(the_repository, &oid, &e, m);
 
 			printf("%s %"PRIu64"\t%s\n",
@@ -109,7 +119,7 @@ static int read_midx_bitmapped_packs(const char *object_dir)
 	if (!midx)
 		return 1;
 
-	for (i = 0; i < midx->num_packs; i++) {
+	for (i = 0; i < midx->num_packs + midx->num_packs_in_base; i++) {
 		if (nth_bitmapped_pack(the_repository, midx, &pack, i) < 0)
 			return 1;
 
@@ -125,16 +135,16 @@ static int read_midx_bitmapped_packs(const char *object_dir)
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir>");
+	if (!(argc == 2 || argc == 3 || argc == 4))
+		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir> [<checksum>]");
 
 	if (!strcmp(argv[1], "--show-objects"))
-		return read_midx_file(argv[2], 1);
+		return read_midx_file(argv[2], argv[3], 1);
 	else if (!strcmp(argv[1], "--checksum"))
 		return read_midx_checksum(argv[2]);
 	else if (!strcmp(argv[1], "--preferred-pack"))
 		return read_midx_preferred_pack(argv[2]);
 	else if (!strcmp(argv[1], "--bitmap"))
 		return read_midx_bitmapped_packs(argv[2]);
-	return read_midx_file(argv[1], 0);
+	return read_midx_file(argv[1], argv[2], 0);
 }
-- 
2.45.2.437.gecb9450a0e



* [PATCH 16/19] midx: implement verification support for incremental MIDXs
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (14 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 15/19] midx: support reading incremental MIDX chains Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Teach the verification implementation used by `git multi-pack-index
verify` to perform verification for incremental MIDX chains by
independently validating each layer within the chain.
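
As a rough illustration of what "independently validating each layer"
means, the sketch below walks a chain from its tip towards its base and
checks that the OIDs within each layer are strictly increasing. The
`toy_midx` struct and `toy_verify_chain()` are hypothetical stand-ins
and not Git's `struct multi_pack_index` or `verify_midx_file()`; OIDs
are reduced to plain integers to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer; not Git's struct. */
struct toy_midx {
	const uint32_t *oids;       /* this layer's OIDs, as integers */
	uint32_t num_objects;       /* number of objects in this layer only */
	struct toy_midx *base_midx; /* next-older layer, or NULL */
};

/*
 * Walk from the tip towards the base and count adjacent pairs that
 * are not strictly increasing. Each layer is checked on its own; no
 * ordering is assumed between layers.
 */
unsigned toy_verify_chain(const struct toy_midx *tip)
{
	const struct toy_midx *curr;
	unsigned errors = 0;

	for (curr = tip; curr; curr = curr->base_midx) {
		uint32_t i;

		/* "i + 1 < n" avoids unsigned underflow when n == 0 */
		for (i = 0; i + 1 < curr->num_objects; i++)
			if (curr->oids[i] >= curr->oids[i + 1])
				errors++;
	}
	return errors;
}
```

Note that the inner loop reads from `curr`, the layer currently being
visited, rather than from the tip of the chain.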

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 47 ++++++++++++++++++++++++++++++-----------------
 midx.h |  2 ++
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/midx.c b/midx.c
index ac44fcefc2..ae3e30a062 100644
--- a/midx.c
+++ b/midx.c
@@ -467,6 +467,13 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	return 0;
 }
 
+struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
+				   uint32_t pack_int_id)
+{
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+	return m->packs[local_pack_int_id];
+}
+
 #define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
 
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
@@ -814,6 +821,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	uint32_t i;
 	struct progress *progress = NULL;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	struct multi_pack_index *curr;
 	verify_midx_error = 0;
 
 	if (!m) {
@@ -836,8 +844,8 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_delayed_progress(_("Looking for referenced packfiles"),
-					  m->num_packs);
-	for (i = 0; i < m->num_packs; i++) {
+						  m->num_packs + m->num_packs_in_base);
+	for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
 		if (prepare_midx_pack(r, m, i))
 			midx_report("failed to load pack in position %d", i);
 
@@ -857,17 +865,20 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	if (flags & MIDX_PROGRESS)
 		progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"),
 						 m->num_objects - 1);
-	for (i = 0; i < m->num_objects - 1; i++) {
-		struct object_id oid1, oid2;
 
-		nth_midxed_object_oid(&oid1, m, i);
-		nth_midxed_object_oid(&oid2, m, i + 1);
+	for (curr = m; curr; curr = curr->base_midx) {
+		for (i = 0; i < curr->num_objects - 1; i++) {
+			struct object_id oid1, oid2;
 
-		if (oidcmp(&oid1, &oid2) >= 0)
-			midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
-				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+			nth_midxed_object_oid(&oid1, curr, curr->num_objects_in_base + i);
+			nth_midxed_object_oid(&oid2, curr, curr->num_objects_in_base + i + 1);
 
-		midx_display_sparse_progress(progress, i + 1);
+			if (oidcmp(&oid1, &oid2) >= 0)
+				midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
+					    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+
+			midx_display_sparse_progress(progress, i + 1);
+		}
 	}
 	stop_progress(&progress);
 
@@ -877,8 +888,8 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	 * each of the objects and only require 1 packfile to be open at a
 	 * time.
 	 */
-	ALLOC_ARRAY(pairs, m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
+	ALLOC_ARRAY(pairs, m->num_objects + m->num_objects_in_base);
+	for (i = 0; i < m->num_objects + m->num_objects_in_base; i++) {
 		pairs[i].pos = i;
 		pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i);
 	}
@@ -892,16 +903,18 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
+	for (i = 0; i < m->num_objects + m->num_objects_in_base; i++) {
 		struct object_id oid;
 		struct pack_entry e;
 		off_t m_offset, p_offset;
 
 		if (i > 0 && pairs[i-1].pack_int_id != pairs[i].pack_int_id &&
-		    m->packs[pairs[i-1].pack_int_id])
-		{
-			close_pack_fd(m->packs[pairs[i-1].pack_int_id]);
-			close_pack_index(m->packs[pairs[i-1].pack_int_id]);
+		    nth_midxed_pack(m, pairs[i-1].pack_int_id)) {
+			uint32_t pack_int_id = pairs[i-1].pack_int_id;
+			struct packed_git *p = nth_midxed_pack(m, pack_int_id);
+
+			close_pack_fd(p);
+			close_pack_index(p);
 		}
 
 		nth_midxed_object_oid(&oid, m, pairs[i].pos);
diff --git a/midx.h b/midx.h
index 94de16a8c4..9d30935589 100644
--- a/midx.h
+++ b/midx.h
@@ -95,6 +95,8 @@ void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
+struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
+				   uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
 int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
-- 
2.45.2.437.gecb9450a0e



* [PATCH 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (15 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Nearly three years ago, commit ff1e653c8e2 (midx: respect
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new
environment variable which caused the test suite to write MIDX bitmaps
after any 'git repack' invocation.

At the time, this was done to help flush out any bugs with MIDX bitmaps
that weren't explicitly covered in the t5326-multi-pack-bitmaps.sh
script.

Nearly three years later, that flag has served us well and is no longer
providing meaningful coverage, as the script in t5326 has matured
substantially and covers many more interesting cases than it did back
when ff1e653c8e2 was originally written.

Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
as it is no longer serving a useful purpose. More importantly, removing
this variable clears the way for us to introduce a new one to help
similarly flush out bugs related to incremental MIDX chains.

Because these incremental MIDX chains are (for now) incompatible with
MIDX bitmaps, we cannot have both.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c                  | 12 ++----------
 ci/run-build-and-tests.sh         |  1 -
 midx.h                            |  2 --
 t/README                          |  4 ----
 t/t0410-partial-clone.sh          |  2 --
 t/t5310-pack-bitmaps.sh           |  4 ----
 t/t5319-multi-pack-index.sh       |  3 +--
 t/t5326-multi-pack-bitmaps.sh     |  3 +--
 t/t5327-multi-pack-bitmaps-rev.sh |  5 ++---
 t/t7700-repack.sh                 | 21 +++++++--------------
 10 files changed, 13 insertions(+), 44 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 58ad82dd97..e2fec16389 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1217,10 +1217,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!write_midx &&
 		    (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
 			write_bitmaps = 0;
-	} else if (write_bitmaps &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
-		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
@@ -1518,12 +1514,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (run_update_server_info)
 		update_server_info(0);
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
-		unsigned flags = 0;
-		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
-			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
-		write_midx_file(get_object_directory(), NULL, NULL, flags);
-	}
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
+		write_midx_file(get_object_directory(), NULL, NULL, 0);
 
 cleanup:
 	string_list_clear(&names, 1);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 98dda42045..e6fd68630c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,7 +25,6 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
-	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
diff --git a/midx.h b/midx.h
index 9d30935589..3714cad2cc 100644
--- a/midx.h
+++ b/midx.h
@@ -29,8 +29,6 @@ struct bitmapped_pack;
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
-#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
-	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index d9e0e07506..e8a11926e4 100644
--- a/t/README
+++ b/t/README
@@ -469,10 +469,6 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
-'--bitmap' option on all invocations of 'git multi-pack-index write',
-and ignores pack-objects' '--write-bitmap-index'.
-
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 7797391c03..f6c58d80dd 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -4,8 +4,6 @@ test_description='partial clone'
 
 . ./test-lib.sh
 
-# missing promisor objects cause repacks which write bitmaps to fail
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 # When enabled, some commands will write commit-graphs. This causes fsck
 # to fail when delete_object() is called because fsck will attempt to
 # verify the out-of-sync commit graph.
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d7fd71360e..a6de7c5764 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -5,10 +5,6 @@ test_description='exercise basic bitmap functionality'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
-# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
-# their place.
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
-
 # Likewise, allow individual tests to control whether or not they use
 # the boundary-based traversal.
 sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 10d2a6bf92..6e9ee23398 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -600,8 +600,7 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writeBitmaps=true repack -ad &&
+	git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index cc7220b6c0..dff3b26849 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -4,10 +4,9 @@ test_description='exercise basic multi-pack bitmap functionality'
 . ./test-lib.sh
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 
-# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# We'll be writing our own MIDX, so avoid getting confused by the
 # automatic ones.
 GIT_TEST_MULTI_PACK_INDEX=0
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
 # This test exercise multi-pack bitmap functionality where the object order is
 # stored and read from a special chunk within the MIDX, so use the default
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index e65e311cd7..23db949c20 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -5,10 +5,9 @@ test_description='exercise basic multi-pack bitmap functionality (.rev files)'
 . ./test-lib.sh
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 
-# We'll be writing our own midx and bitmaps, so avoid getting confused by the
-# automatic ones.
+# We'll be writing our own MIDX, so avoid getting confused by the automatic
+# ones.
 GIT_TEST_MULTI_PACK_INDEX=0
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
 # Unlike t5326, this test exercise multi-pack bitmap functionality where the
 # object order is stored in a separate .rev file.
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 127efe99f8..8f34f05087 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -70,14 +70,13 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
+	git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writebitmaps=true repack -Adl &&
+	git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -284,8 +283,7 @@ test_expect_success 'repacking fails when missing .pack actually means missing o
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
 	rm -f bare.git/objects/pack/*.bitmap &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad &&
+	git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -296,8 +294,7 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -308,8 +305,7 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad 2>stderr &&
+	git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -320,8 +316,7 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad 2>stderr &&
+	git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -342,9 +337,7 @@ test_expect_success 'repacking with a filter works' '
 '
 
 test_expect_success '--filter fails with --write-bitmap-index' '
-	test_must_fail \
-		env GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
+	test_must_fail git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
 '
 
 test_expect_success 'repacking with two filters works' '
-- 
2.45.2.437.gecb9450a0e



* [PATCH 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (16 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:05 ` [PATCH 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare for sub-directories to appear in $GIT_DIR/objects/pack by
making the copy, remove, and chmod invocations in this script operate
recursively.

This prepares us for the new $GIT_DIR/objects/pack/multi-pack-index.d
directory which will be added in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5313-pack-bounds-checks.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5313-pack-bounds-checks.sh b/t/t5313-pack-bounds-checks.sh
index ceaa6700a2..86fc73f9fb 100755
--- a/t/t5313-pack-bounds-checks.sh
+++ b/t/t5313-pack-bounds-checks.sh
@@ -7,11 +7,11 @@ TEST_PASSES_SANITIZE_LEAK=true
 
 clear_base () {
 	test_when_finished 'restore_base' &&
-	rm -f $base
+	rm -r -f $base
 }
 
 restore_base () {
-	cp base-backup/* .git/objects/pack/
+	cp -r base-backup/* .git/objects/pack/
 }
 
 do_pack () {
@@ -64,9 +64,9 @@ test_expect_success 'set up base packfile and variables' '
 	git commit -m base &&
 	git repack -ad &&
 	base=$(echo .git/objects/pack/*) &&
-	chmod +w $base &&
+	chmod -R +w $base &&
 	mkdir base-backup &&
-	cp $base base-backup/ &&
+	cp -r $base base-backup/ &&
 	object=$(git rev-parse HEAD:file)
 '
 
-- 
2.45.2.437.gecb9450a0e



* [PATCH 19/19] midx: implement support for writing incremental MIDX chains
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (17 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
@ 2024-06-06 23:05 ` Taylor Blau
  2024-06-06 23:06 ` [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the rest of the MIDX subsystem and relevant callers have been
taught to read and process incremental MIDX chains, let's finally update
the implementation in `write_midx_internal()` to be able to write
incremental MIDX chains.

This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:

    $ git multi-pack-index write --incremental

The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:

  - The `compute_sorted_entries()` function is taught to reject objects
    which appear in any existing MIDX layer.

  - Functions like `write_midx_revindex()` are adjusted to write
    pack_order values which are offset by the number of objects in the
    base MIDX layer.

  - The end of `write_midx_internal()` is adjusted to move
    non-incremental MIDX files when necessary (i.e. when creating an
    incremental chain with an existing non-incremental MIDX in the
    repository).
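
The first two kinds of changes above can be sketched with toy data. This
is only an illustration under stated assumptions: OIDs are plain
integers, `toy_base_has_oid()` is a hypothetical linear search standing
in for a lookup across the existing layers (the patch itself calls
`midx_has_oid()`), and `toy_new_layer_entries()` and
`toy_global_position()` are invented names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: does any existing layer already contain 'oid'? */
int toy_base_has_oid(const uint32_t *base, size_t base_nr, uint32_t oid)
{
	size_t i;

	for (i = 0; i < base_nr; i++)
		if (base[i] == oid)
			return 1;
	return 0;
}

/*
 * Copy into 'out' only those candidates which are not already present
 * in an existing layer. Returns the number of entries written; 'out'
 * must have room for 'cand_nr' entries.
 */
size_t toy_new_layer_entries(const uint32_t *base, size_t base_nr,
			     const uint32_t *cand, size_t cand_nr,
			     uint32_t *out)
{
	size_t i, nr = 0;

	for (i = 0; i < cand_nr; i++) {
		if (toy_base_has_oid(base, base_nr, cand[i]))
			continue; /* already reachable via an older layer */
		out[nr++] = cand[i];
	}
	return nr;
}

/*
 * A position within the new layer becomes a position within the whole
 * chain once it is offset by the number of objects in earlier layers.
 */
uint32_t toy_global_position(uint32_t local_pos, uint32_t nr_base)
{
	return local_pos + nr_base;
}
```

With a base holding {10, 20, 30} and candidates {20, 25, 30, 35}, only
25 and 35 land in the new layer, at chain-wide positions 3 and 4.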

There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).
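
The idea behind keeping only files that belong to the current chain can
be sketched as below. This is an assumption-laden illustration, not the
code in this patch: `toy_should_keep()` is a hypothetical helper that
simply looks for a kept hash anywhere in the file name, whereas a real
implementation would compare against the exact names it expects.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if 'file_name' mentions any of the
 * hashes in 'keep_hashes' (and so should survive cleanup), and 0
 * otherwise.
 */
int toy_should_keep(const char *file_name,
		    const char **keep_hashes, size_t hashes_nr)
{
	size_t i;

	for (i = 0; i < hashes_nr; i++)
		if (strstr(file_name, keep_hashes[i]))
			return 1;
	return 0;
}
```

A caller would iterate over the files in the chain directory and unlink
those for which the helper returns 0.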

The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:

  1. Most of the "interesting" behavior is already thoroughly covered in
     t5319-multi-pack-index.sh, which handles the core logic of reading
     objects through a MIDX.

     The new tests in t5334-incremental-multi-pack-index.sh are mostly
     focused on creating and destroying incremental MIDXs, as well as
     stitching their results together across layers.

  2. A new GIT_TEST environment variable is added called
     "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
     entire test suite to write incremental MIDXs after repacking when
     combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.

     This exercises the long tail of other interesting behavior that is
     defined implicitly throughout the rest of the CI suite. It is
     likewise added to the linux-TEST-vars job.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt  |  11 +-
 builtin/multi-pack-index.c              |   2 +
 builtin/repack.c                        |   8 +-
 ci/run-build-and-tests.sh               |   1 +
 midx-write.c                            | 281 ++++++++++++++++++++----
 midx.c                                  |  62 +++++-
 midx.h                                  |   4 +
 packfile.c                              |  16 +-
 packfile.h                              |   4 +
 t/README                                |   4 +
 t/lib-bitmap.sh                         |   6 +-
 t/lib-midx.sh                           |  28 +++
 t/t5319-multi-pack-index.sh             |  27 +--
 t/t5326-multi-pack-bitmaps.sh           |   1 +
 t/t5327-multi-pack-bitmaps-rev.sh       |   1 +
 t/t5332-multi-pack-reuse.sh             |   2 +
 t/t5334-incremental-multi-pack-index.sh |  46 ++++
 t/t7700-repack.sh                       |  27 +--
 18 files changed, 436 insertions(+), 95 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 3696506eb3..631d5c7d15 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -64,6 +64,12 @@ The file given at `<path>` is expected to be readable, and can contain
 duplicates. (If a given OID is given more than once, it is marked as
 preferred if at least one instance of it begins with the special `+`
 marker).
+
+	--incremental::
+		Write an incremental MIDX file containing only objects
+		and packs not present in an existing MIDX layer.
+		Migrates non-incremental MIDXs to incremental ones when
+		necessary. Incompatible with `--bitmap`.
 --
 
 verify::
@@ -74,6 +80,8 @@ expire::
 	have no objects referenced by the MIDX (with the exception of
 	`.keep` packs and cruft packs). Rewrite the MIDX file afterward
 	to remove all references to these pack-files.
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
@@ -95,7 +103,8 @@ repack::
 +
 If `repack.packKeptObjects` is `false`, then any pack-files with an
 associated `.keep` file will not be selected for the batch to repack.
-
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 EXAMPLES
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 8360932d2e..92b86153ba 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -129,6 +129,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "progress", &opts.flags,
 			N_("force progress reporting"), MIDX_PROGRESS),
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			 N_("write multi-pack index containing only given indexes")),
 		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
diff --git a/builtin/repack.c b/builtin/repack.c
index e2fec16389..e1fab4d809 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1514,8 +1514,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (run_update_server_info)
 		update_server_info(0);
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0))
+			flags |= MIDX_WRITE_INCREMENTAL;
+		write_midx_file(get_object_directory(), NULL, NULL, flags);
+	}
 
 cleanup:
 	string_list_clear(&names, 1);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index e6fd68630c..2e28d02b20 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,6 +25,7 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
diff --git a/midx-write.c b/midx-write.c
index b148ee443a..241557d03e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -15,6 +15,8 @@
 #include "refs.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "path.h"
+#include "pack-revindex.h"
 
 #define PACK_EXPIRED UINT_MAX
 #define BITMAP_POS_UNKNOWN (~((uint32_t)0))
@@ -23,7 +25,11 @@
 
 extern int midx_checksum_valid(struct multi_pack_index *m);
 extern void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash);
+				 const char *keep_hash);
+extern void clear_incremental_midx_files_ext(const char *object_dir,
+					     const char *ext,
+					     const char **keep_hashes,
+					     uint32_t hashes_nr);
 extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 				const char *idx_name);
 
@@ -97,6 +103,9 @@ struct write_midx_context {
 
 	int preferred_pack_idx;
 
+	int incremental;
+	uint32_t num_multi_pack_indexes_before;
+
 	struct string_list *to_include;
 };
 
@@ -322,7 +331,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (ctx->m)
+		if (ctx->m && !ctx->incremental)
 			midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
 						    ctx->preferred_pack_idx);
 
@@ -348,6 +357,9 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
 						&fanout.entries[cur_object].oid))
 				continue;
+			if (ctx->incremental && ctx->m &&
+			    midx_has_oid(ctx->m, &fanout.entries[cur_object].oid))
+				continue;
 
 			ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
 				   alloc_objects);
@@ -531,10 +543,15 @@ static int write_midx_revindex(struct hashfile *f,
 			       void *data)
 {
 	struct write_midx_context *ctx = data;
-	uint32_t i;
+	uint32_t i, nr_base;
+
+	if (ctx->m && ctx->incremental)
+		nr_base = ctx->m->num_objects + ctx->m->num_objects_in_base;
+	else
+		nr_base = 0;
 
 	for (i = 0; i < ctx->entries_nr; i++)
-		hashwrite_be32(f, ctx->pack_order[i]);
+		hashwrite_be32(f, ctx->pack_order[i] + nr_base);
 
 	return 0;
 }
@@ -563,12 +580,17 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
 static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 {
 	struct midx_pack_order_data *data;
-	uint32_t *pack_order;
+	uint32_t *pack_order, base_objects = 0;
 	uint32_t i;
 
 	trace2_region_enter("midx", "midx_pack_order", the_repository);
 
+	if (ctx->incremental && ctx->m)
+		base_objects = ctx->m->num_objects + ctx->m->num_objects_in_base;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	ALLOC_ARRAY(data, ctx->entries_nr);
+
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
@@ -580,12 +602,11 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 
 	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
 
-	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
 		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = i;
+			pack->bitmap_pos = i + base_objects;
 		pack->bitmap_nr++;
 		pack_order[i] = data[i].nr;
 	}
@@ -633,7 +654,8 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 	prepare_packing_data(the_repository, pdata);
 
 	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		uint32_t pos = ctx->pack_order[i];
+		struct pack_midx_entry *from = &ctx->entries[pos];
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
 		oe_set_in_pack(pdata, to,
@@ -881,40 +903,133 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 static int fill_packs_from_midx(struct write_midx_context *ctx,
 				const char *preferred_pack_name, uint32_t flags)
 {
-	uint32_t i;
+	struct multi_pack_index *m;
 
-	for (i = 0; i < ctx->m->num_packs; i++) {
-		if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
-			continue;
+	for (m = ctx->m; m; m = m->base_midx) {
+		uint32_t i;
 
-		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
-		if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
+		for (i = 0; i < m->num_packs; i++) {
 			/*
 			 * If generating a reverse index, need to have
 			 * packed_git's loaded to compare their
 			 * mtimes and object count.
 			 *
-			 *
 			 * If a preferred pack is specified, need to
 			 * have packed_git's loaded to ensure the chosen
 			 * preferred pack has a non-zero object count.
 			 */
-			if (prepare_midx_pack(the_repository, ctx->m, i))
-				return error(_("could not load pack"));
+			if (!should_include_pack(ctx, m->pack_names[i], 0))
+				continue;
 
-			if (open_pack_index(ctx->m->packs[i]))
-				die(_("could not open index for %s"),
-				    ctx->m->packs[i]->pack_name);
+			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+			if (flags & MIDX_WRITE_REV_INDEX ||
+			    preferred_pack_name) {
+				if (prepare_midx_pack(the_repository, m,
+						      m->num_packs_in_base + i)) {
+					error(_("could not load pack"));
+					return 1;
+				}
+
+				if (open_pack_index(m->packs[i]))
+					die(_("could not open index for %s"),
+					    m->packs[i]->pack_name);
+			}
+
+			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
+				       m->pack_names[i],
+				       m->num_packs_in_base + i);
 		}
-
-		fill_pack_info(&ctx->info[ctx->nr++], ctx->m->packs[i],
-			       ctx->m->pack_names[i], i);
 	}
-
 	return 0;
 }
 
+static struct {
+	const char *non_split;
+	const char *split;
+} midx_exts[] = {
+	{NULL, MIDX_EXT_MIDX},
+	{MIDX_EXT_BITMAP, MIDX_EXT_BITMAP},
+	{MIDX_EXT_REV, MIDX_EXT_REV},
+};
+
+static int link_midx_to_chain(struct multi_pack_index *m)
+{
+	struct strbuf from = STRBUF_INIT;
+	struct strbuf to = STRBUF_INIT;
+	int ret = 0;
+	size_t i;
+
+	if (!m || m->has_chain) {
+		/*
+		 * Either no MIDX previously existed, or it was already
+		 * part of a MIDX chain. In both cases, we have nothing
+		 * to link, so return early.
+		 */
+		goto done;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
+		const unsigned char *hash = get_midx_checksum(m);
+
+		get_midx_filename_ext(&from, m->object_dir, hash,
+				      midx_exts[i].non_split);
+		get_split_midx_filename_ext(&to, m->object_dir, hash,
+					    midx_exts[i].split);
+
+		if (link(from.buf, to.buf) < 0 && errno != ENOENT) {
+			ret = error_errno(_("unable to link '%s' to '%s'"),
+					  from.buf, to.buf);
+			goto done;
+		}
+
+		strbuf_reset(&from);
+		strbuf_reset(&to);
+	}
+
+done:
+	strbuf_release(&from);
+	strbuf_release(&to);
+	return ret;
+}
+
+static void clear_midx_files(const char *object_dir,
+			     const char **hashes,
+			     uint32_t hashes_nr,
+			     unsigned incremental)
+{
+	/*
+	 * if incremental:
+	 *   - remove all non-incremental MIDX files
+	 *   - remove any incremental MIDX files not in the current one
+	 *
+	 * if non-incremental:
+	 *   - remove all incremental MIDX files
+	 *   - remove any non-incremental MIDX files not matching the current
+	 *     hash
+	 */
+	struct strbuf buf = STRBUF_INIT;
+	const char *exts[] = { MIDX_EXT_BITMAP, MIDX_EXT_REV, MIDX_EXT_MIDX };
+	uint32_t i, j;
+
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		clear_incremental_midx_files_ext(object_dir, exts[i],
+						 hashes, hashes_nr);
+		for (j = 0; j < hashes_nr; j++)
+			clear_midx_files_ext(object_dir, exts[i], hashes[j]);
+	}
+
+	if (incremental)
+		get_midx_filename(&buf, object_dir);
+	else
+		get_midx_chain_filename(&buf, object_dir);
+
+	if (unlink(buf.buf) && errno != ENOENT)
+		die_errno(_("failed to clear multi-pack-index at %s"), buf.buf);
+
+	strbuf_release(&buf);
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
@@ -927,16 +1042,27 @@ static int write_midx_internal(const char *object_dir,
 	uint32_t i, start_pack;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
+	struct tempfile *incr;
 	struct write_midx_context ctx = { 0 };
 	int bitmapped_packs_concat_len = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
+	const char **keep_hashes = NULL;
 	struct chunkfile *cf;
 
 	trace2_region_enter("midx", "write_midx_internal", the_repository);
 
-	get_midx_filename(&midx_name, object_dir);
+	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+	if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
+		die(_("cannot write incremental MIDX with bitmap"));
+
+	if (ctx.incremental)
+		strbuf_addf(&midx_name,
+			    "%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
+			    object_dir);
+	else
+		get_midx_filename(&midx_name, object_dir);
 	if (safe_create_leading_directories(midx_name.buf))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name.buf);
@@ -948,14 +1074,19 @@ static int write_midx_internal(const char *object_dir,
 	}
 
 	ctx.nr = 0;
-	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
 	ctx.info = NULL;
 	ctx.to_include = packs_to_include;
 	ALLOC_ARRAY(ctx.info, ctx.alloc);
 
-	if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
-					  flags) < 0) {
-		result = 1;
+	if (ctx.incremental) {
+		struct multi_pack_index *m = ctx.m;
+		while (m) {
+			ctx.num_multi_pack_indexes_before++;
+			m = m->base_midx;
+		}
+	} else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+						 flags) < 0) {
 		goto cleanup;
 	}
 
@@ -970,7 +1101,8 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
+	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
+	    !ctx.incremental &&
 	    !(packs_to_include || packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
 		int bitmap_exists;
@@ -986,12 +1118,14 @@ static int write_midx_internal(const char *object_dir,
 			 * corresponding bitmap (or one wasn't requested).
 			 */
 			if (!want_bitmap)
-				clear_midx_files_ext(object_dir, ".bitmap",
-						     NULL);
+				clear_midx_files_ext(object_dir, "bitmap", NULL);
 			goto cleanup;
 		}
 	}
 
+	if (ctx.incremental && !ctx.nr)
+		goto cleanup; /* nothing to do */
+
 	if (preferred_pack_name) {
 		ctx.preferred_pack_idx = -1;
 
@@ -1137,8 +1271,30 @@ static int write_midx_internal(const char *object_dir,
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
 
-	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
-	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+	if (ctx.incremental) {
+		struct strbuf lock_name = STRBUF_INIT;
+
+		get_midx_chain_filename(&lock_name, object_dir);
+		hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
+		strbuf_release(&lock_name);
+
+		incr = mks_tempfile_m(midx_name.buf, 0444);
+		if (!incr) {
+			error(_("unable to create temporary MIDX layer"));
+			return -1;
+		}
+
+		if (adjust_shared_perm(get_tempfile_path(incr))) {
+			error(_("unable to adjust shared permissions for '%s'"),
+			      get_tempfile_path(incr));
+			return -1;
+		}
+
+		f = hashfd(get_tempfile_fd(incr), get_tempfile_path(incr));
+	} else {
+		hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
+		f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+	}
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1231,14 +1387,55 @@ static int write_midx_internal(const char *object_dir,
 	 * have been freed in the previous if block.
 	 */
 
+	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+
+	if (ctx.incremental) {
+		FILE *chainf = fdopen_lock_file(&lk, "w");
+		struct strbuf final_midx_name = STRBUF_INIT;
+		struct multi_pack_index *m = ctx.m;
+
+		if (!chainf) {
+			error_errno(_("unable to open multi-pack-index chain file"));
+			return -1;
+		}
+
+		if (link_midx_to_chain(ctx.m) < 0)
+			return -1;
+
+		get_split_midx_filename_ext(&final_midx_name, object_dir,
+					    midx_hash, MIDX_EXT_MIDX);
+
+		if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
+			error_errno(_("unable to rename new multi-pack-index layer"));
+			return -1;
+		}
+
+		keep_hashes[ctx.num_multi_pack_indexes_before] =
+			xstrdup(hash_to_hex(midx_hash));
+
+		for (i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+			keep_hashes[j] = xstrdup(hash_to_hex(get_midx_checksum(m)));
+			m = m->base_midx;
+		}
+
+		for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+			fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
+	} else {
+		keep_hashes[ctx.num_multi_pack_indexes_before] =
+			xstrdup(hash_to_hex(midx_hash));
+	}
+
 	if (ctx.m)
 		close_object_store(the_repository->objects);
 
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+	clear_midx_files(object_dir, keep_hashes,
+			 ctx.num_multi_pack_indexes_before + 1,
+			 ctx.incremental);
 
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
@@ -1253,6 +1450,11 @@ static int write_midx_internal(const char *object_dir,
 	free(ctx.entries);
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
+	if (keep_hashes) {
+		for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+			free((char *)keep_hashes[i]);
+		free(keep_hashes);
+	}
 	strbuf_release(&midx_name);
 
 	trace2_region_leave("midx", "write_midx_internal", the_repository);
@@ -1289,6 +1491,9 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	if (!m)
 		return 0;
 
+	if (m->base_midx)
+		die(_("cannot expire packs from an incremental multi-pack-index"));
+
 	CALLOC_ARRAY(count, m->num_packs);
 
 	if (flags & MIDX_PROGRESS)
@@ -1463,6 +1668,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 
 	if (!m)
 		return 0;
+	if (m->base_midx)
+		die(_("cannot repack an incremental multi-pack-index"));
 
 	CALLOC_ARRAY(include_pack, m->num_packs);
 
diff --git a/midx.c b/midx.c
index ae3e30a062..5aa7e2a6e6 100644
--- a/midx.c
+++ b/midx.c
@@ -14,7 +14,10 @@
 
 int midx_checksum_valid(struct multi_pack_index *m);
 void clear_midx_files_ext(const char *object_dir, const char *ext,
-			  unsigned char *keep_hash);
+			  const char *keep_hash);
+void clear_incremental_midx_files_ext(const char *object_dir, const char *ext,
+				      char **keep_hashes,
+				      uint32_t hashes_nr);
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
@@ -518,6 +521,11 @@ int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 	return 0;
 }
 
+int midx_has_oid(struct multi_pack_index *m, const struct object_id *oid)
+{
+	return bsearch_midx(oid, m, NULL);
+}
+
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
@@ -719,7 +727,8 @@ int midx_checksum_valid(struct multi_pack_index *m)
 }
 
 struct clear_midx_data {
-	char *keep;
+	char **keep;
+	uint32_t keep_nr;
 	const char *ext;
 };
 
@@ -727,32 +736,63 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
 				const char *file_name, void *_data)
 {
 	struct clear_midx_data *data = _data;
+	uint32_t i;
 
 	if (!(starts_with(file_name, "multi-pack-index-") &&
 	      ends_with(file_name, data->ext)))
 		return;
-	if (data->keep && !strcmp(data->keep, file_name))
-		return;
-
+	for (i = 0; i < data->keep_nr; i++) {
+		if (!strcmp(data->keep[i], file_name))
+			return;
+	}
 	if (unlink(full_path))
 		die_errno(_("failed to remove %s"), full_path);
 }
 
 void clear_midx_files_ext(const char *object_dir, const char *ext,
-			  unsigned char *keep_hash)
+			  const char *keep_hash)
 {
 	struct clear_midx_data data;
 	memset(&data, 0, sizeof(struct clear_midx_data));
 
-	if (keep_hash)
-		data.keep = xstrfmt("multi-pack-index-%s%s",
-				    hash_to_hex(keep_hash), ext);
+	if (keep_hash) {
+		ALLOC_ARRAY(data.keep, 1);
+
+		data.keep[0] = xstrfmt("multi-pack-index-%s.%s", keep_hash, ext);
+		data.keep_nr = 1;
+	}
 	data.ext = ext;
 
 	for_each_file_in_pack_dir(object_dir,
 				  clear_midx_file_ext,
 				  &data);
 
+	if (keep_hash)
+		free(data.keep[0]);
+	free(data.keep);
+}
+
+void clear_incremental_midx_files_ext(const char *object_dir, const char *ext,
+				      char **keep_hashes,
+				      uint32_t hashes_nr)
+{
+	struct clear_midx_data data;
+	uint32_t i;
+
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	ALLOC_ARRAY(data.keep, hashes_nr);
+	for (i = 0; i < hashes_nr; i++)
+		data.keep[i] = xstrfmt("multi-pack-index-%s.%s", keep_hashes[i],
+				       ext);
+	data.keep_nr = hashes_nr;
+	data.ext = ext;
+
+	for_each_file_in_pack_subdir(object_dir, "multi-pack-index.d",
+				     clear_midx_file_ext, &data);
+
+	for (i = 0; i < hashes_nr; i++)
+		free(data.keep[i]);
 	free(data.keep);
 }
 
@@ -770,8 +810,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx.buf))
 		die(_("failed to clear multi-pack-index at %s"), midx.buf);
 
-	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
-	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
+	clear_midx_files_ext(r->objects->odb->path, MIDX_EXT_BITMAP, NULL);
+	clear_midx_files_ext(r->objects->odb->path, MIDX_EXT_REV, NULL);
 
 	strbuf_release(&midx);
 }
diff --git a/midx.h b/midx.h
index 3714cad2cc..42d4f8d149 100644
--- a/midx.h
+++ b/midx.h
@@ -29,6 +29,8 @@ struct bitmapped_pack;
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
@@ -77,6 +79,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
+#define MIDX_WRITE_INCREMENTAL (1 << 5)
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
@@ -101,6 +104,7 @@ int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
 		     uint32_t *result);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 		 uint32_t *result);
+int midx_has_oid(struct multi_pack_index *m, const struct object_id *oid);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/packfile.c b/packfile.c
index 85f0345435..2c335f4c4d 100644
--- a/packfile.c
+++ b/packfile.c
@@ -813,9 +813,10 @@ static void report_pack_garbage(struct string_list *list)
 	report_helper(list, seen_bits, first, list->nr);
 }
 
-void for_each_file_in_pack_dir(const char *objdir,
-			       each_file_in_pack_dir_fn fn,
-			       void *data)
+void for_each_file_in_pack_subdir(const char *objdir,
+				  const char *subdir,
+				  each_file_in_pack_dir_fn fn,
+				  void *data)
 {
 	struct strbuf path = STRBUF_INIT;
 	size_t dirnamelen;
@@ -824,6 +825,8 @@ void for_each_file_in_pack_dir(const char *objdir,
 
 	strbuf_addstr(&path, objdir);
 	strbuf_addstr(&path, "/pack");
+	if (subdir)
+		strbuf_addf(&path, "/%s", subdir);
 	dir = opendir(path.buf);
 	if (!dir) {
 		if (errno != ENOENT)
@@ -845,6 +848,13 @@ void for_each_file_in_pack_dir(const char *objdir,
 	strbuf_release(&path);
 }
 
+void for_each_file_in_pack_dir(const char *objdir,
+			       each_file_in_pack_dir_fn fn,
+			       void *data)
+{
+	for_each_file_in_pack_subdir(objdir, NULL, fn, data);
+}
+
 struct prepare_pack_data {
 	struct repository *r;
 	struct string_list *garbage;
diff --git a/packfile.h b/packfile.h
index 28c8fd3e39..07ba2c0be0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -55,6 +55,10 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
 
 typedef void each_file_in_pack_dir_fn(const char *full_path, size_t full_path_len,
 				      const char *file_name, void *data);
+void for_each_file_in_pack_subdir(const char *objdir,
+				  const char *subdir,
+				  each_file_in_pack_dir_fn fn,
+				  void *data);
 void for_each_file_in_pack_dir(const char *objdir,
 			       each_file_in_pack_dir_fn fn,
 			       void *data);
diff --git a/t/README b/t/README
index e8a11926e4..e93a29de1b 100644
--- a/t/README
+++ b/t/README
@@ -469,6 +469,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=<boolean>, when true, sets
+the '--incremental' option on all invocations of 'git multi-pack-index
+write'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index f595937094..62aa6744a6 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,6 +1,8 @@
 # Helpers for scripts testing bitmap functionality; see t5310 for
 # example usage.
 
+. "$TEST_DIRECTORY"/lib-midx.sh
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 
@@ -264,10 +266,6 @@ have_delta () {
 	test_cmp expect actual
 }
 
-midx_checksum () {
-	test-tool read-midx --checksum "$1"
-}
-
 # midx_pack_source <obj>
 midx_pack_source () {
 	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
diff --git a/t/lib-midx.sh b/t/lib-midx.sh
index 1261994744..e38c609604 100644
--- a/t/lib-midx.sh
+++ b/t/lib-midx.sh
@@ -6,3 +6,31 @@ test_midx_consistent () {
 	test_cmp expect actual &&
 	git multi-pack-index --object-dir=$1 verify
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "$1"
+}
+
+midx_git_two_modes () {
+	git -c core.multiPackIndex=false $1 >expect &&
+	git -c core.multiPackIndex=true $1 >actual &&
+	if [ "$2" = "sorted" ]
+	then
+		sort <expect >expect.sorted &&
+		mv expect.sorted expect &&
+		sort <actual >actual.sorted &&
+		mv actual.sorted actual
+	fi &&
+	test_cmp expect actual
+}
+
+compare_results_with_midx () {
+	MSG=$1
+	test_expect_success "check normal git operations: $MSG" '
+		midx_git_two_modes "rev-list --objects --all" &&
+		midx_git_two_modes "log --raw" &&
+		midx_git_two_modes "count-objects --verbose" &&
+		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
+		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
+	'
+}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 6e9ee23398..4b0b5a5c9f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,8 +3,11 @@
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-chunk.sh
+. "$TEST_DIRECTORY"/lib-midx.sh
 
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 objdir=.git/objects
 
 HASH_LEN=$(test_oid rawsz)
@@ -107,30 +110,6 @@ test_expect_success 'write midx with one v1 pack' '
 	midx_read_expect 1 18 4 $objdir
 '
 
-midx_git_two_modes () {
-	git -c core.multiPackIndex=false $1 >expect &&
-	git -c core.multiPackIndex=true $1 >actual &&
-	if [ "$2" = "sorted" ]
-	then
-		sort <expect >expect.sorted &&
-		mv expect.sorted expect &&
-		sort <actual >actual.sorted &&
-		mv actual.sorted actual
-	fi &&
-	test_cmp expect actual
-}
-
-compare_results_with_midx () {
-	MSG=$1
-	test_expect_success "check normal git operations: $MSG" '
-		midx_git_two_modes "rev-list --objects --all" &&
-		midx_git_two_modes "log --raw" &&
-		midx_git_two_modes "count-objects --verbose" &&
-		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
-		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
-	'
-}
-
 test_expect_success 'write midx with one v2 pack' '
 	git pack-objects --index-version=2,0x40 $objdir/pack/test <obj-list &&
 	git multi-pack-index --object-dir=$objdir write &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index dff3b26849..5836187170 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -7,6 +7,7 @@ test_description='exercise basic multi-pack bitmap functionality'
 # We'll be writing our own MIDX, so avoid getting confused by the
 # automatic ones.
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 
 # This test exercise multi-pack bitmap functionality where the object order is
 # stored and read from a special chunk within the MIDX, so use the default
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index 23db949c20..9cac03a94b 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -8,6 +8,7 @@ test_description='exercise basic multi-pack bitmap functionality (.rev files)'
 # We'll be writing our own MIDX, so avoid getting confused by the automatic
 # ones.
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 
 # Unlike t5326, this test exercise multi-pack bitmap functionality where the
 # object order is stored in a separate .rev file.
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 3c20738bce..517617d59d 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -6,6 +6,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 objdir=.git/objects
 packdir=$objdir/pack
 
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
new file mode 100755
index 0000000000..c3b08acc73
--- /dev/null
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='incremental multi-pack-index'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-midx.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+export GIT_TEST_MULTI_PACK_INDEX
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+test_expect_success 'convert non-incremental MIDX to incremental' '
+	test_commit base &&
+	git repack -ad &&
+	git multi-pack-index write &&
+
+	test_path_is_file $packdir/multi-pack-index &&
+	old_hash="$(midx_checksum $objdir)" &&
+
+	test_commit other &&
+	git repack -d &&
+	git multi-pack-index write --incremental &&
+
+	test_path_is_missing $packdir/multi-pack-index &&
+	test_path_is_file $midx_chain &&
+	test_line_count = 2 $midx_chain &&
+	grep $old_hash $midx_chain
+'
+
+compare_results_with_midx 'incremental MIDX'
+
+test_expect_success 'convert incremental to non-incremental' '
+	test_commit squash &&
+	git repack -d &&
+	git multi-pack-index write &&
+
+	test_path_is_file $packdir/multi-pack-index &&
+	test_dir_is_empty $midxdir
+'
+
+compare_results_with_midx 'non-incremental MIDX conversion'
+
+test_done
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 8f34f05087..be1188e736 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -7,6 +7,9 @@ test_description='git repack works correctly'
 . "${TEST_DIRECTORY}/lib-midx.sh"
 . "${TEST_DIRECTORY}/lib-terminal.sh"
 
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
 commit_and_pack () {
 	test_commit "$@" 1>&2 &&
 	incrpackid=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
@@ -117,7 +120,7 @@ test_expect_success '--local disables writing bitmaps when connected to alternat
 	(
 		cd member &&
 		test_commit "object" &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
+		git repack -Adl --write-bitmap-index 2>err &&
 		cat >expect <<-EOF &&
 		warning: disabling bitmap writing, as some objects are not being packed
 		EOF
@@ -533,11 +536,11 @@ test_expect_success 'setup for --write-midx tests' '
 test_expect_success '--write-midx unchanged' '
 	(
 		cd midx &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack &&
+		git repack &&
 		test_path_is_missing $midx &&
 		test_path_is_missing $midx-*.bitmap &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+		git repack --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -550,7 +553,7 @@ test_expect_success '--write-midx with a new pack' '
 		cd midx &&
 		test_commit loose &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+		git repack --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -561,7 +564,7 @@ test_expect_success '--write-midx with a new pack' '
 test_expect_success '--write-midx with -b' '
 	(
 		cd midx &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -mb &&
+		git repack -mb &&
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-*.bitmap &&
@@ -574,7 +577,7 @@ test_expect_success '--write-midx with -d' '
 		cd midx &&
 		test_commit repack &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad --write-midx &&
+		git repack -Ad --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -587,21 +590,21 @@ test_expect_success 'cleans up MIDX when appropriate' '
 		cd midx &&
 
 		test_commit repack-2 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
+		git repack -Adb --write-midx &&
 
 		checksum=$(midx_checksum $objdir) &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$checksum.bitmap &&
 
 		test_commit repack-3 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
+		git repack -Adb --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-$checksum.bitmap &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
 		test_commit repack-4 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
+		git repack -Adb &&
 
 		find $objdir/pack -type f -name "multi-pack-index*" >files &&
 		test_must_be_empty files
@@ -622,7 +625,6 @@ test_expect_success '--write-midx with preferred bitmap tips' '
 		git log --format="create refs/tags/%s/%s %H" HEAD >refs &&
 		git update-ref --stdin <refs &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 \
 		git repack --write-midx --write-bitmap-index &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
@@ -714,13 +716,13 @@ test_expect_success '--write-midx removes stale pack-based bitmaps' '
 	(
 		cd repo &&
 		test_commit base &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&
+		git repack -Ab &&
 
 		pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
 		test_path_is_file "$pack_bitmap" &&
 
 		test_commit tip &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&
+		git repack -bm &&
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
@@ -743,7 +745,6 @@ test_expect_success '--write-midx with --pack-kept-objects' '
 		keep="$objdir/pack/pack-$one.keep" &&
 		touch "$keep" &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 \
 		git repack --write-midx --write-bitmap-index --geometric=2 -d \
 			--pack-kept-objects &&
 
-- 
2.45.2.437.gecb9450a0e

* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (18 preceding siblings ...)
  2024-06-06 23:05 ` [PATCH 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
@ 2024-06-06 23:06 ` Taylor Blau
  2024-06-07 18:33   ` Junio C Hamano
  2024-06-07 17:55 ` Junio C Hamano
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-06-06 23:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

On Thu, Jun 06, 2024 at 07:04:22PM -0400, Taylor Blau wrote:
> This series implements incremental MIDXs, which allow for storing
> a MIDX across multiple layers, each with their own distinct set of
> packs.

I forgot to mention, this series is based off a merge with current
master and 'tb/midx-write-cleanup'.

The latter topic is marked to merge into 'master', but hasn't been
pushed out yet, hence the dependency on a merge with that and 'master'
instead of just 'master'.

Thanks,
Taylor

* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (19 preceding siblings ...)
  2024-06-06 23:06 ` [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
@ 2024-06-07 17:55 ` Junio C Hamano
  2024-06-07 20:31   ` Taylor Blau
  2024-06-25 23:21 ` Junio C Hamano
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2024-06-07 17:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

> Part three doesn't exist yet, but is straightforward to do on top. None
> of the design decisions made in this series inhibit my goals for part
> three.

It is always nice to see the bigger picture, to understand where
the current series fits, but the above is a bit of a peculiar thing to
say.  Of course there should be no design decision the currently
posted series makes that would block your future work---otherwise
you would not be posting it.  The real question is rather whether the
future, yet-to-be-written work is still feasible after the design
decisions the current series made are found to be broken and need to
be revised (if it happens---but we do not know until we see reviews).

Thanks.

* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-06 23:06 ` [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
@ 2024-06-07 18:33   ` Junio C Hamano
  2024-06-07 20:29     ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2024-06-07 18:33 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

> I forgot to mention, this series is based off a merge with current
> master and 'tb/midx-write-cleanup'.

I think I saw "am -3" fall back to three-way at around [17/19] for
t0410 while applying on that base, but it wasn't anything "am -3"
couldn't handle.

Queued.

Thanks.

* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-07 18:33   ` Junio C Hamano
@ 2024-06-07 20:29     ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-07 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren

On Fri, Jun 07, 2024 at 11:33:13AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > I forgot to mention, this series is based off a merge with current
> > master and 'tb/midx-write-cleanup'.
>
> I think I saw "am -3" fall back to three-way at around [17/19] for
> t0410 while applying on that base, but it wasn't anything "am -3"
> couldn't handle.
>
> Queued.

Great, thanks. Sorry again for forgetting to mention it sooner.

Thanks,
Taylor

* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-07 17:55 ` Junio C Hamano
@ 2024-06-07 20:31   ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-06-07 20:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren

On Fri, Jun 07, 2024 at 10:55:43AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Part three doesn't exist yet, but is straightforward to do on top. None
> > of the design decisions made in this series inhibit my goals for part
> > three.
>
> It is always nice to see the bigger picture, to understand where
> the current series fits, but the above is a bit of a peculiar thing to
> say.  Of course there should be no design decision the currently
> posted series makes that would block your future work---otherwise
> you would not be posting it.

Yeah. What I was trying to say was that part two actually exists and
works in practice, rather than me merely expecting it to work without
having demonstrated anything ;-).

> The real question is rather whether the future, yet-to-be-written work
> is still feasible after the design decisions the current series made
> are found to be broken and need to be revised (if it happens---but we
> do not know until we see reviews).

Indeed. I'll make sure that before I push out a new round that the
rebased part two still works as I expect it to.

Certainly all of this could be avoided by combining the two together,
but I think the result is just too large to review.

Thanks,
Taylor


* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (20 preceding siblings ...)
  2024-06-07 17:55 ` Junio C Hamano
@ 2024-06-25 23:21 ` Junio C Hamano
  2024-06-26  0:44   ` Elijah Newren
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
  23 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2024-06-25 23:21 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

> This series implements incremental MIDXs, which allow for storing
> a MIDX across multiple layers, each with their own distinct set of
> packs.

So, ...  it is unfortunate that this hasn't seen any responses (not
even a question, let alone a proper review) and almost 3 weeks have
passed.

Any takers?

Thanks.


* Re: [PATCH 00/19] midx: incremental multi-pack indexes, part one
  2024-06-25 23:21 ` Junio C Hamano
@ 2024-06-26  0:44   ` Elijah Newren
  0 siblings, 0 replies; 102+ messages in thread
From: Elijah Newren @ 2024-06-26  0:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Jeff King

On Tue, Jun 25, 2024 at 4:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Taylor Blau <me@ttaylorr.com> writes:
>
> > This series implements incremental MIDXs, which allow for storing
> > a MIDX across multiple layers, each with their own distinct set of
> > packs.
>
> So, ...  it is unfortunate that this hasn't seen any responses (not
> even a question, let alone a proper review) and almost 3 weeks have
> passed.
>
> Any takers?
>
> Thanks.

I've got it on my list, and I'll try to look at it soon.  It'll take a
bit longer since I'm not familiar with the area.


* [PATCH v2 00/19] midx: incremental multi-pack indexes, part one
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (21 preceding siblings ...)
  2024-06-25 23:21 ` Junio C Hamano
@ 2024-07-17 21:11 ` Taylor Blau
  2024-07-17 21:11   ` [PATCH v2 01/19] Documentation: describe incremental MIDX format Taylor Blau
                     ` (19 more replies)
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
  23 siblings, 20 replies; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:11 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

This series implements incremental MIDXs, which allow for storing
a MIDX across multiple layers, each with their own distinct set of
packs.

This round is mostly unchanged from the previous since there has not yet
been substantial review. But it does rebase to current 'master' (which
is 04f5a52757 (Post 2.46-rc0 batch #2, 2024-07-16), at the time of
writing).

Importantly, this rebase moves this topic to be based on an ancestor of
0c5a62f14b (midx-write.c: do not read existing MIDX with
`packs_to_include`, 2024-06-11), which resulted in a non-trivial
conflict prior to this rebase.

The rest of the topic is unchanged. I don't expect that we'll see much
review here for the next couple of weeks while we are in the -rc phase,
but I figured it would be useful to have it on the list for folks that
are interested in taking a look.

Thanks in advance for any review! :-)

Taylor Blau (19):
  Documentation: describe incremental MIDX format
  midx: add new fields for incremental MIDX chains
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: implement verification support for incremental MIDXs
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  midx: implement support for writing incremental MIDX chains

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt | 100 +++++
 builtin/multi-pack-index.c                   |   2 +
 builtin/repack.c                             |   8 +-
 ci/run-build-and-tests.sh                    |   2 +-
 midx-write.c                                 | 326 ++++++++++++---
 midx.c                                       | 410 ++++++++++++++++---
 midx.h                                       |  26 +-
 object-name.c                                |  99 ++---
 packfile.c                                   |  21 +-
 packfile.h                                   |   4 +
 t/README                                     |   6 +-
 t/helper/test-read-midx.c                    |  24 +-
 t/lib-bitmap.sh                              |   6 +-
 t/lib-midx.sh                                |  28 ++
 t/t0410-partial-clone.sh                     |   2 -
 t/t5310-pack-bitmaps.sh                      |   4 -
 t/t5313-pack-bounds-checks.sh                |   8 +-
 t/t5319-multi-pack-index.sh                  |  30 +-
 t/t5326-multi-pack-bitmaps.sh                |   4 +-
 t/t5327-multi-pack-bitmaps-rev.sh            |   6 +-
 t/t5332-multi-pack-reuse.sh                  |   2 +
 t/t5334-incremental-multi-pack-index.sh      |  46 +++
 t/t7700-repack.sh                            |  48 +--
 24 files changed, 959 insertions(+), 264 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

Range-diff against v1:
 1:  e5ce916f67 =  1:  014588b3ec Documentation: describe incremental MIDX format
 2:  6569289ca7 =  2:  337ebc6de7 midx: add new fields for incremental MIDX chains
 3:  d2e845a9d4 =  3:  f449a72877 midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
 4:  2100c6ddfa =  4:  f88569c819 midx: teach `prepare_midx_pack()` about incremental MIDXs
 5:  454c3d2fe7 !  5:  ec57ff4349 midx: teach `nth_midxed_object_oid()` about incremental MIDXs
    @@ midx.c: struct object_id *nth_midxed_object_oid(struct object_id *oid,
      
     +	n = midx_for_object(&m, n);
     +
    - 	oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n));
    + 	oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n),
    + 		the_repository->hash_algo);
      	return oid;
    - }
 6:  7d945c41bc =  6:  650b8c8c21 midx: teach `nth_bitmapped_pack()` about incremental MIDXs
 7:  4d4d924aa2 =  7:  bfd1dadbf1 midx: introduce `bsearch_one_midx()`
 8:  86d88bc6a3 =  8:  38bd45bd24 midx: teach `bsearch_midx()` about incremental MIDXs
 9:  eb9ed10ca3 =  9:  342ed56033 midx: teach `nth_midxed_offset()` about incremental MIDXs
10:  36cfdd9b95 = 10:  2b335c45ae midx: teach `fill_midx_entry()` about incremental MIDXs
11:  1ae5fd7e89 = 11:  22de5898f3 midx: remove unused `midx_locate_pack()`
12:  e3319967b9 = 12:  fb60f2b022 midx: teach `midx_contains_pack()` about incremental MIDXs
13:  3b8dffa051 = 13:  38b642d404 midx: teach `midx_preferred_pack()` about incremental MIDXs
14:  35fbe05a4a = 14:  594386da10 midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
15:  a5eedb15fa = 15:  dad130799c midx: support reading incremental MIDX chains
16:  186b15e6bd ! 16:  ad976ef413 midx: implement verification support for incremental MIDXs
    @@ midx.c: int verify_midx_file(struct repository *r, const char *object_dir, unsig
     -		{
     -			close_pack_fd(m->packs[pairs[i-1].pack_int_id]);
     -			close_pack_index(m->packs[pairs[i-1].pack_int_id]);
    -+		    m->packs[pairs[i-1].pack_int_id]) {
    ++		    nth_midxed_pack(m, pairs[i-1].pack_int_id)) {
     +			uint32_t pack_int_id = pairs[i-1].pack_int_id;
     +			struct packed_git *p = nth_midxed_pack(m, pack_int_id);
     +
17:  94362c057a ! 17:  23912425bf t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
    @@ t/README: GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
     
      ## t/t0410-partial-clone.sh ##
     @@ t/t0410-partial-clone.sh: test_description='partial clone'
    - 
      . ./test-lib.sh
    + . "$TEST_DIRECTORY"/lib-terminal.sh
      
     -# missing promisor objects cause repacks which write bitmaps to fail
     -GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
18:  4442e7ca52 = 18:  814da1916d t/t5313-pack-bounds-checks.sh: prepare for sub-directories
19:  0cbe34b0bd ! 19:  e2b5961b45 midx: implement support for writing incremental MIDX chains
    @@ midx-write.c
      extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
      				const char *idx_name);
      
    +@@ midx-write.c: struct write_midx_context {
    + 	size_t nr;
    + 	size_t alloc;
    + 	struct multi_pack_index *m;
    ++	struct multi_pack_index *base_midx;
    + 	struct progress *progress;
    + 	unsigned pack_paths_checked;
    + 
     @@ midx-write.c: struct write_midx_context {
      
      	int preferred_pack_idx;
    @@ midx-write.c: struct write_midx_context {
      	struct string_list *to_include;
      };
      
    +@@ midx-write.c: static int should_include_pack(const struct write_midx_context *ctx,
    + 	 */
    + 	if (ctx->m && midx_contains_pack(ctx->m, file_name))
    + 		return 0;
    ++	else if (ctx->base_midx && midx_contains_pack(ctx->base_midx,
    ++						      file_name))
    ++		return 0;
    + 	else if (ctx->to_include &&
    + 		 !string_list_has_string(ctx->to_include, file_name))
    + 		return 0;
     @@ midx-write.c: static void compute_sorted_entries(struct write_midx_context *ctx,
      	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
      		fanout.nr = 0;
    @@ midx-write.c: static void compute_sorted_entries(struct write_midx_context *ctx,
      			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
      						&fanout.entries[cur_object].oid))
      				continue;
    -+			if (ctx->incremental && ctx->m &&
    -+			    midx_has_oid(ctx->m, &fanout.entries[cur_object].oid))
    ++			if (ctx->incremental && ctx->base_midx &&
    ++			    midx_has_oid(ctx->base_midx,
    ++					 &fanout.entries[cur_object].oid))
     +				continue;
      
      			ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
    @@ midx-write.c: static int write_midx_revindex(struct hashfile *f,
     -	uint32_t i;
     +	uint32_t i, nr_base;
     +
    -+	if (ctx->m && ctx->incremental)
    -+		nr_base = ctx->m->num_objects + ctx->m->num_objects_in_base;
    ++	if (ctx->incremental && ctx->base_midx)
    ++		nr_base = ctx->base_midx->num_objects +
    ++			ctx->base_midx->num_objects_in_base;
     +	else
     +		nr_base = 0;
      
    @@ midx-write.c: static int midx_pack_order_cmp(const void *va, const void *vb)
      
      	trace2_region_enter("midx", "midx_pack_order", the_repository);
      
    -+	if (ctx->incremental && ctx->m)
    -+		base_objects = ctx->m->num_objects + ctx->m->num_objects_in_base;
    ++	if (ctx->incremental && ctx->base_midx)
    ++		base_objects = ctx->base_midx->num_objects +
    ++			ctx->base_midx->num_objects_in_base;
     +
     +	ALLOC_ARRAY(pack_order, ctx->entries_nr);
      	ALLOC_ARRAY(data, ctx->entries_nr);
    @@ midx-write.c: static struct multi_pack_index *lookup_multi_pack_index(struct rep
     +	struct multi_pack_index *m;
      
     -	for (i = 0; i < ctx->m->num_packs; i++) {
    --		if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
    --			continue;
    +-		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
     +	for (m = ctx->m; m; m = m->base_midx) {
     +		uint32_t i;
    - 
    --		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
    --
    --		if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
    ++
     +		for (i = 0; i < m->num_packs; i++) {
    ++			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
    + 
    +-		if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
      			/*
      			 * If generating a reverse index, need to have
      			 * packed_git's loaded to compare their
    @@ midx-write.c: static struct multi_pack_index *lookup_multi_pack_index(struct rep
      			 */
     -			if (prepare_midx_pack(the_repository, ctx->m, i))
     -				return error(_("could not load pack"));
    -+			if (!should_include_pack(ctx, m->pack_names[i], 0))
    -+				continue;
    - 
    --			if (open_pack_index(ctx->m->packs[i]))
    --				die(_("could not open index for %s"),
    --				    ctx->m->packs[i]->pack_name);
    -+			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
    -+
     +			if (flags & MIDX_WRITE_REV_INDEX ||
     +			    preferred_pack_name) {
     +				if (prepare_midx_pack(the_repository, m,
    @@ midx-write.c: static struct multi_pack_index *lookup_multi_pack_index(struct rep
     +					error(_("could not load pack"));
     +					return 1;
     +				}
    -+
    + 
    +-			if (open_pack_index(ctx->m->packs[i]))
    +-				die(_("could not open index for %s"),
    +-				    ctx->m->packs[i]->pack_name);
     +				if (open_pack_index(m->packs[i]))
     +					die(_("could not open index for %s"),
     +					    m->packs[i]->pack_name);
    @@ midx-write.c: static int write_midx_internal(const char *object_dir,
      	if (safe_create_leading_directories(midx_name.buf))
      		die_errno(_("unable to create leading directories of %s"),
      			  midx_name.buf);
    -@@ midx-write.c: static int write_midx_internal(const char *object_dir,
    + 
    +-	if (!packs_to_include) {
    +-		/*
    +-		 * Only reference an existing MIDX when not filtering which
    +-		 * packs to include, since all packs and objects are copied
    +-		 * blindly from an existing MIDX if one is present.
    +-		 */
    +-		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
    +-	}
    ++	if (!packs_to_include || ctx.incremental) {
    ++		struct multi_pack_index *m = lookup_multi_pack_index(the_repository,
    ++								     object_dir);
    ++		if (m && !midx_checksum_valid(m)) {
    ++			warning(_("ignoring existing multi-pack-index; checksum mismatch"));
    ++			m = NULL;
    ++		}
    + 
    +-	if (ctx.m && !midx_checksum_valid(ctx.m)) {
    +-		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
    +-		ctx.m = NULL;
    ++		if (m) {
    ++			/*
    ++			 * Only reference an existing MIDX when not filtering
    ++			 * which packs to include, since all packs and objects
    ++			 * are copied blindly from an existing MIDX if one is
    ++			 * present.
    ++			 */
    ++			if (ctx.incremental)
    ++				ctx.base_midx = m;
    ++			else if (!packs_to_include)
    ++				ctx.m = m;
    ++		}
      	}
      
      	ctx.nr = 0;
     -	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
     +	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
      	ctx.info = NULL;
    - 	ctx.to_include = packs_to_include;
      	ALLOC_ARRAY(ctx.info, ctx.alloc);
      
     -	if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
     -					  flags) < 0) {
     -		result = 1;
     +	if (ctx.incremental) {
    -+		struct multi_pack_index *m = ctx.m;
    ++		struct multi_pack_index *m = ctx.base_midx;
     +		while (m) {
     +			ctx.num_multi_pack_indexes_before++;
     +			m = m->base_midx;
    @@ midx-write.c: static int write_midx_internal(const char *object_dir,
      	 * have been freed in the previous if block.
      	 */
      
    +-	if (ctx.m)
     +	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
     +
     +	if (ctx.incremental) {
     +		FILE *chainf = fdopen_lock_file(&lk, "w");
     +		struct strbuf final_midx_name = STRBUF_INIT;
    -+		struct multi_pack_index *m = ctx.m;
    ++		struct multi_pack_index *m = ctx.base_midx;
     +
     +		if (!chainf) {
     +			error_errno(_("unable to open multi-pack-index chain file"));
     +			return -1;
     +		}
     +
    -+		if (link_midx_to_chain(ctx.m) < 0)
    ++		if (link_midx_to_chain(ctx.base_midx) < 0)
     +			return -1;
     +
     +		get_split_midx_filename_ext(&final_midx_name, object_dir,
    @@ midx-write.c: static int write_midx_internal(const char *object_dir,
     +			xstrdup(hash_to_hex(midx_hash));
     +	}
     +
    - 	if (ctx.m)
    ++	if (ctx.m || ctx.base_midx)
      		close_object_store(the_repository->objects);
      
      	if (commit_lock_file(&lk) < 0)

base-commit: 04f5a52757cd92347271e24f7cbdfe15dafce3b7
-- 
2.46.0.rc0.94.g9b2aff57b3


* [PATCH v2 01/19] Documentation: describe incremental MIDX format
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
@ 2024-07-17 21:11   ` Taylor Blau
  2024-08-01  9:19     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
                     ` (18 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:11 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement incremental multi-pack indexes (MIDXs) over the
next several commits by first describing the relevant prerequisites
(like a new chunk in the MIDX format, the directory structure for
incremental MIDXs, etc.)

The format is described in detail in the patch contents below, but the
high-level description is as follows.

Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and
each `*.midx` within that directory has a single "parent" MIDX, which is
the MIDX layer immediately before it in the MIDX chain. The chain order
resides in a file 'multi-pack-index-chain' in the same directory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt | 100 +++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index f2221d2b44..d05e3d6dd9 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -61,6 +61,106 @@ Design Details
 - The MIDX file format uses a chunk-based approach (similar to the
   commit-graph file) that allows optional data to be added.
 
+Incremental multi-pack indexes
+------------------------------
+
+As repositories grow in size, it becomes more expensive to write a
+multi-pack index (MIDX) that includes all packfiles. To accommodate
+this, the "incremental multi-pack indexes" feature allows for combining
+a "chain" of multi-pack indexes.
+
+Each individual component of the chain need only contain a small number
+of packfiles. Appending to the chain does not invalidate earlier parts
+of the chain, so repositories can control how much time is spent
+updating the MIDX chain by determining the number of packs in each layer
+of the MIDX chain.
+
+=== Design state
+
+At present, the incremental multi-pack indexes feature is missing two
+important components:
+
+  - The ability to rewrite earlier portions of the MIDX chain (i.e., to
+    "compact" some collection of adjacent MIDX layers into a single
+    MIDX). At present the only supported way of shrinking a MIDX chain
+    is to rewrite the entire chain from scratch without the `--incremental`
+    flag.
++
+There are no fundamental limitations that stand in the way of being able
+to implement this feature. It is omitted from the initial implementation
+in order to reduce the complexity, but will be added later.
+
+  - Support for reachability bitmaps. The classic single MIDX
+    implementation does support reachability bitmaps (see the section
+    titled "multi-pack-index reverse indexes" in
+    linkgit:gitformat-pack[5] for more details).
++
+As above, there are no fundamental limitations that stand in the way of
+extending the incremental MIDX format to support reachability bitmaps.
+The design below specifically takes this into account, and support for
+reachability bitmaps will be added in a future patch series. It is
+omitted from this series for the same reason as above.
++
+In brief, to support reachability bitmaps with the incremental MIDX
+feature, the concept of the pseudo-pack order is extended across each
+layer of the incremental MIDX chain to form a concatenated pseudo-pack
+order. This concatenation takes place in the same order as the chain
+itself (in other words, the concatenated pseudo-pack order for a chain
+`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
+the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
+`$H3`).
++
+The layout will then be extended so that each layer of the incremental
+MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap
+are offset by the number of objects in the previous layers of the chain.
+
+=== File layout
+
+Instead of storing a single `multi-pack-index` file (with an optional
+`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental
+MIDXs are stored in the following layout:
+
+----
+$GIT_DIR/objects/pack/multi-pack-index.d/
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
+----
+
+The `multi-pack-index-chain` file contains a list of the incremental
+MIDX files in the chain, in order. The above example shows a chain whose
+`multi-pack-index-chain` file would contain the following lines:
+
+----
+$H1
+$H2
+$H3
+----
+
+The `multi-pack-index-$H1.midx` file contains the first layer of the
+multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
+the second layer of the chain, and so on.
+
+=== Object positions for incremental MIDXs
+
+In the original multi-pack-index design, we refer to objects via their
+lexicographic position (by object IDs) within the repository's singular
+multi-pack-index. In the incremental multi-pack-index design, we refer
+to objects via their index into a concatenated lexicographic ordering
+among each component in the MIDX chain.
+
+If `objects_nr()` is a function that returns the number of objects in a
+given MIDX layer, then the index of an object at lexicographic position
+`i` within, say, $H3 is defined as:
+
+----
+objects_nr($H2) + objects_nr($H1) + i
+----
+
+(in the C implementation, this is often computed as `i +
+m->num_objects_in_base`).
+
 Future Work
 -----------
 
-- 
2.46.0.rc0.94.g9b2aff57b3
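
As a rough illustration of the file layout described in the patch above,
the following sketch turns the lines of a `multi-pack-index-chain` file
into per-layer file names. Both `split_chain()` and `layer_path()` are
hypothetical helpers written for this note, not functions from Git, and
the sketch assumes the chain file holds exactly one hash per line, in
chain order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): format the path of one MIDX
 * layer from the pack directory and one hash taken from the chain file.
 * Returns whatever snprintf() returns.
 */
static int layer_path(char *out, size_t len, const char *pack_dir,
		      const char *hash)
{
	assert(out && pack_dir && hash);
	return snprintf(out, len,
			"%s/multi-pack-index.d/multi-pack-index-%s.midx",
			pack_dir, hash);
}

/*
 * Hypothetical helper: split the chain file contents (assumed to be one
 * hash per line) in place, storing up to `max` pointers in `hashes`.
 * Returns the number of layers found, in chain order.
 */
static size_t split_chain(char *contents, const char **hashes, size_t max)
{
	size_t nr = 0;
	char *line = strtok(contents, "\n");

	while (line && nr < max) {
		hashes[nr++] = line;
		line = strtok(NULL, "\n");
	}
	return nr;
}
```

Calling `split_chain()` on a buffer holding `$H1`, `$H2`, `$H3` (one per
line) and then `layer_path()` on each entry would yield the three
`multi-pack-index-$H<n>.midx` paths listed in the layout above.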



* [PATCH v2 02/19] midx: add new fields for incremental MIDX chains
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
  2024-07-17 21:11   ` [PATCH v2 01/19] Documentation: describe incremental MIDX format Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01  9:21     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
                     ` (17 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The incremental MIDX chain feature is designed around the idea of
indexing into a concatenated lexicographic ordering of object IDs
present in the MIDX.

When given an object position, the MIDX machinery needs to be able to
locate both (a) which MIDX layer contains the given object, and (b) at
what position *within that MIDX layer* that object appears.

To do this, three new fields are added to the `struct multi_pack_index`:

  - struct multi_pack_index *base_midx;
  - uint32_t num_objects_in_base;
  - uint32_t num_packs_in_base;

These three fields store the pieces of information suggested by their
respective field names. In turn, the `num_objects_in_base` and
`num_packs_in_base` fields are used to crawl backwards along the
`base_midx` pointer to locate the appropriate position for a given
object within the MIDX that contains it.

The following commits will update various parts of the MIDX machinery
(as well as their callers from outside of midx.c and midx-write.c) to be
aware and make use of these fields when performing object lookups.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/midx.h b/midx.h
index 8554f2d616..020e49f77c 100644
--- a/midx.h
+++ b/midx.h
@@ -63,6 +63,10 @@ struct multi_pack_index {
 	const unsigned char *chunk_revindex;
 	size_t chunk_revindex_len;
 
+	struct multi_pack_index *base_midx;
+	uint32_t num_objects_in_base;
+	uint32_t num_packs_in_base;
+
 	const char **pack_names;
 	struct packed_git **packs;
 	char object_dir[FLEX_ARRAY];
-- 
2.46.0.rc0.94.g9b2aff57b3
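
To make the role of the three new fields more concrete, here is a
minimal sketch of how the two counters could be accumulated when one
layer is stacked on another. `struct toy_midx` and `toy_link()` are
invented for this note; the real `struct multi_pack_index` has many more
members, and the series may populate these fields differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for `struct multi_pack_index`; only the counters matter. */
struct toy_midx {
	struct toy_midx *base_midx;
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_packs;           /* packs in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
	uint32_t num_packs_in_base;   /* packs in all earlier layers */
};

/*
 * Hypothetical helper: stack `layer` on top of `base` (which may be
 * NULL for the first layer) and derive the "in base" totals from the
 * base layer's own counts plus everything beneath it.
 */
static void toy_link(struct toy_midx *layer, struct toy_midx *base)
{
	assert(layer);
	layer->base_midx = base;
	layer->num_objects_in_base = base ?
		base->num_objects + base->num_objects_in_base : 0;
	layer->num_packs_in_base = base ?
		base->num_packs + base->num_packs_in_base : 0;
}
```

With the counters maintained this way, a position in the concatenated
order is just a layer-local position plus `num_objects_in_base`, which
matches the formula in the documentation patch.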



* [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
  2024-07-17 21:11   ` [PATCH v2 01/19] Documentation: describe incremental MIDX format Taylor Blau
  2024-07-17 21:12   ` [PATCH v2 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01  9:30     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
                     ` (16 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `nth_midxed_pack_int_id()` takes in an object position in
MIDX lexicographic order and returns an identifier of the pack from
which that object was selected in the MIDX.

Currently, the given object position is an index into the lexicographic
order of objects in a single MIDX. Change this position to instead refer
into the concatenated lexicographic order of all MIDXs in a MIDX chain.

This has two visible effects within the implementation of
`nth_midxed_pack_int_id()`:

  - First, the given position is now an index into the concatenated
    lexicographic order of all MIDXs in the order in which they appear
    in the MIDX chain.

  - Second, the pack ID returned from this function is now also in the
    concatenated order of packs among all layers of the MIDX chain in
    the same order that they appear in the MIDX chain.

To do this, introduce the first of two general purpose helpers, this one
being `midx_for_object()`. `midx_for_object()` takes a double pointer to
a `struct multi_pack_index` as well as an object `pos` in terms of the
entire MIDX chain[^1].

The function chases down the '->base_midx' field until it finds the MIDX
layer within the chain that contains the given object. It then:

  - modifies the double pointer to point to the containing MIDX, instead
    of the tip of the chain, and

  - returns the MIDX-local position[^2] at which the given object can be
    found.

Use this function within `nth_midxed_pack_int_id()` so that the `pos` it
expects is now relative to the entire MIDX chain, and that it returns
the appropriate pack position for that object.

[^1]: As a reminder, this means that the object is identified among the
  objects contained in all layers of the incremental MIDX chain, not any
  particular layer. For example, consider a MIDX chain with two individual
  MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
  4 objects appears earlier in the chain, then asking for object "6" would
  return the second object in the MIDX with 3 objects.

[^2]: Building on the previous example, asking for object 6 in a MIDX
  chain with (4, 3) objects, respectively, would set the double pointer
  to point at the MIDX containing three objects, and would return an
  index to the second object within that MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 3992b05465..39d358da20 100644
--- a/midx.c
+++ b/midx.c
@@ -242,6 +242,23 @@ void close_midx(struct multi_pack_index *m)
 	free(m);
 }
 
+static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
+{
+	struct multi_pack_index *m = *_m;
+	while (m && pos < m->num_objects_in_base)
+		m = m->base_midx;
+
+	if (!m)
+		BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
+
+	if (pos >= m->num_objects + m->num_objects_in_base)
+		die(_("invalid MIDX object position, MIDX is likely corrupt"));
+
+	*_m = m;
+
+	return pos - m->num_objects_in_base;
+}
+
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
 {
 	struct strbuf pack_name = STRBUF_INIT;
@@ -334,8 +351,10 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
-	return get_be32(m->chunk_object_offsets +
-			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
+	pos = midx_for_object(&m, pos);
+
+	return m->num_packs_in_base + get_be32(m->chunk_object_offsets +
+					       (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
 }
 
 int fill_midx_entry(struct repository *r,
-- 
2.46.0.rc0.94.g9b2aff57b3
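
The walk performed by `midx_for_object()` can be tried out in isolation
with the sketch below. `struct toy_layer` and `toy_for_object()` are
stand-ins invented for this note; they mirror the loop in the patch but
use plain `assert()` where the patch uses `BUG()` and `die()`, and they
treat positions as zero-based, which is an assumption on top of the
commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy layer carrying only the fields that the walk needs. */
struct toy_layer {
	struct toy_layer *base_midx;
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * Resolve a chain-wide object position: repoint *_m at the layer that
 * contains `pos` and return the position local to that layer.
 */
static uint32_t toy_for_object(struct toy_layer **_m, uint32_t pos)
{
	struct toy_layer *m = *_m;

	/* Step towards the base while `pos` belongs to an earlier layer. */
	while (m && pos < m->num_objects_in_base)
		m = m->base_midx;

	assert(m);
	assert(pos < m->num_objects + m->num_objects_in_base);

	*_m = m;
	return pos - m->num_objects_in_base;
}
```

For a chain with (4, 3) objects, starting from the tip layer, a
zero-based position of 5 resolves to local index 1 in the 3-object
layer, while position 2 walks back to the 4-object layer and stays at
local index 2.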



* [PATCH v2 04/19] midx: teach `prepare_midx_pack()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (2 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01  9:35     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
                     ` (15 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `prepare_midx_pack()` is part of the midx.h API and
loads the pack identified by the MIDX-local 'pack_int_id'. This patch
prepares that function to be aware of an incremental MIDX world.

To do this, introduce the second of the two general purpose helpers
mentioned in the previous commit. This commit introduces
`midx_for_pack()`, which is the pack-specific analog of
`midx_for_object()`, and works in the same fashion.

Like `midx_for_object()`, this function chases down the '->base_midx'
field until it finds the MIDX layer within the chain that contains the
given pack.

Use this function within `prepare_midx_pack()` so that the `pack_int_id`
it expects is now relative to the entire MIDX chain, and that it
prepares the given pack in the appropriate MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/midx.c b/midx.c
index 39d358da20..da5e0bb940 100644
--- a/midx.c
+++ b/midx.c
@@ -259,20 +259,37 @@ static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
 	return pos - m->num_objects_in_base;
 }
 
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
+static uint32_t midx_for_pack(struct multi_pack_index **_m,
+			      uint32_t pack_int_id)
 {
-	struct strbuf pack_name = STRBUF_INIT;
-	struct packed_git *p;
+	struct multi_pack_index *m = *_m;
+	while (m && pack_int_id < m->num_packs_in_base)
+		m = m->base_midx;
 
-	if (pack_int_id >= m->num_packs)
+	if (!m)
+		BUG("NULL multi-pack-index for pack ID: %"PRIu32, pack_int_id);
+
+	if (pack_int_id >= m->num_packs + m->num_packs_in_base)
 		die(_("bad pack-int-id: %u (%u total packs)"),
-		    pack_int_id, m->num_packs);
+		    pack_int_id, m->num_packs + m->num_packs_in_base);
 
-	if (m->packs[pack_int_id])
+	*_m = m;
+
+	return pack_int_id - m->num_packs_in_base;
+}
+
+int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
+		      uint32_t pack_int_id)
+{
+	struct strbuf pack_name = STRBUF_INIT;
+	struct packed_git *p;
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+
+	if (m->packs[local_pack_int_id])
 		return 0;
 
 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
-		    m->pack_names[pack_int_id]);
+		    m->pack_names[local_pack_int_id]);
 
 	p = add_packed_git(pack_name.buf, pack_name.len, m->local);
 	strbuf_release(&pack_name);
@@ -281,7 +298,7 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 		return 1;
 
 	p->multi_pack_index = 1;
-	m->packs[pack_int_id] = p;
+	m->packs[local_pack_int_id] = p;
 	install_packed_git(r, p);
 	list_add_tail(&p->mru, &r->objects->packed_git_mru);
 
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (3 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01  9:38     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
                     ` (14 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `nth_midxed_object_oid()` returns the object ID for a given
object position in the MIDX lexicographic order.

Teach this function to instead operate over the concatenated
lexicographic order defined in an earlier step so that it is able to be
used with incremental MIDXs.

To do this, we need to both (a) adjust the bounds check for the given
'n', and (b) record the MIDX-local position after chasing the
`->base_midx` pointer to find the MIDX which contains that object.
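
As an illustration of the two steps, here is a minimal sketch which uses
32-bit integers in place of real object IDs. The names `struct layer` and
`nth_object()` are hypothetical and only mirror the shape of the real
function. Note that positions are ordered within each layer, but not
across the chain as a whole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer. */
struct layer {
	struct layer *base;
	const uint32_t *oids;         /* sorted within this layer only */
	uint32_t num_objects;         /* objects in this layer */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/* Look up the n-th object in the concatenated order of the chain. */
static const uint32_t *nth_object(struct layer *l, uint32_t n)
{
	/* (a) the bound now covers the whole chain below 'l' */
	if (!l || n >= l->num_objects + l->num_objects_in_base)
		return NULL;

	/* (b) find the containing layer, then use a local index */
	while (l && n < l->num_objects_in_base)
		l = l->base;
	if (!l)
		return NULL;
	return &l->oids[n - l->num_objects_in_base];
}
```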

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index da5e0bb940..d470a88755 100644
--- a/midx.c
+++ b/midx.c
@@ -337,9 +337,11 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
 {
-	if (n >= m->num_objects)
+	if (n >= m->num_objects + m->num_objects_in_base)
 		return NULL;
 
+	n = midx_for_object(&m, n);
+
 	oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n),
 		the_repository->hash_algo);
 	return oid;
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (4 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01  9:39     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
                     ` (13 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as in previous commits, teach the function
`nth_bitmapped_pack()` about incremental MIDXs by translating the given
`pack_int_id` from the concatenated lexical order to a MIDX-local
lexical position.

When accessing the containing MIDX's array of packs, use the local pack
ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
when accessing the data within that chunk.
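
To make the offset arithmetic concrete, here is a small sketch of reading
one entry out of a BTMP-like chunk. It assumes each entry is a pair of
big-endian 32-bit fields (position, then count), which matches how the
diff below reads `bitmap_pos` and `bitmap_nr`. `ENTRY_WIDTH`,
`read_be32()` and `read_range()` are hypothetical names, not part of
midx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout: two 32-bit big-endian fields per pack. */
#define ENTRY_WIDTH (2 * sizeof(uint32_t))

struct bitmap_range {
	uint32_t pos; /* first bit belonging to the pack */
	uint32_t nr;  /* number of bits belonging to the pack */
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * 'chunk' belongs to a single layer, so it must be indexed with the
 * layer-local pack ID even though callers pass a chain-wide one.
 */
static struct bitmap_range read_range(const unsigned char *chunk,
				      uint32_t pack_int_id,
				      uint32_t num_packs_in_base)
{
	uint32_t local = pack_int_id - num_packs_in_base;
	const unsigned char *entry = chunk + ENTRY_WIDTH * local;
	struct bitmap_range r;

	r.pos = read_be32(entry);
	r.nr = read_be32(entry + sizeof(uint32_t));
	return r;
}
```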

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/midx.c b/midx.c
index d470a88755..b6c3cd3e59 100644
--- a/midx.c
+++ b/midx.c
@@ -310,17 +310,19 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id)
 {
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+
 	if (!m->chunk_bitmapped_packs)
 		return error(_("MIDX does not contain the BTMP chunk"));
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return error(_("could not load bitmapped pack %"PRIu32), pack_int_id);
 
-	bp->p = m->packs[pack_int_id];
+	bp->p = m->packs[local_pack_int_id];
 	bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs +
-				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id);
+				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id);
 	bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs +
-				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id +
+				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
 				 sizeof(uint32_t));
 	bp->pack_int_id = pack_int_id;
 
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 07/19] midx: introduce `bsearch_one_midx()`
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (5 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:06     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
                     ` (12 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).

While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.

For instance, the object abbreviation code cares about object IDs near
to where we'd expect to find a match in a MIDX. In that case, we want to
look at the nearby matches in each layer of the MIDX chain, not just a
single one.

Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.

At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.
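
To illustrate why some callers want the single-layer search, here is a
rough sketch of a prefix scan that visits every layer, in the spirit of
`unique_in_midx()`. It is not the code from this patch: object IDs are
modelled as 32-bit integers, `bits` is assumed to be in the range 0..32,
and `struct layer`, `search_one_layer()` and `count_prefix_matches()` are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct layer {
	struct layer *base;
	const uint32_t *oids;         /* sorted within this layer */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * Search a single layer. '*result' receives the chain-wide position
 * of the match, or of the insertion point if there is none.
 */
static int search_one_layer(const struct layer *l, uint32_t oid,
			    uint32_t *result)
{
	uint32_t lo = 0, hi = l->num_objects;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		if (l->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}
	*result = lo + l->num_objects_in_base;
	return lo < l->num_objects && l->oids[lo] == oid;
}

/* Count objects whose leading 'bits' bits match those of 'prefix'. */
static unsigned count_prefix_matches(const struct layer *l,
				     uint32_t prefix, unsigned bits)
{
	uint32_t mask = bits ? ~(uint32_t)0 << (32 - bits) : 0;
	unsigned count = 0;

	for (; l; l = l->base) {
		uint32_t end = l->num_objects + l->num_objects_in_base;
		uint32_t first, i;

		search_one_layer(l, prefix & mask, &first);
		for (i = first; i < end; i++) {
			uint32_t oid = l->oids[i - l->num_objects_in_base];
			if ((oid & mask) != (prefix & mask))
				break;
			count++;
		}
	}
	return count;
}
```

A search that stopped at the first layer containing a match would miss
candidates in the other layers, which is what makes the per-layer variant
necessary for disambiguation.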

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c        | 17 +++++++--
 midx.h        |  5 ++-
 object-name.c | 99 +++++++++++++++++++++++++++------------------------
 3 files changed, 71 insertions(+), 50 deletions(-)

diff --git a/midx.c b/midx.c
index b6c3cd3e59..bb3fa43492 100644
--- a/midx.c
+++ b/midx.c
@@ -329,10 +329,21 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 	return 0;
 }
 
-int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result)
+int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
+		     uint32_t *result)
 {
-	return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
-			    the_hash_algo->rawsz, result);
+	int ret = bsearch_hash(oid->hash, m->chunk_oid_fanout,
+			       m->chunk_oid_lookup, the_hash_algo->rawsz,
+			       result);
+	if (result)
+		*result += m->num_objects_in_base;
+	return ret;
+}
+
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
+		 uint32_t *result)
+{
+		return bsearch_one_midx(oid, m, result);
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/midx.h b/midx.h
index 020e49f77c..46c53d69ff 100644
--- a/midx.h
+++ b/midx.h
@@ -90,7 +90,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
-int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
+		     uint32_t *result);
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
+		 uint32_t *result);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/object-name.c b/object-name.c
index 527b853ac4..e23b3e695a 100644
--- a/object-name.c
+++ b/object-name.c
@@ -134,28 +134,32 @@ static int match_hash(unsigned len, const unsigned char *a, const unsigned char
 static void unique_in_midx(struct multi_pack_index *m,
 			   struct disambiguate_state *ds)
 {
-	uint32_t num, i, first = 0;
-	const struct object_id *current = NULL;
-	int len = ds->len > ds->repo->hash_algo->hexsz ?
-		ds->repo->hash_algo->hexsz : ds->len;
-	num = m->num_objects;
+	for (; m; m = m->base_midx) {
+		uint32_t num, i, first = 0;
+		const struct object_id *current = NULL;
+		int len = ds->len > ds->repo->hash_algo->hexsz ?
+			ds->repo->hash_algo->hexsz : ds->len;
 
-	if (!num)
-		return;
+		num = m->num_objects + m->num_objects_in_base;
 
-	bsearch_midx(&ds->bin_pfx, m, &first);
+		if (!num)
+			continue;
 
-	/*
-	 * At this point, "first" is the location of the lowest object
-	 * with an object name that could match "bin_pfx".  See if we have
-	 * 0, 1 or more objects that actually match(es).
-	 */
-	for (i = first; i < num && !ds->ambiguous; i++) {
-		struct object_id oid;
-		current = nth_midxed_object_oid(&oid, m, i);
-		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
-			break;
-		update_candidates(ds, current);
+		bsearch_one_midx(&ds->bin_pfx, m, &first);
+
+		/*
+		 * At this point, "first" is the location of the lowest
+		 * object with an object name that could match
+		 * "bin_pfx".  See if we have 0, 1 or more objects that
+		 * actually match(es).
+		 */
+		for (i = first; i < num && !ds->ambiguous; i++) {
+			struct object_id oid;
+			current = nth_midxed_object_oid(&oid, m, i);
+			if (!match_hash(len, ds->bin_pfx.hash, current->hash))
+				break;
+			update_candidates(ds, current);
+		}
 	}
 }
 
@@ -708,37 +712,40 @@ static int repo_extend_abbrev_len(struct repository *r UNUSED,
 static void find_abbrev_len_for_midx(struct multi_pack_index *m,
 				     struct min_abbrev_data *mad)
 {
-	int match = 0;
-	uint32_t num, first = 0;
-	struct object_id oid;
-	const struct object_id *mad_oid;
+	for (; m; m = m->base_midx) {
+		int match = 0;
+		uint32_t num, first = 0;
+		struct object_id oid;
+		const struct object_id *mad_oid;
 
-	if (!m->num_objects)
-		return;
+		if (!m->num_objects)
+			continue;
 
-	num = m->num_objects;
-	mad_oid = mad->oid;
-	match = bsearch_midx(mad_oid, m, &first);
+		num = m->num_objects + m->num_objects_in_base;
+		mad_oid = mad->oid;
+		match = bsearch_one_midx(mad_oid, m, &first);
 
-	/*
-	 * first is now the position in the packfile where we would insert
-	 * mad->hash if it does not exist (or the position of mad->hash if
-	 * it does exist). Hence, we consider a maximum of two objects
-	 * nearby for the abbreviation length.
-	 */
-	mad->init_len = 0;
-	if (!match) {
-		if (nth_midxed_object_oid(&oid, m, first))
-			extend_abbrev_len(&oid, mad);
-	} else if (first < num - 1) {
-		if (nth_midxed_object_oid(&oid, m, first + 1))
-			extend_abbrev_len(&oid, mad);
+		/*
+		 * first is now the position in the packfile where we
+		 * would insert mad->hash if it does not exist (or the
+		 * position of mad->hash if it does exist). Hence, we
+		 * consider a maximum of two objects nearby for the
+		 * abbreviation length.
+		 */
+		mad->init_len = 0;
+		if (!match) {
+			if (nth_midxed_object_oid(&oid, m, first))
+				extend_abbrev_len(&oid, mad);
+		} else if (first < num - 1) {
+			if (nth_midxed_object_oid(&oid, m, first + 1))
+				extend_abbrev_len(&oid, mad);
+		}
+		if (first > 0) {
+			if (nth_midxed_object_oid(&oid, m, first - 1))
+				extend_abbrev_len(&oid, mad);
+		}
+		mad->init_len = mad->cur_len;
 	}
-	if (first > 0) {
-		if (nth_midxed_object_oid(&oid, m, first - 1))
-			extend_abbrev_len(&oid, mad);
-	}
-	mad->init_len = mad->cur_len;
 }
 
 static void find_abbrev_len_for_pack(struct packed_git *p,
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 08/19] midx: teach `bsearch_midx()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (6 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:07     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
                     ` (11 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the special-case callers of `bsearch_midx()` have been dealt
with, teach `bsearch_midx()` to handle incremental MIDX chains.

The incremental MIDX-aware version of `bsearch_midx()` works by
repeatedly searching for a given OID in each layer along the
`->base_midx` pointer, stopping either when an exact match is found, or
the end of the chain is reached.
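
A minimal sketch of that loop, assuming 32-bit integers in place of
object IDs and the C library's bsearch() in place of `bsearch_hash()`.
`struct layer`, `cmp_oid()` and `find_in_chain()` are hypothetical names.
Unlike the real function, this sketch leaves `*pos` untouched when
nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct layer {
	struct layer *base;
	const uint32_t *oids;         /* sorted within this layer */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

static int cmp_oid(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return (x > y) - (x < y);
}

/*
 * Search each layer in turn, starting at the tip. On an exact match,
 * store its chain-wide position in '*pos' and return 1.
 */
static int find_in_chain(const struct layer *l, uint32_t oid, uint32_t *pos)
{
	for (; l; l = l->base) {
		const uint32_t *hit = bsearch(&oid, l->oids, l->num_objects,
					      sizeof(*l->oids), cmp_oid);
		if (hit) {
			*pos = (uint32_t)(hit - l->oids) +
				l->num_objects_in_base;
			return 1;
		}
	}
	return 0;
}
```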

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index bb3fa43492..cd6e4afde4 100644
--- a/midx.c
+++ b/midx.c
@@ -343,7 +343,10 @@ int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 		 uint32_t *result)
 {
-		return bsearch_one_midx(oid, m, result);
+	for (; m; m = m->base_midx)
+		if (bsearch_one_midx(oid, m, result))
+			return 1;
+	return 0;
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 09/19] midx: teach `nth_midxed_offset()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (7 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:08     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
                     ` (10 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as in previous commits, teach the function
`nth_midxed_offset()` about incremental MIDXs.

The given object `pos` is used to find the containing MIDX, and
translated back into a MIDX-local position by assigning the return value
of `midx_for_object()` to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/midx.c b/midx.c
index cd6e4afde4..f87df6bede 100644
--- a/midx.c
+++ b/midx.c
@@ -368,6 +368,8 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	const unsigned char *offset_data;
 	uint32_t offset32;
 
+	pos = midx_for_object(&m, pos);
+
 	offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH;
 	offset32 = get_be32(offset_data + sizeof(uint32_t));
 
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 10/19] midx: teach `fill_midx_entry()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (8 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:12     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
                     ` (9 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as previous commits, teach the `fill_midx_entry()`
function to work in an incremental MIDX-aware fashion.

This function, unlike others which accept an index into either the
lexical order of objects or packs, takes in an object_id, and attempts
to fill a caller-provided 'struct pack_entry' with the remaining pieces
of information about that object from the MIDX.

The function uses `bsearch_midx()` which fills out the frame-local 'pos'
variable, recording the given object_id's lexical position within the
MIDX chain, if found (if no matching object ID was found, we'll return
immediately without filling out the `pack_entry` structure).

Once given that position, we jump back through the `->base_midx` pointer
to ensure that our `m` points at the MIDX layer which contains the given
object_id (and not an ancestor or descendant of it in the chain). Note
that we can drop the bounds check "if (pos >= m->num_objects)" because
`midx_for_object()` performs this check for us.

After that point, we only need to make two special considerations within
this function:

  - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
    is a position in the concatenated lexical order of packs, so we must
    ensure that we subtract `m->num_packs_in_base` before accessing the
    MIDX-local `packs` array.

  - Second, we must avoid translating the `pos` back to a MIDX-local
    index, since we use it as an argument to `nth_midxed_offset()` which
    expects a position relative to the concatenated lexical order of
    objects.
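
The two considerations above can be sketched as follows. This is an
illustration under simplified assumptions, not the patched function: the
per-object pack IDs are modelled as a plain array of layer-local values,
and `struct layer`, `layer_for_object()`, `pack_id_at()` and
`pack_name_for()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct layer {
	struct layer *base;
	const uint32_t *local_pack_ids; /* one layer-local ID per object */
	const char *const *pack_names;  /* indexed by layer-local pack ID */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
	uint32_t num_packs_in_base;
};

static struct layer *layer_for_object(struct layer *l, uint32_t pos)
{
	while (l && pos < l->num_objects_in_base)
		l = l->base;
	return l;
}

/* Takes a chain-wide object position, returns a chain-wide pack ID. */
static uint32_t pack_id_at(struct layer *l, uint32_t pos)
{
	l = layer_for_object(l, pos);
	assert(l);
	return l->local_pack_ids[pos - l->num_objects_in_base] +
		l->num_packs_in_base;
}

static const char *pack_name_for(struct layer *tip, uint32_t pos)
{
	struct layer *l = layer_for_object(tip, pos);
	uint32_t pack_id;

	if (!l || pos >= l->num_objects + l->num_objects_in_base)
		return NULL;

	/* 'pos' stays chain-wide: the helper translates it itself. */
	pack_id = pack_id_at(l, pos);

	/* The pack ID is chain-wide, the array is not: subtract. */
	return l->pack_names[pack_id - l->num_packs_in_base];
}
```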

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index f87df6bede..dea78474a2 100644
--- a/midx.c
+++ b/midx.c
@@ -406,14 +406,12 @@ int fill_midx_entry(struct repository *r,
 	if (!bsearch_midx(oid, m, &pos))
 		return 0;
 
-	if (pos >= m->num_objects)
-		return 0;
-
+	midx_for_object(&m, pos);
 	pack_int_id = nth_midxed_pack_int_id(m, pos);
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
-	p = m->packs[pack_int_id];
+	p = m->packs[pack_int_id - m->num_packs_in_base];
 
 	/*
 	* We are about to tell the caller where they can locate the
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 11/19] midx: remove unused `midx_locate_pack()`
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (9 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:14     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
                     ` (8 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Commit 307d75bbe6 (midx: implement `midx_locate_pack()`, 2023-12-14)
introduced `midx_locate_pack()`, which was described at the time as a
complement to the function `midx_contains_pack()` which allowed
callers to determine where in the MIDX lexical order a pack appeared, as
opposed to whether or not it was simply contained.

307d75bbe6 suggests that future patches would be added which would
introduce callers for this new function, but none ever were, meaning the
function has gone unused since its introduction.

Clean this up by in effect reverting 307d75bbe6, which removes the
unused function and inlines its definition back into
`midx_contains_pack()`.

(Looking back through the list archives when 307d75bbe6 was written,
this was in preparation for this[1] patch from back when we had the
concept of "disjoint" packs while developing multi-pack verbatim reuse.
That concept was abandoned before the series was merged, but I never
dropped what would become 307d75bbe6 from the series, leading to the
state prior to this commit).

[1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 13 ++-----------
 midx.h |  2 --
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/midx.c b/midx.c
index dea78474a2..59097808a9 100644
--- a/midx.c
+++ b/midx.c
@@ -465,8 +465,7 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
-int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
-		     uint32_t *pos)
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
@@ -477,11 +476,8 @@ int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
 
 		current = m->pack_names[mid];
 		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
-		if (!cmp) {
-			if (pos)
-				*pos = mid;
+		if (!cmp)
 			return 1;
-		}
 		if (cmp > 0) {
 			first = mid + 1;
 			continue;
@@ -492,11 +488,6 @@ int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
 	return 0;
 }
 
-int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
-{
-	return midx_locate_pack(m, idx_or_pack_name, NULL);
-}
-
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
diff --git a/midx.h b/midx.h
index 46c53d69ff..86af7dfc5e 100644
--- a/midx.h
+++ b/midx.h
@@ -102,8 +102,6 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m);
 int midx_contains_pack(struct multi_pack_index *m,
 		       const char *idx_or_pack_name);
-int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
-		     uint32_t *pos);
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (10 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:17     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
                     ` (7 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle
has been cleaned up, teach the former about how to operate in an
incremental MIDX-aware world in a similar fashion as in previous
commits.

Instead of using either of the two `midx_for_object()` or
`midx_for_pack()` helpers, this function is split into two: one that
determines whether a pack is contained in a single MIDX, and another
which calls the former in a loop over all MIDXs.

This approach does not require that we change any of the implementation
in what is now `midx_contains_pack_1()` as it still operates over a
single MIDX.
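
The split can be sketched like so. This is a simplified illustration: it
compares names with plain strcmp(), whereas the real code goes through
`cmp_idx_or_pack_name()`, and `struct layer`, `layer_contains_pack()` and
`chain_contains_pack()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct layer {
	struct layer *base;
	const char *const *pack_names; /* sorted within this layer */
	uint32_t num_packs;
};

/* Binary search over the pack names of a single layer. */
static int layer_contains_pack(const struct layer *l, const char *name)
{
	uint32_t first = 0, last = l->num_packs;

	while (first < last) {
		uint32_t mid = first + (last - first) / 2;
		int cmp = strcmp(name, l->pack_names[mid]);

		if (!cmp)
			return 1;
		if (cmp > 0)
			first = mid + 1;
		else
			last = mid;
	}
	return 0;
}

/* Ask each layer in turn, starting at the tip of the chain. */
static int chain_contains_pack(const struct layer *l, const char *name)
{
	for (; l; l = l->base)
		if (layer_contains_pack(l, name))
			return 1;
	return 0;
}
```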

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 59097808a9..0fa8febb9d 100644
--- a/midx.c
+++ b/midx.c
@@ -465,7 +465,8 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
-int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
+static int midx_contains_pack_1(struct multi_pack_index *m,
+				const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
@@ -488,6 +489,14 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 	return 0;
 }
 
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
+{
+	for (; m; m = m->base_midx)
+		if (midx_contains_pack_1(m, idx_or_pack_name))
+			return 1;
+	return 0;
+}
+
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 13/19] midx: teach `midx_preferred_pack()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (11 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:25     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
                     ` (6 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `midx_preferred_pack()` is used to determine the identity
of the preferred pack: the unique pack within the MIDX which is used as
a tie-breaker when choosing which pack should represent an object that
appears in multiple packs within the MIDX.

Historically we have said that the MIDX's preferred pack has the unique
property that all objects from that pack are represented in the MIDX.
But that isn't quite true: a more precise statement would be that all
objects from that pack *which appear in the MIDX* are selected from that
pack.

This helps us extend the concept of preferred packs across a MIDX chain,
where some object(s) in the preferred pack may appear in other packs
in an earlier MIDX layer, in which case those object(s) will not appear
in a subsequent MIDX layer from either the preferred pack or any other
pack.

Extend the concept of preferred packs by using the pack which represents
the object at the first position in MIDX pseudo-pack order belonging to
the current MIDX layer (i.e., at position 'm->num_objects_in_base').

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 0fa8febb9d..d2dbea41e4 100644
--- a/midx.c
+++ b/midx.c
@@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
+		uint32_t midx_pos;
 		if (load_midx_revindex(m) < 0) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
 
-		m->preferred_pack_idx =
-			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
+
+		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
+
 	} else if (m->preferred_pack_idx == -2)
 		return -1; /* no revindex */
 
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (12 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:29     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 15/19] midx: support reading incremental MIDX chains Taylor Blau
                     ` (5 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `midx_fanout_add_midx_fanout()` is used to help construct
the fanout table when generating a MIDX by reusing data from an existing
MIDX.

Prepare this function to work with incremental MIDXs by making a few
changes:

  - The bounds checks need to be adjusted to start object lookups taking
    into account the number of objects in the previous MIDX layer (i.e.,
    by starting the lookups at position `m->num_objects_in_base` instead
    of position 0).

  - Likewise, the bounds checks need to end at `m->num_objects_in_base`
    objects after `m->num_objects`.

  - Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier
    MIDX layers when dealing with an incremental MIDX chain by calling
    itself when given a MIDX with a non-NULL `base_midx`.
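
As a rough sketch of the adjusted range computation, the following
collects the chain-wide positions of every object that starts with a
given byte. It is not the code from the patch: the fanout table is kept
in host byte order (the real chunk is big-endian, hence the ntohl()
calls in the diff), and `struct layer`, `build_fanout()` and
`collect_fanout()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct layer {
	struct layer *base;
	uint32_t fanout[256];         /* cumulative counts, host order */
	uint32_t num_objects_in_base;
};

/* Build the cumulative table from each object's first byte. */
static void build_fanout(struct layer *l, const unsigned char *first_bytes,
			 uint32_t count)
{
	unsigned b;
	uint32_t i;

	for (b = 0; b < 256; b++)
		l->fanout[b] = 0;
	for (i = 0; i < count; i++)
		l->fanout[first_bytes[i]]++;
	for (b = 1; b < 256; b++)
		l->fanout[b] += l->fanout[b - 1];
}

/*
 * Append to 'out' the chain-wide positions of all objects whose first
 * byte is 'byte', earlier layers first. Returns the new entry count.
 */
static size_t collect_fanout(const struct layer *l, unsigned byte,
			     uint32_t *out, size_t n)
{
	uint32_t start = l->num_objects_in_base, end, cur;

	if (l->base)
		n = collect_fanout(l->base, byte, out, n);

	if (byte)
		start += l->fanout[byte - 1];
	end = l->num_objects_in_base + l->fanout[byte];

	for (cur = start; cur < end; cur++)
		out[n++] = cur;
	return n;
}
```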

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 478b42e720..d5275d719b 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -196,7 +196,7 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 				      struct pack_midx_entry *e,
 				      uint32_t pos)
 {
-	if (pos >= m->num_objects)
+	if (pos >= m->num_objects + m->num_objects_in_base)
 		return 1;
 
 	nth_midxed_object_oid(&e->oid, m, pos);
@@ -247,12 +247,16 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					uint32_t cur_fanout,
 					int preferred_pack)
 {
-	uint32_t start = 0, end;
+	uint32_t start = m->num_objects_in_base, end;
 	uint32_t cur_object;
 
+	if (m->base_midx)
+		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
+					    preferred_pack);
+
 	if (cur_fanout)
-		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
 
 	for (cur_object = start; cur_object < end; cur_object++) {
 		if ((preferred_pack > -1) &&
-- 
2.46.0.rc0.94.g9b2aff57b3


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v2 15/19] midx: support reading incremental MIDX chains
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (13 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:40     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
                     ` (4 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the MIDX machinery's internals have been taught to understand
incremental MIDXs over the previous handful of commits, the MIDX
machinery itself can begin reading incremental MIDXs.

(Note that while the on-disk format for incremental MIDXs has been
defined, the writing end has not been implemented. This will take place
in the commit after next.)

The core of this change involves following the order specified in the
MIDX chain and opening up MIDXs in the chain one-by-one, pointing each
newly opened layer's `->base_midx` at the previous layer at each step.

In order to implement this, the `load_multi_pack_index()` function is
taught to call a new `load_multi_pack_index_chain()` function if loading
a non-incremental MIDX failed via `load_multi_pack_index_one()`.

When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in
the file one-by-one and dispatches calls to
`load_multi_pack_index_one()` to read each layer of the MIDX chain. Once
a layer has been read successfully, it is added to the MIDX chain by
calling `add_midx_to_chain()`, which validates the contents of the
`BASE` chunk, performs some bounds checks on the number of combined
packs and objects, and attaches the new MIDX by pointing its
`base_midx` at the existing part of the chain.
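
As a rough illustration of the bookkeeping described above, here is a
minimal sketch of attaching a layer to a chain while accumulating the
"in base" counts. The `toy_midx` struct and the `toy_*` functions are
hypothetical stand-ins, not the real `struct multi_pack_index` or Git's
own helpers, and the sketch covers only the overflow checks and pointer
assignment, not any `BASE` chunk validation:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct multi_pack_index. */
struct toy_midx {
	uint32_t num_packs, num_packs_in_base;
	uint32_t num_objects, num_objects_in_base;
	struct toy_midx *base;
};

/* Returns 1 if a + b would wrap around in a uint32_t. */
int toy_add_overflows(uint32_t a, uint32_t b)
{
	return a > UINT32_MAX - b;
}

/*
 * Attach `layer` on top of `chain` (NULL for the first layer).
 * Returns 1 on success, 0 if the combined counts would overflow.
 */
int toy_add_to_chain(struct toy_midx *layer, struct toy_midx *chain)
{
	if (chain) {
		if (toy_add_overflows(chain->num_packs,
				      chain->num_packs_in_base) ||
		    toy_add_overflows(chain->num_objects,
				      chain->num_objects_in_base))
			return 0;
		layer->num_packs_in_base = chain->num_packs +
			chain->num_packs_in_base;
		layer->num_objects_in_base = chain->num_objects +
			chain->num_objects_in_base;
	}
	layer->base = chain;
	return 1;
}
```

With three layers holding 10, 5, and 7 objects, the topmost layer would
end up with an "objects in base" count of 15, which is the offset the
earlier patches add when translating global positions.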

As a supplement to this, introduce a new mode in the test-read-midx
test-tool which allows us to read the information for a specific MIDX in
the chain by specifying its trailing checksum via the command-line
arguments like so:

    $ test-tool read-midx .git/objects [checksum]
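
The lookup by checksum amounts to walking the chain from the tip
towards the base until a layer's hex checksum matches. A minimal
sketch, assuming a hypothetical `toy_layer` struct that stores the
checksum as a hex string (the real code derives it from the MIDX
trailer) and using shortened, made-up checksums:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one layer of a MIDX chain. */
struct toy_layer {
	const char *hex_checksum; /* hex form of the trailing checksum */
	struct toy_layer *base;   /* next-older layer, or NULL */
};

/*
 * Walk from the tip towards the base and return the first layer whose
 * checksum matches, or NULL if no layer in the chain matches.
 */
struct toy_layer *toy_find_layer(struct toy_layer *tip, const char *checksum)
{
	struct toy_layer *m = tip;

	while (m && strcmp(m->hex_checksum, checksum))
		m = m->base;
	return m;
}
```

A checksum that names no layer in the chain yields NULL, which the
test helper treats as a failure.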

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                    | 184 +++++++++++++++++++++++++++++++++++---
 midx.h                    |   7 ++
 packfile.c                |   5 +-
 t/helper/test-read-midx.c |  24 +++--
 4 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index d2dbea41e4..0bfd17c021 100644
--- a/midx.c
+++ b/midx.c
@@ -91,7 +91,9 @@ static int midx_read_object_offsets(const unsigned char *chunk_start,
 
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
+static struct multi_pack_index *load_multi_pack_index_one(const char *object_dir,
+							  const char *midx_name,
+							  int local)
 {
 	struct multi_pack_index *m = NULL;
 	int fd;
@@ -99,31 +101,26 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	size_t midx_size;
 	void *midx_map = NULL;
 	uint32_t hash_version;
-	struct strbuf midx_name = STRBUF_INIT;
 	uint32_t i;
 	const char *cur_pack_name;
 	struct chunkfile *cf = NULL;
 
-	get_midx_filename(&midx_name, object_dir);
-
-	fd = git_open(midx_name.buf);
+	fd = git_open(midx_name);
 
 	if (fd < 0)
 		goto cleanup_fail;
 	if (fstat(fd, &st)) {
-		error_errno(_("failed to read %s"), midx_name.buf);
+		error_errno(_("failed to read %s"), midx_name);
 		goto cleanup_fail;
 	}
 
 	midx_size = xsize_t(st.st_size);
 
 	if (midx_size < MIDX_MIN_SIZE) {
-		error(_("multi-pack-index file %s is too small"), midx_name.buf);
+		error(_("multi-pack-index file %s is too small"), midx_name);
 		goto cleanup_fail;
 	}
 
-	strbuf_release(&midx_name);
-
 	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	close(fd);
 
@@ -213,7 +210,6 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 cleanup_fail:
 	free(m);
-	strbuf_release(&midx_name);
 	free_chunkfile(cf);
 	if (midx_map)
 		munmap(midx_map, midx_size);
@@ -222,6 +218,173 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	return NULL;
 }
 
+void get_midx_chain_dirname(struct strbuf *buf, const char *object_dir)
+{
+	strbuf_addf(buf, "%s/pack/multi-pack-index.d", object_dir);
+}
+
+void get_midx_chain_filename(struct strbuf *buf, const char *object_dir)
+{
+	get_midx_chain_dirname(buf, object_dir);
+	strbuf_addstr(buf, "/multi-pack-index-chain");
+}
+
+void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
+				 const unsigned char *hash, const char *ext)
+{
+	get_midx_chain_dirname(buf, object_dir);
+	strbuf_addf(buf, "/multi-pack-index-%s.%s", hash_to_hex(hash), ext);
+}
+
+static int open_multi_pack_index_chain(const char *chain_file,
+				       int *fd, struct stat *st)
+{
+	*fd = git_open(chain_file);
+	if (*fd < 0)
+		return 0;
+	if (fstat(*fd, st)) {
+		close(*fd);
+		return 0;
+	}
+	if (st->st_size < the_hash_algo->hexsz) {
+		close(*fd);
+		if (!st->st_size) {
+			/* treat empty files the same as missing */
+			errno = ENOENT;
+		} else {
+			warning(_("multi-pack-index chain file too small"));
+			errno = EINVAL;
+		}
+		return 0;
+	}
+	return 1;
+}
+
+static int add_midx_to_chain(struct multi_pack_index *midx,
+			     struct multi_pack_index *midx_chain,
+			     struct object_id *oids,
+			     int n)
+{
+	if (midx_chain) {
+		if (unsigned_add_overflows(midx_chain->num_packs,
+					   midx_chain->num_packs_in_base)) {
+			warning(_("pack count in base MIDX too high: %"PRIuMAX),
+				(uintmax_t)midx_chain->num_packs_in_base);
+			return 0;
+		}
+		if (unsigned_add_overflows(midx_chain->num_objects,
+					   midx_chain->num_objects_in_base)) {
+			warning(_("object count in base MIDX too high: %"PRIuMAX),
+				(uintmax_t)midx_chain->num_objects_in_base);
+			return 0;
+		}
+		midx->num_packs_in_base = midx_chain->num_packs +
+			midx_chain->num_packs_in_base;
+		midx->num_objects_in_base = midx_chain->num_objects +
+			midx_chain->num_objects_in_base;
+	}
+
+	midx->base_midx = midx_chain;
+	midx->has_chain = 1;
+
+	return 1;
+}
+
+static struct multi_pack_index *load_midx_chain_fd_st(const char *object_dir,
+						      int local,
+						      int fd, struct stat *st,
+						      int *incomplete_chain)
+{
+	struct multi_pack_index *midx_chain = NULL;
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id *layers = NULL;
+	int valid = 1;
+	uint32_t i, count;
+	FILE *fp = xfdopen(fd, "r");
+
+	count = st->st_size / (the_hash_algo->hexsz + 1);
+	CALLOC_ARRAY(layers, count);
+
+	for (i = 0; i < count; i++) {
+		struct multi_pack_index *m;
+
+		if (strbuf_getline_lf(&buf, fp) == EOF)
+			break;
+
+		if (get_oid_hex(buf.buf, &layers[i])) {
+			warning(_("invalid multi-pack-index chain: line '%s' "
+				  "not a hash"),
+				buf.buf);
+			valid = 0;
+			break;
+		}
+
+		valid = 0;
+
+		strbuf_reset(&buf);
+		get_split_midx_filename_ext(&buf, object_dir, layers[i].hash,
+					    MIDX_EXT_MIDX);
+		m = load_multi_pack_index_one(object_dir, buf.buf, local);
+
+		if (m) {
+			if (add_midx_to_chain(m, midx_chain, layers, i)) {
+				midx_chain = m;
+				valid = 1;
+			} else {
+				close_midx(m);
+			}
+		}
+		if (!valid) {
+			warning(_("unable to find all multi-pack index files"));
+			break;
+		}
+	}
+
+	free(layers);
+	fclose(fp);
+	strbuf_release(&buf);
+
+	*incomplete_chain = !valid;
+	return midx_chain;
+}
+
+static struct multi_pack_index *load_multi_pack_index_chain(const char *object_dir,
+							    int local)
+{
+	struct strbuf chain_file = STRBUF_INIT;
+	struct stat st;
+	int fd;
+	struct multi_pack_index *m = NULL;
+
+	get_midx_chain_filename(&chain_file, object_dir);
+	if (open_multi_pack_index_chain(chain_file.buf, &fd, &st)) {
+		int incomplete;
+		/* ownership of fd is taken over by load function */
+		m = load_midx_chain_fd_st(object_dir, local, fd, &st,
+					  &incomplete);
+	}
+
+	strbuf_release(&chain_file);
+	return m;
+}
+
+struct multi_pack_index *load_multi_pack_index(const char *object_dir,
+					       int local)
+{
+	struct strbuf midx_name = STRBUF_INIT;
+	struct multi_pack_index *m;
+
+	get_midx_filename(&midx_name, object_dir);
+
+	m = load_multi_pack_index_one(object_dir, midx_name.buf, local);
+	if (!m)
+		m = load_multi_pack_index_chain(object_dir, local);
+
+	strbuf_release(&midx_name);
+
+	return m;
+}
+
 void close_midx(struct multi_pack_index *m)
 {
 	uint32_t i;
@@ -230,6 +393,7 @@ void close_midx(struct multi_pack_index *m)
 		return;
 
 	close_midx(m->next);
+	close_midx(m->base_midx);
 
 	munmap((unsigned char *)m->data, m->data_len);
 
diff --git a/midx.h b/midx.h
index 86af7dfc5e..94de16a8c4 100644
--- a/midx.h
+++ b/midx.h
@@ -24,6 +24,7 @@ struct bitmapped_pack;
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
+#define MIDX_CHUNKID_BASE 0x42415345 /* "BASE" */
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
@@ -50,6 +51,7 @@ struct multi_pack_index {
 	int preferred_pack_idx;
 
 	int local;
+	int has_chain;
 
 	const unsigned char *chunk_pack_names;
 	size_t chunk_pack_names_len;
@@ -80,11 +82,16 @@ struct multi_pack_index {
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
+#define MIDX_EXT_MIDX "midx"
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
 void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
 			   const unsigned char *hash, const char *ext);
+void get_midx_chain_dirname(struct strbuf *buf, const char *object_dir);
+void get_midx_chain_filename(struct strbuf *buf, const char *object_dir);
+void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
+				 const unsigned char *hash, const char *ext);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
diff --git a/packfile.c b/packfile.c
index 813584646f..1eb18e3041 100644
--- a/packfile.c
+++ b/packfile.c
@@ -880,7 +880,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!report_garbage)
 		return;
 
-	if (!strcmp(file_name, "multi-pack-index"))
+	if (!strcmp(file_name, "multi-pack-index") ||
+	    !strcmp(file_name, "multi-pack-index.d"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
 	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
@@ -1064,7 +1065,7 @@ struct packed_git *get_all_packs(struct repository *r)
 	prepare_packed_git(r);
 	for (m = r->objects->multi_pack_index; m; m = m->next) {
 		uint32_t i;
-		for (i = 0; i < m->num_packs; i++)
+		for (i = 0; i < m->num_packs + m->num_packs_in_base; i++)
 			prepare_midx_pack(r, m, i);
 	}
 
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 83effc2b5f..69757e94fc 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -9,8 +9,10 @@
 #include "packfile.h"
 #include "setup.h"
 #include "gettext.h"
+#include "pack-revindex.h"
 
-static int read_midx_file(const char *object_dir, int show_objects)
+static int read_midx_file(const char *object_dir, const char *checksum,
+			  int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -21,6 +23,13 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	if (!m)
 		return 1;
 
+	if (checksum) {
+		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+			m = m->base_midx;
+		if (!m)
+			return 1;
+	}
+
 	printf("header: %08x %d %d %d %d\n",
 	       m->signature,
 	       m->version,
@@ -54,7 +63,8 @@ static int read_midx_file(const char *object_dir, int show_objects)
 		struct pack_entry e;
 
 		for (i = 0; i < m->num_objects; i++) {
-			nth_midxed_object_oid(&oid, m, i);
+			nth_midxed_object_oid(&oid, m,
+					      i + m->num_objects_in_base);
 			fill_midx_entry(the_repository, &oid, &e, m);
 
 			printf("%s %"PRIu64"\t%s\n",
@@ -111,7 +121,7 @@ static int read_midx_bitmapped_packs(const char *object_dir)
 	if (!midx)
 		return 1;
 
-	for (i = 0; i < midx->num_packs; i++) {
+	for (i = 0; i < midx->num_packs + midx->num_packs_in_base; i++) {
 		if (nth_bitmapped_pack(the_repository, midx, &pack, i) < 0)
 			return 1;
 
@@ -127,16 +137,16 @@ static int read_midx_bitmapped_packs(const char *object_dir)
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir>");
+	if (!(argc == 2 || argc == 3 || argc == 4))
+		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir> <checksum>");
 
 	if (!strcmp(argv[1], "--show-objects"))
-		return read_midx_file(argv[2], 1);
+		return read_midx_file(argv[2], argv[3], 1);
 	else if (!strcmp(argv[1], "--checksum"))
 		return read_midx_checksum(argv[2]);
 	else if (!strcmp(argv[1], "--preferred-pack"))
 		return read_midx_preferred_pack(argv[2]);
 	else if (!strcmp(argv[1], "--bitmap"))
 		return read_midx_bitmapped_packs(argv[2]);
-	return read_midx_file(argv[1], 0);
+	return read_midx_file(argv[1], argv[2], 0);
 }
-- 
2.46.0.rc0.94.g9b2aff57b3



* [PATCH v2 16/19] midx: implement verification support for incremental MIDXs
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (14 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 15/19] midx: support reading incremental MIDX chains Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:41     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                     ` (3 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Teach the verification implementation used by `git multi-pack-index
verify` to perform verification for incremental MIDX chains by
independently validating each layer within the chain.
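
To illustrate what "independently validating each layer" means for the
OID ordering check, here is a minimal sketch under simplifying
assumptions: object IDs are modeled as plain integers, and `toy_layer`
and `toy_verify_oid_order()` are hypothetical names, not part of Git.
Each layer must be sorted on its own, while OIDs from different layers
may interleave freely:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer of a MIDX chain. */
struct toy_layer {
	const uint32_t *oids;   /* integers standing in for object IDs */
	uint32_t num_objects;
	struct toy_layer *base; /* next-older layer, or NULL */
};

/* Returns the number of out-of-order adjacent pairs over all layers. */
uint32_t toy_verify_oid_order(const struct toy_layer *tip)
{
	uint32_t errors = 0;
	const struct toy_layer *curr;

	for (curr = tip; curr; curr = curr->base) {
		uint32_t i;

		/* "i + 1 < n" avoids unsigned underflow when n is 0 */
		for (i = 0; i + 1 < curr->num_objects; i++)
			if (curr->oids[i] >= curr->oids[i + 1])
				errors++;
	}
	return errors;
}
```

Note that the inner loop reads from `curr`, the layer currently being
visited, rather than from the tip of the chain.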

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 47 ++++++++++++++++++++++++++++++-----------------
 midx.h |  2 ++
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/midx.c b/midx.c
index 0bfd17c021..21a9dbe23a 100644
--- a/midx.c
+++ b/midx.c
@@ -469,6 +469,13 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	return 0;
 }
 
+struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
+				   uint32_t pack_int_id)
+{
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+	return m->packs[local_pack_int_id];
+}
+
 #define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
 
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
@@ -817,6 +824,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	uint32_t i;
 	struct progress *progress = NULL;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	struct multi_pack_index *curr;
 	verify_midx_error = 0;
 
 	if (!m) {
@@ -839,8 +847,8 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_delayed_progress(_("Looking for referenced packfiles"),
-					  m->num_packs);
-	for (i = 0; i < m->num_packs; i++) {
+						  m->num_packs + m->num_packs_in_base);
+	for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
 		if (prepare_midx_pack(r, m, i))
 			midx_report("failed to load pack in position %d", i);
 
@@ -860,17 +868,20 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	if (flags & MIDX_PROGRESS)
 		progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"),
 						 m->num_objects - 1);
-	for (i = 0; i < m->num_objects - 1; i++) {
-		struct object_id oid1, oid2;
 
-		nth_midxed_object_oid(&oid1, m, i);
-		nth_midxed_object_oid(&oid2, m, i + 1);
+	for (curr = m; curr; curr = curr->base_midx) {
+		for (i = 0; i < curr->num_objects - 1; i++) {
+			struct object_id oid1, oid2;
 
-		if (oidcmp(&oid1, &oid2) >= 0)
-			midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
-				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+			nth_midxed_object_oid(&oid1, curr, curr->num_objects_in_base + i);
+			nth_midxed_object_oid(&oid2, curr, curr->num_objects_in_base + i + 1);
 
-		midx_display_sparse_progress(progress, i + 1);
+			if (oidcmp(&oid1, &oid2) >= 0)
+				midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
+					    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+
+			midx_display_sparse_progress(progress, i + 1);
+		}
 	}
 	stop_progress(&progress);
 
@@ -880,8 +891,8 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	 * each of the objects and only require 1 packfile to be open at a
 	 * time.
 	 */
-	ALLOC_ARRAY(pairs, m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
+	ALLOC_ARRAY(pairs, m->num_objects + m->num_objects_in_base);
+	for (i = 0; i < m->num_objects + m->num_objects_in_base; i++) {
 		pairs[i].pos = i;
 		pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i);
 	}
@@ -895,16 +906,18 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
+	for (i = 0; i < m->num_objects + m->num_objects_in_base; i++) {
 		struct object_id oid;
 		struct pack_entry e;
 		off_t m_offset, p_offset;
 
 		if (i > 0 && pairs[i-1].pack_int_id != pairs[i].pack_int_id &&
-		    m->packs[pairs[i-1].pack_int_id])
-		{
-			close_pack_fd(m->packs[pairs[i-1].pack_int_id]);
-			close_pack_index(m->packs[pairs[i-1].pack_int_id]);
+		    nth_midxed_pack(m, pairs[i-1].pack_int_id)) {
+			uint32_t pack_int_id = pairs[i-1].pack_int_id;
+			struct packed_git *p = nth_midxed_pack(m, pack_int_id);
+
+			close_pack_fd(p);
+			close_pack_index(p);
 		}
 
 		nth_midxed_object_oid(&oid, m, pairs[i].pos);
diff --git a/midx.h b/midx.h
index 94de16a8c4..9d30935589 100644
--- a/midx.h
+++ b/midx.h
@@ -95,6 +95,8 @@ void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
+struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
+				   uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
 int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
-- 
2.46.0.rc0.94.g9b2aff57b3



* [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (15 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 10:46     ` Jeff King
  2024-07-17 21:12   ` [PATCH v2 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
                     ` (2 subsequent siblings)
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Nearly three years ago, commit ff1e653c8e2 (midx: respect
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new
environment variable which caused the test suite to write MIDX bitmaps
after any 'git repack' invocation.

At the time, this was done to help flush out any bugs with MIDX bitmaps
that weren't explicitly covered in the t5326-multi-pack-bitmaps.sh
script.

Since then, that flag has served us well, but it is no longer providing
meaningful coverage, as the script in t5326 has matured substantially
and covers many more interesting cases than it did back when ff1e653c8e2
was originally written.

Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
as it is no longer serving a useful purpose. More importantly, removing
this variable clears the way for us to introduce a new one to help
similarly flush out bugs related to incremental MIDX chains.

Because these incremental MIDX chains are (for now) incompatible with
MIDX bitmaps, we cannot have both.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c                  | 12 ++----------
 ci/run-build-and-tests.sh         |  1 -
 midx.h                            |  2 --
 t/README                          |  4 ----
 t/t0410-partial-clone.sh          |  2 --
 t/t5310-pack-bitmaps.sh           |  4 ----
 t/t5319-multi-pack-index.sh       |  3 +--
 t/t5326-multi-pack-bitmaps.sh     |  3 +--
 t/t5327-multi-pack-bitmaps-rev.sh |  5 ++---
 t/t7700-repack.sh                 | 21 +++++++--------------
 10 files changed, 13 insertions(+), 44 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index f0317fa94a..8499bf0e12 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1217,10 +1217,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!write_midx &&
 		    (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
 			write_bitmaps = 0;
-	} else if (write_bitmaps &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
-		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
@@ -1518,12 +1514,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (run_update_server_info)
 		update_server_info(0);
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
-		unsigned flags = 0;
-		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
-			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
-		write_midx_file(get_object_directory(), NULL, NULL, flags);
-	}
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
+		write_midx_file(get_object_directory(), NULL, NULL, 0);
 
 cleanup:
 	string_list_clear(&names, 1);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 98dda42045..e6fd68630c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,7 +25,6 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
-	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
diff --git a/midx.h b/midx.h
index 9d30935589..3714cad2cc 100644
--- a/midx.h
+++ b/midx.h
@@ -29,8 +29,6 @@ struct bitmapped_pack;
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
-#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
-	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index d9e0e07506..e8a11926e4 100644
--- a/t/README
+++ b/t/README
@@ -469,10 +469,6 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
-'--bitmap' option on all invocations of 'git multi-pack-index write',
-and ignores pack-objects' '--write-bitmap-index'.
-
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 2c30c86e7b..34bdb3ab1f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -5,8 +5,6 @@ test_description='partial clone'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
 
-# missing promisor objects cause repacks which write bitmaps to fail
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 # When enabled, some commands will write commit-graphs. This causes fsck
 # to fail when delete_object() is called because fsck will attempt to
 # verify the out-of-sync commit graph.
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d7fd71360e..a6de7c5764 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -5,10 +5,6 @@ test_description='exercise basic bitmap functionality'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
-# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
-# their place.
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
-
 # Likewise, allow individual tests to control whether or not they use
 # the boundary-based traversal.
 sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 10d2a6bf92..6e9ee23398 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -600,8 +600,7 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writeBitmaps=true repack -ad &&
+	git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 916da389b6..1cb3e3ff08 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -4,10 +4,9 @@ test_description='exercise basic multi-pack bitmap functionality'
 . ./test-lib.sh
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 
-# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# We'll be writing our own MIDX, so avoid getting confused by the
 # automatic ones.
 GIT_TEST_MULTI_PACK_INDEX=0
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
 # This test exercise multi-pack bitmap functionality where the object order is
 # stored and read from a special chunk within the MIDX, so use the default
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index e65e311cd7..23db949c20 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -5,10 +5,9 @@ test_description='exercise basic multi-pack bitmap functionality (.rev files)'
 . ./test-lib.sh
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 
-# We'll be writing our own midx and bitmaps, so avoid getting confused by the
-# automatic ones.
+# We'll be writing our own MIDX, so avoid getting confused by the automatic
+# ones.
 GIT_TEST_MULTI_PACK_INDEX=0
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
 # Unlike t5326, this test exercise multi-pack bitmap functionality where the
 # object order is stored in a separate .rev file.
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 127efe99f8..8f34f05087 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -70,14 +70,13 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
+	git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writebitmaps=true repack -Adl &&
+	git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -284,8 +283,7 @@ test_expect_success 'repacking fails when missing .pack actually means missing o
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
 	rm -f bare.git/objects/pack/*.bitmap &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad &&
+	git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -296,8 +294,7 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -308,8 +305,7 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad 2>stderr &&
+	git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -320,8 +316,7 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad 2>stderr &&
+	git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -342,9 +337,7 @@ test_expect_success 'repacking with a filter works' '
 '
 
 test_expect_success '--filter fails with --write-bitmap-index' '
-	test_must_fail \
-		env GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
+	test_must_fail git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
 '
 
 test_expect_success 'repacking with two filters works' '
-- 
2.46.0.rc0.94.g9b2aff57b3



* [PATCH v2 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (16 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-07-17 21:12   ` [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
  2024-08-01 11:14   ` [PATCH v2 00/19] midx: incremental multi-pack indexes, part one Jeff King
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare for sub-directories to appear in $GIT_DIR/objects/pack by
adjusting the copy, remove, and chmod invocations to perform their
behavior recursively.

This prepares us for the new $GIT_DIR/objects/pack/multi-pack-index.d
directory which will be added in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5313-pack-bounds-checks.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5313-pack-bounds-checks.sh b/t/t5313-pack-bounds-checks.sh
index ceaa6700a2..86fc73f9fb 100755
--- a/t/t5313-pack-bounds-checks.sh
+++ b/t/t5313-pack-bounds-checks.sh
@@ -7,11 +7,11 @@ TEST_PASSES_SANITIZE_LEAK=true
 
 clear_base () {
 	test_when_finished 'restore_base' &&
-	rm -f $base
+	rm -r -f $base
 }
 
 restore_base () {
-	cp base-backup/* .git/objects/pack/
+	cp -r base-backup/* .git/objects/pack/
 }
 
 do_pack () {
@@ -64,9 +64,9 @@ test_expect_success 'set up base packfile and variables' '
 	git commit -m base &&
 	git repack -ad &&
 	base=$(echo .git/objects/pack/*) &&
-	chmod +w $base &&
+	chmod -R +w $base &&
 	mkdir base-backup &&
-	cp $base base-backup/ &&
+	cp -r $base base-backup/ &&
 	object=$(git rev-parse HEAD:file)
 '
 
-- 
2.46.0.rc0.94.g9b2aff57b3



* [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (17 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
@ 2024-07-17 21:12   ` Taylor Blau
  2024-08-01 11:07     ` Jeff King
  2024-08-01 11:14   ` [PATCH v2 00/19] midx: incremental multi-pack indexes, part one Jeff King
  19 siblings, 1 reply; 102+ messages in thread
From: Taylor Blau @ 2024-07-17 21:12 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the rest of the MIDX subsystem and its relevant callers have
been taught to read and process incremental MIDX chains, let's finally
update `write_midx_internal()` so that it can write them.

This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:

    $ git multi-pack-index write --incremental

The implementation is relatively straightforward, and boils down to a
handful of different kinds of changes:

  - The `compute_sorted_entries()` function is taught to reject objects
    which appear in any existing MIDX layer.

  - Functions like `write_midx_revindex()` are adjusted to write
    pack_order values which are offset by the number of objects in the
    base MIDX layer.

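  - The end of `write_midx_internal()` is adjusted to move
    non-incremental MIDX files into the chain when necessary (i.e. when
    creating an incremental chain with an existing non-incremental MIDX
    in the repository). The existing files are hard-linked into
    multi-pack-index.d, and the originals are removed afterward.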
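
The pack_order offset from the second item can be sketched in
isolation. This is only an illustration: `struct layer`,
`objects_before()`, and `offset_pack_order()` are hypothetical names,
not functions from this patch. It assumes that each layer records its
own object count and the count of all layers beneath it, as the
`num_objects` and `num_objects_in_base` fields do here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer; not Git's multi_pack_index. */
struct layer {
	uint32_t num_objects;         /* objects stored in this layer */
	uint32_t num_objects_in_base; /* objects in all layers beneath it */
};

/* Objects that precede a new layer written on top of `base` (may be NULL). */
static uint32_t objects_before(const struct layer *base)
{
	return base ? base->num_objects + base->num_objects_in_base : 0;
}

/* Translate layer-local positions into positions within the whole chain. */
static void offset_pack_order(const uint32_t *local, uint32_t *global,
			      size_t nr, const struct layer *base)
{
	uint32_t nr_base = objects_before(base);
	size_t i;

	assert(local && global);
	for (i = 0; i < nr; i++)
		global[i] = local[i] + nr_base;
}
```

With a base layer holding 3 objects on top of 5 older ones, local
position 0 becomes chain-wide position 8. Without a base layer the
positions are written unchanged, which matches the non-incremental
case.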

There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).
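
The "keep_hash" filtering amounts to a name comparison per directory
entry. A minimal sketch, assuming layer files are named
"multi-pack-index-<hash>.<ext>" as in this series; `should_remove()`
and its helpers are hypothetical and only model the decision, they do
not unlink anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int has_suffix(const char *s, const char *suffix)
{
	size_t slen = strlen(s), xlen = strlen(suffix);
	return slen >= xlen && !strcmp(s + slen - xlen, suffix);
}

/*
 * Return 1 if file_name looks like a MIDX file with extension `ext`
 * whose hash is not listed in keep_hashes[0..keep_nr), 0 otherwise.
 */
static int should_remove(const char *file_name, const char *ext,
			 const char *const *keep_hashes, size_t keep_nr)
{
	char keep[256];
	size_t i;

	assert(file_name && ext);
	if (!has_prefix(file_name, "multi-pack-index-") ||
	    !has_suffix(file_name, ext))
		return 0; /* not a file this pass is responsible for */

	for (i = 0; i < keep_nr; i++) {
		snprintf(keep, sizeof(keep), "multi-pack-index-%s.%s",
			 keep_hashes[i], ext);
		if (!strcmp(keep, file_name))
			return 0; /* belongs to the current chain */
	}
	return 1;
}
```

Passing an empty keep list marks every matching file for removal,
which corresponds to calling the existing helper with a NULL
keep_hash.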

The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:

  1. Most of the "interesting" behavior is already thoroughly covered in
     t5319-multi-pack-index.sh, which handles the core logic of reading
     objects through a MIDX.

     The new tests in t5334-incremental-multi-pack-index.sh are mostly
     focused on creating and destroying incremental MIDXs, as well as
     stitching their results together across layers.

  2. A new GIT_TEST environment variable is added called
     "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
     entire test suite to write incremental MIDXs after repacking when
     combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.

     This exercises the long tail of other interesting behavior that is
     defined implicitly throughout the rest of the CI suite. It is
     likewise added to the linux-TEST-vars job.
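
How the two variables combine can be sketched as below. This is a
simplified model rather than the code in builtin/repack.c:
`env_is_true()` is a hypothetical parser that accepts fewer spellings
than Git's git_env_bool(), and `WRITE_INCREMENTAL` merely mirrors the
value of MIDX_WRITE_INCREMENTAL. The caller is assumed to pass in the
results of getenv() for each variable.

```c
#include <stddef.h>
#include <string.h>

#define WRITE_INCREMENTAL (1u << 5) /* mirrors MIDX_WRITE_INCREMENTAL */

/* Hypothetical, simplified boolean parser for an environment value. */
static int env_is_true(const char *value)
{
	return value && (!strcmp(value, "1") || !strcmp(value, "true"));
}

/*
 * midx_env: value of GIT_TEST_MULTI_PACK_INDEX (or NULL if unset)
 * incr_env: value of GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL
 *
 * Returns 1 and fills in *flags when a MIDX should be written after
 * repacking; returns 0 (with *flags cleared) otherwise.
 */
static int midx_flags_after_repack(const char *midx_env,
				   const char *incr_env,
				   unsigned *flags)
{
	*flags = 0;
	if (!env_is_true(midx_env))
		return 0; /* the incremental variable alone does nothing */
	if (env_is_true(incr_env))
		*flags |= WRITE_INCREMENTAL;
	return 1;
}
```

Setting only the incremental variable has no effect in this model,
which is why the CI job exports both.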

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt  |  11 +-
 builtin/multi-pack-index.c              |   2 +
 builtin/repack.c                        |   8 +-
 ci/run-build-and-tests.sh               |   1 +
 midx-write.c                            | 314 ++++++++++++++++++++----
 midx.c                                  |  62 ++++-
 midx.h                                  |   4 +
 packfile.c                              |  16 +-
 packfile.h                              |   4 +
 t/README                                |   4 +
 t/lib-bitmap.sh                         |   6 +-
 t/lib-midx.sh                           |  28 +++
 t/t5319-multi-pack-index.sh             |  27 +-
 t/t5326-multi-pack-bitmaps.sh           |   1 +
 t/t5327-multi-pack-bitmaps-rev.sh       |   1 +
 t/t5332-multi-pack-reuse.sh             |   2 +
 t/t5334-incremental-multi-pack-index.sh |  46 ++++
 t/t7700-repack.sh                       |  27 +-
 18 files changed, 460 insertions(+), 104 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 3696506eb3..631d5c7d15 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -64,6 +64,12 @@ The file given at `<path>` is expected to be readable, and can contain
 duplicates. (If a given OID is given more than once, it is marked as
 preferred if at least one instance of it begins with the special `+`
 marker).
+
+	--incremental::
+		Write an incremental MIDX file containing only objects
+		and packs not present in an existing MIDX layer.
+		Migrates non-incremental MIDXs to incremental ones when
+		necessary. Incompatible with `--bitmap`.
 --
 
 verify::
@@ -74,6 +80,8 @@ expire::
 	have no objects referenced by the MIDX (with the exception of
 	`.keep` packs and cruft packs). Rewrite the MIDX file afterward
 	to remove all references to these pack-files.
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
@@ -95,7 +103,8 @@ repack::
 +
 If `repack.packKeptObjects` is `false`, then any pack-files with an
 associated `.keep` file will not be selected for the batch to repack.
-
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 EXAMPLES
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 9cf1a32d65..8805cbbeb3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -129,6 +129,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "progress", &opts.flags,
 			N_("force progress reporting"), MIDX_PROGRESS),
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			 N_("write multi-pack index containing only given indexes")),
 		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
diff --git a/builtin/repack.c b/builtin/repack.c
index 8499bf0e12..7608430a37 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1514,8 +1514,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (run_update_server_info)
 		update_server_info(0);
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0))
+			flags |= MIDX_WRITE_INCREMENTAL;
+		write_midx_file(get_object_directory(), NULL, NULL, flags);
+	}
 
 cleanup:
 	string_list_clear(&names, 1);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index e6fd68630c..2e28d02b20 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,6 +25,7 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
diff --git a/midx-write.c b/midx-write.c
index d5275d719b..a94cb28bfd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -17,6 +17,8 @@
 #include "refs.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "path.h"
+#include "pack-revindex.h"
 
 #define PACK_EXPIRED UINT_MAX
 #define BITMAP_POS_UNKNOWN (~((uint32_t)0))
@@ -25,7 +27,11 @@
 
 extern int midx_checksum_valid(struct multi_pack_index *m);
 extern void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash);
+				 const char *keep_hash);
+extern void clear_incremental_midx_files_ext(const char *object_dir,
+					     const char *ext,
+					     const char **keep_hashes,
+					     uint32_t hashes_nr);
 extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 				const char *idx_name);
 
@@ -86,6 +92,7 @@ struct write_midx_context {
 	size_t nr;
 	size_t alloc;
 	struct multi_pack_index *m;
+	struct multi_pack_index *base_midx;
 	struct progress *progress;
 	unsigned pack_paths_checked;
 
@@ -99,6 +106,9 @@ struct write_midx_context {
 
 	int preferred_pack_idx;
 
+	int incremental;
+	uint32_t num_multi_pack_indexes_before;
+
 	struct string_list *to_include;
 };
 
@@ -122,6 +132,9 @@ static int should_include_pack(const struct write_midx_context *ctx,
 	 */
 	if (ctx->m && midx_contains_pack(ctx->m, file_name))
 		return 0;
+	else if (ctx->base_midx && midx_contains_pack(ctx->base_midx,
+						      file_name))
+		return 0;
 	else if (ctx->to_include &&
 		 !string_list_has_string(ctx->to_include, file_name))
 		return 0;
@@ -338,7 +351,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (ctx->m)
+		if (ctx->m && !ctx->incremental)
 			midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
 						    ctx->preferred_pack_idx);
 
@@ -364,6 +377,10 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
 						&fanout.entries[cur_object].oid))
 				continue;
+			if (ctx->incremental && ctx->base_midx &&
+			    midx_has_oid(ctx->base_midx,
+					 &fanout.entries[cur_object].oid))
+				continue;
 
 			ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
 				   alloc_objects);
@@ -547,10 +564,16 @@ static int write_midx_revindex(struct hashfile *f,
 			       void *data)
 {
 	struct write_midx_context *ctx = data;
-	uint32_t i;
+	uint32_t i, nr_base;
+
+	if (ctx->incremental && ctx->base_midx)
+		nr_base = ctx->base_midx->num_objects +
+			ctx->base_midx->num_objects_in_base;
+	else
+		nr_base = 0;
 
 	for (i = 0; i < ctx->entries_nr; i++)
-		hashwrite_be32(f, ctx->pack_order[i]);
+		hashwrite_be32(f, ctx->pack_order[i] + nr_base);
 
 	return 0;
 }
@@ -579,12 +602,18 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
 static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 {
 	struct midx_pack_order_data *data;
-	uint32_t *pack_order;
+	uint32_t *pack_order, base_objects = 0;
 	uint32_t i;
 
 	trace2_region_enter("midx", "midx_pack_order", the_repository);
 
+	if (ctx->incremental && ctx->base_midx)
+		base_objects = ctx->base_midx->num_objects +
+			ctx->base_midx->num_objects_in_base;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	ALLOC_ARRAY(data, ctx->entries_nr);
+
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
@@ -596,12 +625,11 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 
 	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
 
-	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
 		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = i;
+			pack->bitmap_pos = i + base_objects;
 		pack->bitmap_nr++;
 		pack_order[i] = data[i].nr;
 	}
@@ -649,7 +677,8 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 	prepare_packing_data(the_repository, pdata);
 
 	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		uint32_t pos = ctx->pack_order[i];
+		struct pack_midx_entry *from = &ctx->entries[pos];
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
 		oe_set_in_pack(pdata, to,
@@ -897,37 +926,130 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 static int fill_packs_from_midx(struct write_midx_context *ctx,
 				const char *preferred_pack_name, uint32_t flags)
 {
-	uint32_t i;
+	struct multi_pack_index *m;
 
-	for (i = 0; i < ctx->m->num_packs; i++) {
-		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+	for (m = ctx->m; m; m = m->base_midx) {
+		uint32_t i;
+
+		for (i = 0; i < m->num_packs; i++) {
+			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-		if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
 			/*
 			 * If generating a reverse index, need to have
 			 * packed_git's loaded to compare their
 			 * mtimes and object count.
 			 *
-			 *
 			 * If a preferred pack is specified, need to
 			 * have packed_git's loaded to ensure the chosen
 			 * preferred pack has a non-zero object count.
 			 */
-			if (prepare_midx_pack(the_repository, ctx->m, i))
-				return error(_("could not load pack"));
+			if (flags & MIDX_WRITE_REV_INDEX ||
+			    preferred_pack_name) {
+				if (prepare_midx_pack(the_repository, m,
+						      m->num_packs_in_base + i)) {
+					error(_("could not load pack"));
+					return 1;
+				}
 
-			if (open_pack_index(ctx->m->packs[i]))
-				die(_("could not open index for %s"),
-				    ctx->m->packs[i]->pack_name);
+				if (open_pack_index(m->packs[i]))
+					die(_("could not open index for %s"),
+					    m->packs[i]->pack_name);
+			}
+
+			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
+				       m->pack_names[i],
+				       m->num_packs_in_base + i);
 		}
-
-		fill_pack_info(&ctx->info[ctx->nr++], ctx->m->packs[i],
-			       ctx->m->pack_names[i], i);
 	}
-
 	return 0;
 }
 
+static struct {
+	const char *non_split;
+	const char *split;
+} midx_exts[] = {
+	{NULL, MIDX_EXT_MIDX},
+	{MIDX_EXT_BITMAP, MIDX_EXT_BITMAP},
+	{MIDX_EXT_REV, MIDX_EXT_REV},
+};
+
+static int link_midx_to_chain(struct multi_pack_index *m)
+{
+	struct strbuf from = STRBUF_INIT;
+	struct strbuf to = STRBUF_INIT;
+	int ret = 0;
+	size_t i;
+
+	if (!m || m->has_chain) {
+		/*
+		 * Either no MIDX previously existed, or it was already
+		 * part of a MIDX chain. In both cases, we have nothing
+		 * to link, so return early.
+		 */
+		goto done;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
+		const unsigned char *hash = get_midx_checksum(m);
+
+		get_midx_filename_ext(&from, m->object_dir, hash,
+				      midx_exts[i].non_split);
+		get_split_midx_filename_ext(&to, m->object_dir, hash,
+					    midx_exts[i].split);
+
+		if (link(from.buf, to.buf) < 0 && errno != ENOENT) {
+			ret = error_errno(_("unable to link '%s' to '%s'"),
+					  from.buf, to.buf);
+			goto done;
+		}
+
+		strbuf_reset(&from);
+		strbuf_reset(&to);
+	}
+
+done:
+	strbuf_release(&from);
+	strbuf_release(&to);
+	return ret;
+}
+
+static void clear_midx_files(const char *object_dir,
+			     const char **hashes,
+			     uint32_t hashes_nr,
+			     unsigned incremental)
+{
+	/*
+	 * if incremental:
+	 *   - remove all non-incremental MIDX files
+	 *   - remove any incremental MIDX files not in the current one
+	 *
+	 * if non-incremental:
+	 *   - remove all incremental MIDX files
+	 *   - remove any non-incremental MIDX files not matching the current
+	 *     hash
+	 */
+	struct strbuf buf = STRBUF_INIT;
+	const char *exts[] = { MIDX_EXT_BITMAP, MIDX_EXT_REV, MIDX_EXT_MIDX };
+	uint32_t i, j;
+
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		clear_incremental_midx_files_ext(object_dir, exts[i],
+						 hashes, hashes_nr);
+		for (j = 0; j < hashes_nr; j++)
+			clear_midx_files_ext(object_dir, exts[i], hashes[j]);
+	}
+
+	if (incremental)
+		get_midx_filename(&buf, object_dir);
+	else
+		get_midx_chain_filename(&buf, object_dir);
+
+	if (unlink(buf.buf) && errno != ENOENT)
+		die_errno(_("failed to clear multi-pack-index at %s"), buf.buf);
+
+	strbuf_release(&buf);
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
@@ -940,42 +1062,66 @@ static int write_midx_internal(const char *object_dir,
 	uint32_t i, start_pack;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
+	struct tempfile *incr;
 	struct write_midx_context ctx = { 0 };
 	int bitmapped_packs_concat_len = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
+	const char **keep_hashes = NULL;
 	struct chunkfile *cf;
 
 	trace2_region_enter("midx", "write_midx_internal", the_repository);
 
-	get_midx_filename(&midx_name, object_dir);
+	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+	if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
+		die(_("cannot write incremental MIDX with bitmap"));
+
+	if (ctx.incremental)
+		strbuf_addf(&midx_name,
+			    "%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
+			    object_dir);
+	else
+		get_midx_filename(&midx_name, object_dir);
 	if (safe_create_leading_directories(midx_name.buf))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name.buf);
 
-	if (!packs_to_include) {
-		/*
-		 * Only reference an existing MIDX when not filtering which
-		 * packs to include, since all packs and objects are copied
-		 * blindly from an existing MIDX if one is present.
-		 */
-		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
-	}
+	if (!packs_to_include || ctx.incremental) {
+		struct multi_pack_index *m = lookup_multi_pack_index(the_repository,
+								     object_dir);
+		if (m && !midx_checksum_valid(m)) {
+			warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+			m = NULL;
+		}
 
-	if (ctx.m && !midx_checksum_valid(ctx.m)) {
-		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
-		ctx.m = NULL;
+		if (m) {
+			/*
+			 * Only reference an existing MIDX when not filtering
+			 * which packs to include, since all packs and objects
+			 * are copied blindly from an existing MIDX if one is
+			 * present.
+			 */
+			if (ctx.incremental)
+				ctx.base_midx = m;
+			else if (!packs_to_include)
+				ctx.m = m;
+		}
 	}
 
 	ctx.nr = 0;
-	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
 	ctx.info = NULL;
 	ALLOC_ARRAY(ctx.info, ctx.alloc);
 
-	if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
-					  flags) < 0) {
-		result = 1;
+	if (ctx.incremental) {
+		struct multi_pack_index *m = ctx.base_midx;
+		while (m) {
+			ctx.num_multi_pack_indexes_before++;
+			m = m->base_midx;
+		}
+	} else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+						 flags) < 0) {
 		goto cleanup;
 	}
 
@@ -992,7 +1138,8 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
+	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
+	    !ctx.incremental &&
 	    !(packs_to_include || packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
 		int bitmap_exists;
@@ -1008,12 +1155,14 @@ static int write_midx_internal(const char *object_dir,
 			 * corresponding bitmap (or one wasn't requested).
 			 */
 			if (!want_bitmap)
-				clear_midx_files_ext(object_dir, ".bitmap",
-						     NULL);
+				clear_midx_files_ext(object_dir, "bitmap", NULL);
 			goto cleanup;
 		}
 	}
 
+	if (ctx.incremental && !ctx.nr)
+		goto cleanup; /* nothing to do */
+
 	if (preferred_pack_name) {
 		ctx.preferred_pack_idx = -1;
 
@@ -1159,8 +1308,30 @@ static int write_midx_internal(const char *object_dir,
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
 
-	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
-	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+	if (ctx.incremental) {
+		struct strbuf lock_name = STRBUF_INIT;
+
+		get_midx_chain_filename(&lock_name, object_dir);
+		hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
+		strbuf_release(&lock_name);
+
+		incr = mks_tempfile_m(midx_name.buf, 0444);
+		if (!incr) {
+			error(_("unable to create temporary MIDX layer"));
+			return -1;
+		}
+
+		if (adjust_shared_perm(get_tempfile_path(incr))) {
+			error(_("unable to adjust shared permissions for '%s'"),
+			      get_tempfile_path(incr));
+			return -1;
+		}
+
+		f = hashfd(get_tempfile_fd(incr), get_tempfile_path(incr));
+	} else {
+		hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
+		f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+	}
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1253,14 +1424,55 @@ static int write_midx_internal(const char *object_dir,
 	 * have been freed in the previous if block.
 	 */
 
-	if (ctx.m)
+	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+
+	if (ctx.incremental) {
+		FILE *chainf = fdopen_lock_file(&lk, "w");
+		struct strbuf final_midx_name = STRBUF_INIT;
+		struct multi_pack_index *m = ctx.base_midx;
+
+		if (!chainf) {
+			error_errno(_("unable to open multi-pack-index chain file"));
+			return -1;
+		}
+
+		if (link_midx_to_chain(ctx.base_midx) < 0)
+			return -1;
+
+		get_split_midx_filename_ext(&final_midx_name, object_dir,
+					    midx_hash, MIDX_EXT_MIDX);
+
+		if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
+			error_errno(_("unable to rename new multi-pack-index layer"));
+			return -1;
+		}
+
+		keep_hashes[ctx.num_multi_pack_indexes_before] =
+			xstrdup(hash_to_hex(midx_hash));
+
+		for (i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+			keep_hashes[j] = xstrdup(hash_to_hex(get_midx_checksum(m)));
+			m = m->base_midx;
+		}
+
+		for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+			fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
+	} else {
+		keep_hashes[ctx.num_multi_pack_indexes_before] =
+			xstrdup(hash_to_hex(midx_hash));
+	}
+
+	if (ctx.m || ctx.base_midx)
 		close_object_store(the_repository->objects);
 
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+	clear_midx_files(object_dir, keep_hashes,
+			 ctx.num_multi_pack_indexes_before + 1,
+			 ctx.incremental);
 
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
@@ -1275,6 +1487,11 @@ static int write_midx_internal(const char *object_dir,
 	free(ctx.entries);
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
+	if (keep_hashes) {
+		for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+			free((char *)keep_hashes[i]);
+		free(keep_hashes);
+	}
 	strbuf_release(&midx_name);
 
 	trace2_region_leave("midx", "write_midx_internal", the_repository);
@@ -1311,6 +1528,9 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	if (!m)
 		return 0;
 
+	if (m->base_midx)
+		die(_("cannot expire packs from an incremental multi-pack-index"));
+
 	CALLOC_ARRAY(count, m->num_packs);
 
 	if (flags & MIDX_PROGRESS)
@@ -1485,6 +1705,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 
 	if (!m)
 		return 0;
+	if (m->base_midx)
+		die(_("cannot repack an incremental multi-pack-index"));
 
 	CALLOC_ARRAY(include_pack, m->num_packs);
 
diff --git a/midx.c b/midx.c
index 21a9dbe23a..6372138ba8 100644
--- a/midx.c
+++ b/midx.c
@@ -16,7 +16,10 @@
 
 int midx_checksum_valid(struct multi_pack_index *m);
 void clear_midx_files_ext(const char *object_dir, const char *ext,
-			  unsigned char *keep_hash);
+			  const char *keep_hash);
+void clear_incremental_midx_files_ext(const char *object_dir, const char *ext,
+				      char **keep_hashes,
+				      uint32_t hashes_nr);
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
@@ -520,6 +523,11 @@ int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 	return 0;
 }
 
+int midx_has_oid(struct multi_pack_index *m, const struct object_id *oid)
+{
+	return bsearch_midx(oid, m, NULL);
+}
+
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
@@ -722,7 +730,8 @@ int midx_checksum_valid(struct multi_pack_index *m)
 }
 
 struct clear_midx_data {
-	char *keep;
+	char **keep;
+	uint32_t keep_nr;
 	const char *ext;
 };
 
@@ -730,32 +739,63 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
 				const char *file_name, void *_data)
 {
 	struct clear_midx_data *data = _data;
+	uint32_t i;
 
 	if (!(starts_with(file_name, "multi-pack-index-") &&
 	      ends_with(file_name, data->ext)))
 		return;
-	if (data->keep && !strcmp(data->keep, file_name))
-		return;
-
+	for (i = 0; i < data->keep_nr; i++) {
+		if (!strcmp(data->keep[i], file_name))
+			return;
+	}
 	if (unlink(full_path))
 		die_errno(_("failed to remove %s"), full_path);
 }
 
 void clear_midx_files_ext(const char *object_dir, const char *ext,
-			  unsigned char *keep_hash)
+			  const char *keep_hash)
 {
 	struct clear_midx_data data;
 	memset(&data, 0, sizeof(struct clear_midx_data));
 
-	if (keep_hash)
-		data.keep = xstrfmt("multi-pack-index-%s%s",
-				    hash_to_hex(keep_hash), ext);
+	if (keep_hash) {
+		ALLOC_ARRAY(data.keep, 1);
+
+		data.keep[0] = xstrfmt("multi-pack-index-%s.%s", keep_hash, ext);
+		data.keep_nr = 1;
+	}
 	data.ext = ext;
 
 	for_each_file_in_pack_dir(object_dir,
 				  clear_midx_file_ext,
 				  &data);
 
+	if (keep_hash)
+		free(data.keep[0]);
+	free(data.keep);
+}
+
+void clear_incremental_midx_files_ext(const char *object_dir, const char *ext,
+				      char **keep_hashes,
+				      uint32_t hashes_nr)
+{
+	struct clear_midx_data data;
+	uint32_t i;
+
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	ALLOC_ARRAY(data.keep, hashes_nr);
+	for (i = 0; i < hashes_nr; i++)
+		data.keep[i] = xstrfmt("multi-pack-index-%s.%s", keep_hashes[i],
+				       ext);
+	data.keep_nr = hashes_nr;
+	data.ext = ext;
+
+	for_each_file_in_pack_subdir(object_dir, "multi-pack-index.d",
+				     clear_midx_file_ext, &data);
+
+	for (i = 0; i < hashes_nr; i++)
+		free(data.keep[i]);
 	free(data.keep);
 }
 
@@ -773,8 +813,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx.buf))
 		die(_("failed to clear multi-pack-index at %s"), midx.buf);
 
-	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
-	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
+	clear_midx_files_ext(r->objects->odb->path, MIDX_EXT_BITMAP, NULL);
+	clear_midx_files_ext(r->objects->odb->path, MIDX_EXT_REV, NULL);
 
 	strbuf_release(&midx);
 }
diff --git a/midx.h b/midx.h
index 3714cad2cc..42d4f8d149 100644
--- a/midx.h
+++ b/midx.h
@@ -29,6 +29,8 @@ struct bitmapped_pack;
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
@@ -77,6 +79,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
+#define MIDX_WRITE_INCREMENTAL (1 << 5)
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
@@ -101,6 +104,7 @@ int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
 		     uint32_t *result);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 		 uint32_t *result);
+int midx_has_oid(struct multi_pack_index *m, const struct object_id *oid);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/packfile.c b/packfile.c
index 1eb18e3041..cf12a539ea 100644
--- a/packfile.c
+++ b/packfile.c
@@ -815,9 +815,10 @@ static void report_pack_garbage(struct string_list *list)
 	report_helper(list, seen_bits, first, list->nr);
 }
 
-void for_each_file_in_pack_dir(const char *objdir,
-			       each_file_in_pack_dir_fn fn,
-			       void *data)
+void for_each_file_in_pack_subdir(const char *objdir,
+				  const char *subdir,
+				  each_file_in_pack_dir_fn fn,
+				  void *data)
 {
 	struct strbuf path = STRBUF_INIT;
 	size_t dirnamelen;
@@ -826,6 +827,8 @@ void for_each_file_in_pack_dir(const char *objdir,
 
 	strbuf_addstr(&path, objdir);
 	strbuf_addstr(&path, "/pack");
+	if (subdir)
+		strbuf_addf(&path, "/%s", subdir);
 	dir = opendir(path.buf);
 	if (!dir) {
 		if (errno != ENOENT)
@@ -847,6 +850,13 @@ void for_each_file_in_pack_dir(const char *objdir,
 	strbuf_release(&path);
 }
 
+void for_each_file_in_pack_dir(const char *objdir,
+			       each_file_in_pack_dir_fn fn,
+			       void *data)
+{
+	for_each_file_in_pack_subdir(objdir, NULL, fn, data);
+}
+
 struct prepare_pack_data {
 	struct repository *r;
 	struct string_list *garbage;
diff --git a/packfile.h b/packfile.h
index eb18ec15db..0f78658229 100644
--- a/packfile.h
+++ b/packfile.h
@@ -55,6 +55,10 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
 
 typedef void each_file_in_pack_dir_fn(const char *full_path, size_t full_path_len,
 				      const char *file_name, void *data);
+void for_each_file_in_pack_subdir(const char *objdir,
+				  const char *subdir,
+				  each_file_in_pack_dir_fn fn,
+				  void *data);
 void for_each_file_in_pack_dir(const char *objdir,
 			       each_file_in_pack_dir_fn fn,
 			       void *data);
diff --git a/t/README b/t/README
index e8a11926e4..e93a29de1b 100644
--- a/t/README
+++ b/t/README
@@ -469,6 +469,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=<boolean>, when true, sets
+the '--incremental' option on all invocations of 'git multi-pack-index
+write'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index f595937094..62aa6744a6 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,6 +1,8 @@
 # Helpers for scripts testing bitmap functionality; see t5310 for
 # example usage.
 
+. "$TEST_DIRECTORY"/lib-midx.sh
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 
@@ -264,10 +266,6 @@ have_delta () {
 	test_cmp expect actual
 }
 
-midx_checksum () {
-	test-tool read-midx --checksum "$1"
-}
-
 # midx_pack_source <obj>
 midx_pack_source () {
 	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
diff --git a/t/lib-midx.sh b/t/lib-midx.sh
index 1261994744..e38c609604 100644
--- a/t/lib-midx.sh
+++ b/t/lib-midx.sh
@@ -6,3 +6,31 @@ test_midx_consistent () {
 	test_cmp expect actual &&
 	git multi-pack-index --object-dir=$1 verify
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "$1"
+}
+
+midx_git_two_modes () {
+	git -c core.multiPackIndex=false $1 >expect &&
+	git -c core.multiPackIndex=true $1 >actual &&
+	if [ "$2" = "sorted" ]
+	then
+		sort <expect >expect.sorted &&
+		mv expect.sorted expect &&
+		sort <actual >actual.sorted &&
+		mv actual.sorted actual
+	fi &&
+	test_cmp expect actual
+}
+
+compare_results_with_midx () {
+	MSG=$1
+	test_expect_success "check normal git operations: $MSG" '
+		midx_git_two_modes "rev-list --objects --all" &&
+		midx_git_two_modes "log --raw" &&
+		midx_git_two_modes "count-objects --verbose" &&
+		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
+		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
+	'
+}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 6e9ee23398..4b0b5a5c9f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,8 +3,11 @@
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-chunk.sh
+. "$TEST_DIRECTORY"/lib-midx.sh
 
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 objdir=.git/objects
 
 HASH_LEN=$(test_oid rawsz)
@@ -107,30 +110,6 @@ test_expect_success 'write midx with one v1 pack' '
 	midx_read_expect 1 18 4 $objdir
 '
 
-midx_git_two_modes () {
-	git -c core.multiPackIndex=false $1 >expect &&
-	git -c core.multiPackIndex=true $1 >actual &&
-	if [ "$2" = "sorted" ]
-	then
-		sort <expect >expect.sorted &&
-		mv expect.sorted expect &&
-		sort <actual >actual.sorted &&
-		mv actual.sorted actual
-	fi &&
-	test_cmp expect actual
-}
-
-compare_results_with_midx () {
-	MSG=$1
-	test_expect_success "check normal git operations: $MSG" '
-		midx_git_two_modes "rev-list --objects --all" &&
-		midx_git_two_modes "log --raw" &&
-		midx_git_two_modes "count-objects --verbose" &&
-		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
-		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
-	'
-}
-
 test_expect_success 'write midx with one v2 pack' '
 	git pack-objects --index-version=2,0x40 $objdir/pack/test <obj-list &&
 	git multi-pack-index --object-dir=$objdir write &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 1cb3e3ff08..832b92619c 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -7,6 +7,7 @@ test_description='exercise basic multi-pack bitmap functionality'
 # We'll be writing our own MIDX, so avoid getting confused by the
 # automatic ones.
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 
 # This test exercise multi-pack bitmap functionality where the object order is
 # stored and read from a special chunk within the MIDX, so use the default
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index 23db949c20..9cac03a94b 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -8,6 +8,7 @@ test_description='exercise basic multi-pack bitmap functionality (.rev files)'
 # We'll be writing our own MIDX, so avoid getting confused by the automatic
 # ones.
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 
 # Unlike t5326, this test exercise multi-pack bitmap functionality where the
 # object order is stored in a separate .rev file.
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index ed823f37bc..941e73d354 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -6,6 +6,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 objdir=.git/objects
 packdir=$objdir/pack
 
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
new file mode 100755
index 0000000000..c3b08acc73
--- /dev/null
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='incremental multi-pack-index'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-midx.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+export GIT_TEST_MULTI_PACK_INDEX
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+test_expect_success 'convert non-incremental MIDX to incremental' '
+	test_commit base &&
+	git repack -ad &&
+	git multi-pack-index write &&
+
+	test_path_is_file $packdir/multi-pack-index &&
+	old_hash="$(midx_checksum $objdir)" &&
+
+	test_commit other &&
+	git repack -d &&
+	git multi-pack-index write --incremental &&
+
+	test_path_is_missing $packdir/multi-pack-index &&
+	test_path_is_file $midx_chain &&
+	test_line_count = 2 $midx_chain &&
+	grep $old_hash $midx_chain
+'
+
+compare_results_with_midx 'incremental MIDX'
+
+test_expect_success 'convert incremental to non-incremental' '
+	test_commit squash &&
+	git repack -d &&
+	git multi-pack-index write &&
+
+	test_path_is_file $packdir/multi-pack-index &&
+	test_dir_is_empty $midxdir
+'
+
+compare_results_with_midx 'non-incremental MIDX conversion'
+
+test_done
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 8f34f05087..be1188e736 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -7,6 +7,9 @@ test_description='git repack works correctly'
 . "${TEST_DIRECTORY}/lib-midx.sh"
 . "${TEST_DIRECTORY}/lib-terminal.sh"
 
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
 commit_and_pack () {
 	test_commit "$@" 1>&2 &&
 	incrpackid=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
@@ -117,7 +120,7 @@ test_expect_success '--local disables writing bitmaps when connected to alternat
 	(
 		cd member &&
 		test_commit "object" &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
+		git repack -Adl --write-bitmap-index 2>err &&
 		cat >expect <<-EOF &&
 		warning: disabling bitmap writing, as some objects are not being packed
 		EOF
@@ -533,11 +536,11 @@ test_expect_success 'setup for --write-midx tests' '
 test_expect_success '--write-midx unchanged' '
 	(
 		cd midx &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack &&
+		git repack &&
 		test_path_is_missing $midx &&
 		test_path_is_missing $midx-*.bitmap &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+		git repack --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -550,7 +553,7 @@ test_expect_success '--write-midx with a new pack' '
 		cd midx &&
 		test_commit loose &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+		git repack --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -561,7 +564,7 @@ test_expect_success '--write-midx with a new pack' '
 test_expect_success '--write-midx with -b' '
 	(
 		cd midx &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -mb &&
+		git repack -mb &&
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-*.bitmap &&
@@ -574,7 +577,7 @@ test_expect_success '--write-midx with -d' '
 		cd midx &&
 		test_commit repack &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad --write-midx &&
+		git repack -Ad --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -587,21 +590,21 @@ test_expect_success 'cleans up MIDX when appropriate' '
 		cd midx &&
 
 		test_commit repack-2 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
+		git repack -Adb --write-midx &&
 
 		checksum=$(midx_checksum $objdir) &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$checksum.bitmap &&
 
 		test_commit repack-3 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
+		git repack -Adb --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-$checksum.bitmap &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
 		test_commit repack-4 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
+		git repack -Adb &&
 
 		find $objdir/pack -type f -name "multi-pack-index*" >files &&
 		test_must_be_empty files
@@ -622,7 +625,6 @@ test_expect_success '--write-midx with preferred bitmap tips' '
 		git log --format="create refs/tags/%s/%s %H" HEAD >refs &&
 		git update-ref --stdin <refs &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 \
 		git repack --write-midx --write-bitmap-index &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
@@ -714,13 +716,13 @@ test_expect_success '--write-midx removes stale pack-based bitmaps' '
 	(
 		cd repo &&
 		test_commit base &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&
+		git repack -Ab &&
 
 		pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
 		test_path_is_file "$pack_bitmap" &&
 
 		test_commit tip &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&
+		git repack -bm &&
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
@@ -743,7 +745,6 @@ test_expect_success '--write-midx with --pack-kept-objects' '
 		keep="$objdir/pack/pack-$one.keep" &&
 		touch "$keep" &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 \
 		git repack --write-midx --write-bitmap-index --geometric=2 -d \
 			--pack-kept-objects &&
 
-- 
2.46.0.rc0.94.g9b2aff57b3

* Re: [PATCH v2 01/19] Documentation: describe incremental MIDX format
  2024-07-17 21:11   ` [PATCH v2 01/19] Documentation: describe incremental MIDX format Taylor Blau
@ 2024-08-01  9:19     ` Jeff King
  2024-08-01 18:52       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01  9:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:11:58PM -0400, Taylor Blau wrote:

> +Each individual component of the chain need only contain a small number
> +of packfiles. Appending to the chain does not invalidate earlier parts
> +of the chain, so repositories can control how much time is spent
> +updating the MIDX chain by determining the number of packs in each layer
> +of the MIDX chain.
> +
> +=== Design state
> +
> +At present, the incremental multi-pack indexes feature is missing two
> +important components:
> +
> +  - The ability to rewrite earlier portions of the MIDX chain (i.e., to
> +    "compact" some collection of adjacent MIDX layers into a single
> +    MIDX). At present the only supported way of shrinking a MIDX chain
> +    is to rewrite the entire chain from scratch without the `--split`
> +    flag.
> ++
> +There are no fundamental limitations that stand in the way of being able
> +to implement this feature. It is omitted from the initial implementation
> +in order to reduce the complexity, but will be added later.
> +
> +  - Support for reachability bitmaps. The classic single MIDX
> +    implementation does support reachability bitmaps (see the section
> +    titled "multi-pack-index reverse indexes" in
> +    linkgit:gitformat-pack[5] for more details).
> ++
> +As above, there are no fundamental limitations that stand in the way of
> +extending the incremental MIDX format to support reachability bitmaps.
> +The design below specifically takes this into account, and support for
> +reachability bitmaps will be added in a future patch series. It is
> +omitted from this series for the same reason as above.

It is nice that you added a bit of a roadmap here about what is
implemented and what is not, and that the design takes into account
future directions (especially incremental bitmap generation).

It does feel a little funny to say "this series" in text that will go
into the repository (i.e., somebody reading the checked out file will
say "huh? which series?"). I'm not sure how to word it better, except to
maybe just say "in the future" and "it is omitted for now" (and
obviously it's a pretty minor point).

> +In brief, to support reachability bitmaps with the incremental MIDX
> +feature, the concept of the pseudo-pack order is extended across each
> +layer of the incremental MIDX chain to form a concatenated pseudo-pack
> +order. This concatenation takes place in the same order as the chain
> +itself (in other words, the concatenated pseudo-pack order for a chain
> +`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
> +the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
> +`$H3`).

OK, that makes sense. It's how I'd have intuitively assumed it to be,
and most importantly, it should allow appending to the chain without
regenerating (or even translating) earlier bitmaps.
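
To make the concatenation concrete, here is a minimal sketch (not code
from the series; `concat_position()` and its parameters are made up for
illustration). It assumes we know how many objects each layer holds, in
chain order, and computes a chain-wide position from a layer index and a
layer-local pseudo-pack position:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: position of an object in the concatenated
 * pseudo-pack order, given the number of objects in each layer
 * (ordered as in the chain file), the layer index, and the
 * layer-local pseudo-pack position.
 */
static uint32_t concat_position(const uint32_t *objects_per_layer,
				size_t nr_layers, size_t layer,
				uint32_t local_pos)
{
	uint32_t base = 0;
	size_t i;

	assert(layer < nr_layers);
	assert(local_pos < objects_per_layer[layer]);

	/* Objects in earlier layers come first, in chain order. */
	for (i = 0; i < layer; i++)
		base += objects_per_layer[i];

	return base + local_pos;
}
```

Appending a layer only adds positions past the current end, so positions
(and therefore bitmap bits) of earlier layers would stay put under this
scheme.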

> +=== File layout
> +
> +Instead of storing a single `multi-pack-index` file (with an optional
> +`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental
> +MIDXs are stored in the following layout:
> +
> +----
> +$GIT_DIR/objects/pack/multi-pack-index.d/
> +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
> +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
> +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
> +$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
> +----
> +
> +The `multi-pack-index-chain` file contains a list of the incremental
> +MIDX files in the chain, in order. The above example shows a chain whose
> +`multi-pack-index-chain` file would contain the following lines:
> +
> +----
> +$H1
> +$H2
> +$H3
> +----
> +
> +The `multi-pack-index-$H1.midx` file contains the first layer of the
> +multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
> +the second layer of the chain, and so on.

Makes sense. How does the chained multi-pack-index.d interact with a
singular multi-pack-index? Generally we should not have both at the same
time, but I'd imagine they both exist for a brief period when moving
from one to another.

I assume the rules are the same as for commit-graphs, which use the same
on-disk structure. I can't think of a reason to prefer one over the
other but this might be a good place to document what does/should
happen.
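
For reference, the layout quoted above implies a simple mapping from a
line of the chain file to a layer's path. A rough sketch, assuming a
hypothetical helper `midx_layer_path()` (this is not the name or
signature used by the series):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the path of one MIDX layer from the
 * object directory and a hash taken from a line of the
 * multi-pack-index-chain file. Returns 0 on success, -1 if the
 * result does not fit in "buf".
 */
static int midx_layer_path(char *buf, size_t size,
			   const char *object_dir, const char *hash)
{
	int n = snprintf(buf, size,
			 "%s/pack/multi-pack-index.d/multi-pack-index-%s.midx",
			 object_dir, hash);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The singular `multi-pack-index` file lives one directory up, so both
forms can exist on disk at once without name collisions.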

> +=== Object positions for incremental MIDXs
> +
> +In the original multi-pack-index design, we refer to objects via their
> +lexicographic position (by object IDs) within the repository's singular
> +multi-pack-index. In the incremental multi-pack-index design, we refer
> +to objects via their index into a concatenated lexicographic ordering
> +among each component in the MIDX chain.

How do duplicate objects work here? I guess there aren't any duplicates
in the midx itself, only in the constituent packfiles. So from the
perspective of this section, I guess it doesn't matter? And from the
perspective of bitmaps (where the duplicate issue came up before), it is
business as usual: the midx revindex gives the bit order, and we'd
presumably concatenate the individual revindexes in chain order.

(Mostly just thinking out loud; I'm not sure there's much for you to
answer there).


Looking good so far...

-Peff

* Re: [PATCH v2 02/19] midx: add new fields for incremental MIDX chains
  2024-07-17 21:12   ` [PATCH v2 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
@ 2024-08-01  9:21     ` Jeff King
  2024-08-01 18:54       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01  9:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:01PM -0400, Taylor Blau wrote:

> The incremental MIDX chain feature is designed around the idea of
> indexing into a concatenated lexicographic ordering of object IDs
> present in the MIDX.
> 
> When given an object position, the MIDX machinery needs to be able to
> locate both (a) which MIDX layer contains the given object, and (b) at
> what position *within that MIDX layer* that object appears.
> 
> To do this, three new fields are added to the `struct multi_pack_index`:
> 
>   - struct multi_pack_index *base_midx;
>   - uint32_t num_objects_in_base;
>   - uint32_t num_packs_in_base;
> 
> These three fields store the pieces of information suggested by their
> respective field names. In turn, the `num_objects_in_base` and
> `num_packs_in_base` fields are used to crawl backwards along the
> `base_midx` pointer to locate the appropriate position for a given
> object within the MIDX that contains it.

OK, so base_midx is a back-pointer. I think in theory you could compute
num_objects_in_base on the fly by doing that crawl yourself, but we'd
want to be able to do it in constant time, rather than O(# of midx)?

Makes sense.
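
A toy sketch of that trade-off, using a stand-in struct rather than the
real `struct multi_pack_index` (the `toy_midx` type and both helper
names are assumptions for illustration only):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the relevant fields of struct multi_pack_index. */
struct toy_midx {
	struct toy_midx *base_midx;	/* previous layer, or NULL */
	uint32_t num_objects;		/* objects in this layer only */
	uint32_t num_objects_in_base;	/* objects in all earlier layers */
};

/* O(number of layers): recompute the base count by crawling. */
static uint32_t objects_in_base_by_crawl(const struct toy_midx *m)
{
	uint32_t total = 0;

	for (m = m->base_midx; m; m = m->base_midx)
		total += m->num_objects;
	return total;
}

/* O(1) per layer: fill the cached count when a layer is appended. */
static void append_layer(struct toy_midx *layer, struct toy_midx *base)
{
	layer->base_midx = base;
	layer->num_objects_in_base = base ?
		base->num_objects + base->num_objects_in_base : 0;
}
```

Caching the count once per layer means each lookup compares against a
stored value instead of summing over the chain.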

-Peff

* Re: [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
@ 2024-08-01  9:30     ` Jeff King
  2024-08-01 18:57       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01  9:30 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:04PM -0400, Taylor Blau wrote:

> [^1]: As a reminder, this means that the object is identified among the
>   objects contained in all layers of the incremental MIDX chain, not any
>   particular layer. For example, consider a MIDX chain with two individual
>   MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
>   4 objects appears earlier in the chain, then asking for pack "6" would
>   return the second object in the MIDX with 3 objects.

I think this is "object 6" in the final sentence?

Otherwise, the explanation lays things out pretty well. Let's look at
the code.

> +static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
> +{
> +	struct multi_pack_index *m = *_m;
> +	while (m && pos < m->num_objects_in_base)
> +		m = m->base_midx;

OK, so given a global position, we walk backwards until we find the
correct midx...

> +	if (!m)
> +		BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
> +
> +	if (pos >= m->num_objects + m->num_objects_in_base)
> +		die(_("invalid MIDX object position, MIDX is likely corrupt"));

...and we double check that the given base claims to have that position.
Seems obvious.

> +	*_m = m;
> +
> +	return pos - m->num_objects_in_base;

And then we adjust it into a per-midx position.

> @@ -334,8 +351,10 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
>  
>  uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
>  {
> -	return get_be32(m->chunk_object_offsets +
> -			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
> +	pos = midx_for_object(&m, pos);
> +
> +	return m->num_packs_in_base + get_be32(m->chunk_object_offsets +
> +					       (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
>  }

OK, so now this function translates a global position into a local one,
and then we get the pack id for the local midx/pos, and then turn it
back into a global pack id.

That all makes sense, but you definitely have to read carefully to make
sure which positions/ids are global within the chain and which are local
to a midx.
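
To keep the three steps straight, here is a self-contained toy version
of the global -> local -> global round trip. It mirrors the shape of the
quoted code but uses a made-up `toy_midx` struct with a plain array in
place of the on-disk chunk, and returns UINT32_MAX instead of calling
BUG()/die():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_midx {
	struct toy_midx *base_midx;
	uint32_t num_objects;
	uint32_t num_objects_in_base;
	uint32_t num_packs_in_base;
	const uint32_t *local_pack_ids;	/* one MIDX-local pack id per object */
};

/*
 * Same shape as midx_for_object() in the patch, minus error handling:
 * returns the layer-local position, or UINT32_MAX if out of range.
 */
static uint32_t toy_midx_for_object(struct toy_midx **_m, uint32_t pos)
{
	struct toy_midx *m = *_m;

	while (m && pos < m->num_objects_in_base)
		m = m->base_midx;
	if (!m || pos >= m->num_objects + m->num_objects_in_base)
		return UINT32_MAX;

	*_m = m;
	return pos - m->num_objects_in_base;
}

/* Global object position in, global pack id out. */
static uint32_t toy_nth_pack_int_id(struct toy_midx *m, uint32_t pos)
{
	pos = toy_midx_for_object(&m, pos);	/* now local to "m" */
	assert(pos != UINT32_MAX);
	return m->num_packs_in_base + m->local_pack_ids[pos];
}
```

With zero-based positions and a chain of 4 + 3 objects, global position
5 lands on local position 1 (the second object) of the later layer.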

I wonder if the type system can help us annotate them, but I suspect it
becomes awkward. Just typedef-ing them to uint32_t means the compiler
won't warn us when we use one in the wrong spot. Sticking them in
structs would solve that, but then using them is painful. Let's keep
reading and see if it's even an issue in practice.
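
For the sake of discussion, the struct-wrapping idea might look roughly
like this. The type and function names are hypothetical, not anything
proposed in the series:

```c
#include <assert.h>
#include <stdint.h>

/* Distinct struct types, so passing one for the other fails to compile. */
struct midx_global_pos { uint32_t v; };
struct midx_local_pos { uint32_t v; };

/* Convert a chain-wide position to a layer-local one. */
static struct midx_local_pos to_local(struct midx_global_pos g,
				      uint32_t num_objects_in_base)
{
	struct midx_local_pos l;

	assert(g.v >= num_objects_in_base);
	l.v = g.v - num_objects_in_base;
	return l;
}

/* Convert a layer-local position back to a chain-wide one. */
static struct midx_global_pos to_global(struct midx_local_pos l,
					uint32_t num_objects_in_base)
{
	struct midx_global_pos g;

	g.v = l.v + num_objects_in_base;
	return g;
}
```

The compiler would then reject a global position passed where a local
one is expected, at the cost of `.v` and explicit conversions at every
call site, which is the awkwardness mentioned above.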

-Peff

* Re: [PATCH v2 04/19] midx: teach `prepare_midx_pack()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
@ 2024-08-01  9:35     ` Jeff King
  2024-08-01 19:00       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01  9:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:07PM -0400, Taylor Blau wrote:

> The function `prepare_midx_pack()` is part of the midx.h API and
> loads the pack identified by the MIDX-local 'pack_int_id'. This patch
> prepares that function to be aware of an incremental MIDX world.
> 
> To do this, introduce the second of the two general purpose helpers
> mentioned in the previous commit. This commit introduces
> `midx_for_pack()`, which is the pack-specific analog of
> `midx_for_object()`, and works in the same fashion.
> 
> Like `midx_for_object()`, this function chases down the '->base_midx'
> field until it finds the MIDX layer within the chain that contains the
> given pack.
> 
> Use this function within `prepare_midx_pack()` so that the `pack_int_id`
> it expects is now relative to the entire MIDX chain, and that it
> prepares the given pack in the appropriate MIDX.

OK, I'm adequately prepared for more global/local confusion. :)

> -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
> +static uint32_t midx_for_pack(struct multi_pack_index **_m,
> +			      uint32_t pack_int_id)
>  {
> -	struct strbuf pack_name = STRBUF_INIT;
> -	struct packed_git *p;
> +	struct multi_pack_index *m = *_m;
> +	while (m && pack_int_id < m->num_packs_in_base)
> +		m = m->base_midx;

OK, so we chase down the pack id as before...

> +	if (!m)
> +		BUG("NULL multi-pack-index for pack ID: %"PRIu32, pack_int_id);
> +
> +	if (pack_int_id >= m->num_packs + m->num_packs_in_base)
>  		die(_("bad pack-int-id: %u (%u total packs)"),
> -		    pack_int_id, m->num_packs);
> +		    pack_int_id, m->num_packs + m->num_packs_in_base);

...with the same sanity checks...

> +	*_m = m;
> +
> +	return pack_int_id - m->num_packs_in_base;

...and the same global to local offset conversion. Looks good so far.

> +int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> +		      uint32_t pack_int_id)
> +{
> +	struct strbuf pack_name = STRBUF_INIT;
> +	struct packed_git *p;
> +	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);

This one uses a separate variable with the word "local" in it. Helpful. :)

> +	if (m->packs[local_pack_int_id])
>  		return 0;
>  
>  	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
> -		    m->pack_names[pack_int_id]);
> +		    m->pack_names[local_pack_int_id]);

OK, and then this is just existing lazy-load of the pack struct. Good.

I guess if you just reused pack_int_id for the local id, the diff would
be much smaller (this part would remain exactly the same). I dunno which
is better, but it was a little curious that the two patches differed in
approach. Probably not worth caring too much about, though.

-Peff

* Re: [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
@ 2024-08-01  9:38     ` Jeff King
  2024-08-01 19:03       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01  9:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:10PM -0400, Taylor Blau wrote:

> The function `nth_midxed_object_oid()` returns the object ID for a given
> object position in the MIDX lexicographic order.
> 
> Teach this function to instead operate over the concatenated
> lexicographic order defined in an earlier step so that it is able to be
> used with incremental MIDXs.
> 
> To do this, we need to both (a) adjust the bounds check for the given
> 'n', and (b) record the MIDX-local position after chasing the
> `->base_midx` pointer to find the MIDX which contains that object.

Yep, this makes sense. The hard thing about reviewing this, I think, is
that each individual step like this is going to make sense, but I'll
have very little clue what spots (if any) were missed.

To some degree I think the proof will be in the pudding. If you missed
any helpers, then the end result is going to crash and burn quite badly
when used with a chained midx, and we'd see it in the test suite. And
the nice thing is that most of this is abstracted inside these helpers,
so we know the set of tricky places is generally limited to the helpers,
and not arbitrary bits of midx code.

-Peff

* Re: [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
@ 2024-08-01  9:39     ` Jeff King
  2024-08-01 19:07       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01  9:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:13PM -0400, Taylor Blau wrote:

> In a similar fashion as in previous commits, teach the function
> `nth_bitmapped_pack()` about incremental MIDXs by translating the given
> `pack_int_id` from the concatenated lexical order to a MIDX-local
> lexical position.
> 
> When accessing the containing MIDX's array of packs, use the local pack
> ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
> when accessing the data within that chunk.

OK, makes sense.

>  int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
>  		       struct bitmapped_pack *bp, uint32_t pack_int_id)
>  {
> +	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
> +

Heh, after the last one reused the "n" variable, now we are back to a
separate local variable. Not wrong, but curious to go back and forth.

-Peff

* Re: [PATCH v2 07/19] midx: introduce `bsearch_one_midx()`
  2024-07-17 21:12   ` [PATCH v2 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
@ 2024-08-01 10:06     ` Jeff King
  2024-08-01 19:54       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:16PM -0400, Taylor Blau wrote:

> The `bsearch_midx()` function will be extended in a following commit to
> search for the location of a given object ID across all MIDXs in a chain
> (or the single non-chain MIDX if no chain is available).
> 
> While most callers will naturally want to use the updated
> `bsearch_midx()` function, there are a handful of special cases that
> will want finer control and will only want to search through a single
> MIDX.
> 
> For instance, the object abbreviation code cares about object IDs
> near to where we'd expect to find a match in a MIDX. In that case, we
> want to look at the nearby matches in each layer of the MIDX chain, not
> just a single one.

Hmm. That seems like a weird thing for the object abbreviation code to
want, just because the layers of the midx are essentially random with
respect to object names. So you have to search each layer individually
when looking for a contiguous segment of hash names. But maybe that's
why you want the one_midx() variant. Let's see...

>  static void unique_in_midx(struct multi_pack_index *m,
>  			   struct disambiguate_state *ds)
>  {
> -	uint32_t num, i, first = 0;
> -	const struct object_id *current = NULL;
> -	int len = ds->len > ds->repo->hash_algo->hexsz ?
> -		ds->repo->hash_algo->hexsz : ds->len;
> -	num = m->num_objects;
> +	for (; m; m = m->base_midx) {
> +		uint32_t num, i, first = 0;
> +		const struct object_id *current = NULL;
> +		int len = ds->len > ds->repo->hash_algo->hexsz ?
> +			ds->repo->hash_algo->hexsz : ds->len;
>  
> -	if (!num)
> -		return;
> +		num = m->num_objects + m->num_objects_in_base;
>  
> -	bsearch_midx(&ds->bin_pfx, m, &first);
> +		if (!num)
> +			continue;
>  
> -	/*
> -	 * At this point, "first" is the location of the lowest object
> -	 * with an object name that could match "bin_pfx".  See if we have
> -	 * 0, 1 or more objects that actually match(es).
> -	 */
> -	for (i = first; i < num && !ds->ambiguous; i++) {
> -		struct object_id oid;
> -		current = nth_midxed_object_oid(&oid, m, i);
> -		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
> -			break;
> -		update_candidates(ds, current);
> +		bsearch_one_midx(&ds->bin_pfx, m, &first);
> +
> +		/*
> +		 * At this point, "first" is the location of the lowest
> +		 * object with an object name that could match
> +		 * "bin_pfx".  See if we have 0, 1 or more objects that
> +		 * actually match(es).
> +		 */
> +		for (i = first; i < num && !ds->ambiguous; i++) {
> +			struct object_id oid;
> +			current = nth_midxed_object_oid(&oid, m, i);
> +			if (!match_hash(len, ds->bin_pfx.hash, current->hash))
> +				break;
> +			update_candidates(ds, current);
> +		}
>  	}
>  }

This is much easier to read with "-w", of course. So yeah, the gist of
it is that we're going to loop over items in the chain via the base_midx
pointer, and then search each individually. So that makes sense.

One thing that confused me, though, is setting "num". From the "-w"
diff:

  -       num = m->num_objects;
  +
  +               num = m->num_objects + m->num_objects_in_base;
  
                  if (!num)
  -               return;
  +                       continue;

Before we only had one midx, so that was our limit. But now we are
looking at "num" as a limit in the global size of the chained midx.
Which feels weird, since we're just considering a single layer here. We
seem to use "num" in two ways:

  - we return if it's 0 (or now continue to the next layer). But
    wouldn't we want to do that per-layer? I don't think it will produce
    wrong answers, but we're less likely to kick in this early return
    (though it's not clear to me when it would ever kick in really; a
    zero-length midx?).

  - later we loop "i" from "first", using "num" as a boundary. But this
    "i" is a global position, since that's what bsearch_one_midx()
    returns, and what nth_midxed_object_oid() expects.

    So I think it's correct, though it feels like bsearch_one_midx()
    should still return the position within that midx (and then
    bsearch_midx() could add back m->num_objects_in_base to get a global
    position). And then I guess likewise there would need to be a
    midx-local version of nth_midxed_object_oid().

    I'm not sure if that would make things simpler, or just add to the
    confusion, though. It's easy to get the global/local functions mixed
    up, since of course the global ones also have to take a "struct
    multi_pack_index". We could make them take a
    "multi_pack_index_chain" instead, but now all of the other code
    which wants to treat chains and single midx files the same has to
    care about the distinction (probably by wrapping the single midx in
    a one-entry chain struct).


> @@ -708,37 +712,40 @@ static int repo_extend_abbrev_len(struct repository *r UNUSED,
>  static void find_abbrev_len_for_midx(struct multi_pack_index *m,
>  				     struct min_abbrev_data *mad)

And likewise here (again, much more readable with "-w"). Interestingly,
in this one...

>  {
> -	int match = 0;
> -	uint32_t num, first = 0;
> -	struct object_id oid;
> -	const struct object_id *mad_oid;
> +	for (; m; m = m->base_midx) {
> +		int match = 0;
> +		uint32_t num, first = 0;
> +		struct object_id oid;
> +		const struct object_id *mad_oid;
>  
> -	if (!m->num_objects)
> -		return;
> +		if (!m->num_objects)
> +			continue;

We do the early return/continue directly on the layer's m->num_objects,
which makes sense.

> -	num = m->num_objects;
> -	mad_oid = mad->oid;
> -	match = bsearch_midx(mad_oid, m, &first);
> +		num = m->num_objects + m->num_objects_in_base;
> +		mad_oid = mad->oid;
> +		match = bsearch_one_midx(mad_oid, m, &first);

But then of course we go back to the same global "num", as we must.

So I think it's all correct, minus the early continue on "!num" in the
first function.
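
To illustrate why the loop bound has to be global while the emptiness
check wants to be per-layer, here is a toy sketch. It uses sorted 32-bit
integers as stand-ins for object IDs and a mask for the abbreviation
prefix; `toy_midx`, `toy_lower_bound()`, and `count_prefix_matches()`
are all invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_midx {
	struct toy_midx *base_midx;
	const uint32_t *oids;		/* sorted; stand-ins for object IDs */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/* Chain-wide position of the first entry in "m" not less than "oid". */
static uint32_t toy_lower_bound(const struct toy_midx *m, uint32_t oid)
{
	uint32_t lo = 0, hi = m->num_objects;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (m->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo + m->num_objects_in_base;
}

/* Count objects whose masked ID equals "prefix", across all layers. */
static unsigned count_prefix_matches(const struct toy_midx *m,
				     uint32_t prefix, uint32_t mask)
{
	unsigned count = 0;

	for (; m; m = m->base_midx) {
		/* Global bound, because "i" below is a global position. */
		uint32_t end = m->num_objects + m->num_objects_in_base;
		uint32_t i;

		if (!m->num_objects)	/* per-layer emptiness check */
			continue;

		for (i = toy_lower_bound(m, prefix); i < end; i++) {
			uint32_t oid = m->oids[i - m->num_objects_in_base];

			if ((oid & mask) != prefix)
				break;
			count++;
		}
	}
	return count;
}
```

Each layer is sorted independently, so matches for one prefix are
contiguous within a layer but scattered across the chain.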

-Peff

* Re: [PATCH v2 08/19] midx: teach `bsearch_midx()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
@ 2024-08-01 10:07     ` Jeff King
  0 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:19PM -0400, Taylor Blau wrote:

> Now that the special-case callers of `bsearch_midx()` have been dealt
> with, teach `bsearch_midx()` to handle incremental MIDX chains.
> 
> The incremental MIDX-aware version of `bsearch_midx()` works by
> repeatedly searching for a given OID in each layer along the
> `->base_midx` pointer, stopping either when an exact match is found, or
> the end of the chain is reached.

OK. I think this could have just happened in the last patch, but no big
deal either way.
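
For readers following along, the behavior described in the commit
message can be sketched like so. This is a toy model with sorted 32-bit
integers standing in for object IDs; the `toy_` names and struct are
assumptions, and the real functions operate on `struct object_id` and
the on-disk chunks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_midx {
	struct toy_midx *base_midx;
	const uint32_t *oids;		/* sorted; stand-ins for object IDs */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * Search one layer. On return, *result holds the chain-wide position
 * of the match, or of the first entry in this layer not less than "oid".
 */
static int toy_bsearch_one_midx(uint32_t oid, const struct toy_midx *m,
				uint32_t *result)
{
	uint32_t lo = 0, hi = m->num_objects;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (m->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}
	*result = lo + m->num_objects_in_base;
	return lo < m->num_objects && m->oids[lo] == oid;
}

/* Walk the chain from the tip, stopping at the first exact match. */
static int toy_bsearch_midx(uint32_t oid, const struct toy_midx *m,
			    uint32_t *result)
{
	for (; m; m = m->base_midx)
		if (toy_bsearch_one_midx(oid, m, result))
			return 1;
	return 0;
}
```

When nothing matches, `*result` is left at whatever the last layer
searched produced, so callers in this sketch should only trust it on a
successful return.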

-Peff

* Re: [PATCH v2 09/19] midx: teach `nth_midxed_offset()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
@ 2024-08-01 10:08     ` Jeff King
  0 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:22PM -0400, Taylor Blau wrote:

> In a similar fashion as in previous commits, teach the function
> `nth_midxed_offset()` about incremental MIDXs.
> 
> The given object `pos` is used to find the containing MIDX, and
> translated back into a MIDX-local position by assigning the return value
> of `midx_for_object()` to it.

Makes sense.

> --- a/midx.c
> +++ b/midx.c
> @@ -368,6 +368,8 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
>  	const unsigned char *offset_data;
>  	uint32_t offset32;
>  
> +	pos = midx_for_object(&m, pos);
> +

"pos" reused again! :)

It certainly makes the diffs nicer to read.

-Peff

* Re: [PATCH v2 10/19] midx: teach `fill_midx_entry()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
@ 2024-08-01 10:12     ` Jeff King
  2024-08-01 20:01       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:25PM -0400, Taylor Blau wrote:

> After that point, we only need to make two special considerations within
> this function:
> 
>   - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
>     is a position in the concatenated lexical order of packs, so we must
>     ensure that we subtract `m->num_packs_in_base` before accessing the
>     MIDX-local `packs` array.
> 
>   - Second, we must avoid translating the `pos` back to a MIDX-local
>     index, since we use it as an argument to `nth_midxed_offset()` which
>     expects a position relative to the concatenated lexical order of
>     objects.

OK. I think this is correct, but this would be another place where we
could use an nth_midxed_offset_one() function if we had one.

My thinking was that we'd avoid walking back over the midx chain again.
But I guess we don't actually do that, because our midx_for_object()
will have overwritten our "m" variable, as well. So inside
nth_midxed_offset_one() we'll immediately realize that the global
position is inside the midx we passed in. A little extra arithmetic, but
there's no pointer chasing.

-Peff

* Re: [PATCH v2 11/19] midx: remove unused `midx_locate_pack()`
  2024-07-17 21:12   ` [PATCH v2 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
@ 2024-08-01 10:14     ` Jeff King
  2024-08-01 20:01       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:29PM -0400, Taylor Blau wrote:

> Commit 307d75bbe6 (midx: implement `midx_locate_pack()`, 2023-12-14)
> introduced `midx_locate_pack()`, which was described at the time as a
> complement to the function `midx_contains_pack()` which allowed
> callers to determine where in the MIDX lexical order a pack appeared, as
> opposed to whether or not it was simply contained.
> 
> 307d75bbe6 suggests that future patches would be added which would
> introduce callers for this new function, but none ever were, meaning the
> function has gone unused since its introduction.
> 
> Clean this up by in effect reverting 307d75bbe6, which removes the
> unused function and inlines its definition back into
> `midx_contains_pack()`.
> 
> (Looking back through the list archives when 307d75bbe6 was written,
> this was in preparation for this[1] patch from back when we had the
> concept of "disjoint" packs while developing multi-pack verbatim reuse.
> That concept was abandoned before the series was merged, but I never
> dropped what would become 307d75bbe6 from the series, leading to the
> state prior to this commit).
> 
> [1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/

Nice description of the history. I wish all patches which said "eh, this
is unused, let's remove it" went to the same trouble to make sure we
aren't missing something subtle.

-Peff

* Re: [PATCH v2 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
@ 2024-08-01 10:17     ` Jeff King
  0 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:32PM -0400, Taylor Blau wrote:

> Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle
> has been cleaned up, teach the former about how to operate in an
> incremental MIDX-aware world in a similar fashion as in previous
> commits.
> 
> Instead of using either of the two `midx_for_object()` or
> `midx_for_pack()` helpers, this function is split into two: one that
> determines whether a pack is contained in a single MIDX, and another
> which calls the former in a loop over all MIDXs.
> 
> This approach does not require that we change any of the implementation
> in what is now `midx_contains_pack_1()` as it still operates over a
> single MIDX.

Makes sense. There is no ordering or relationship for which packs might
be in which midx, so we have to just walk them linearly and check each
part of the chain.
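
A minimal sketch of that split, assuming a hypothetical `struct
pack_layer` that holds a sorted array of pack names (the names and
layout here are mine, not the patch's): one function answers the
question for a single layer, and a wrapper walks every layer in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one MIDX layer's pack list. */
struct pack_layer {
	const char *const *pack_names;  /* sorted with strcmp() */
	size_t num_packs;
	const struct pack_layer *base;  /* previous layer, or NULL */
};

static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Single-layer check: binary search over this layer's names only. */
int layer_contains_pack(const struct pack_layer *l, const char *name)
{
	assert(name != NULL);
	return bsearch(&name, l->pack_names, l->num_packs,
		       sizeof(*l->pack_names), cmp_names) != NULL;
}

/*
 * Whole-chain check: nothing relates a pack name to a particular
 * layer, so every layer is tried until one matches.
 */
int chain_contains_pack(const struct pack_layer *tip, const char *name)
{
	const struct pack_layer *l;

	for (l = tip; l; l = l->base)
		if (layer_contains_pack(l, name))
			return 1;
	return 0;
}
```

The per-layer function is unchanged by the chaining, which matches the
claim in the commit message that the single-MIDX implementation did not
need to change.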

-Peff

* Re: [PATCH v2 13/19] midx: teach `midx_preferred_pack()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
@ 2024-08-01 10:25     ` Jeff King
  2024-08-01 20:05       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:35PM -0400, Taylor Blau wrote:

> The function `midx_preferred_pack()` is used to determine the identity
> of the preferred pack, which is the identity of a unique pack within
> the MIDX which is used as a tie-breaker when selecting from which pack
> to represent an object that appears in multiple packs within the MIDX.
> 
> Historically we have said that the MIDX's preferred pack has the unique
> property that all objects from that pack are represented in the MIDX.
> But that isn't quite true: a more precise statement would be that all
> objects from that pack *which appear in the MIDX* are selected from that
> pack.
> 
> This helps us extend the concept of preferred packs across a MIDX chain,
> where some object(s) in the preferred pack may appear in other packs
> in an earlier MIDX layer, in which case those object(s) will not appear
> in a subsequent MIDX layer from either the preferred pack or any other
> pack.

OK, that matches my intuition for how the preferred concept should
exist. I'm not quite clear on how that will affect the code, though.

> diff --git a/midx.c b/midx.c
> index 0fa8febb9d..d2dbea41e4 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
>  int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
>  {
>  	if (m->preferred_pack_idx == -1) {
> +		uint32_t midx_pos;
>  		if (load_midx_revindex(m) < 0) {
>  			m->preferred_pack_idx = -2;
>  			return -1;
>  		}
>  
> -		m->preferred_pack_idx =
> -			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
> +		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
> +
> +		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
> +

OK, so rather than looking for the pack of object 0, we're looking for
the first one in _this_ layer, since the position is global within the
midx. That makes some sense, but is pack_pos_to_midx() ready for that?
It looks like it just looks at m->revindex_data. Are we going to be
generating a revindex for the whole chain? I'd think that each layer
would have its own revindex (and any trickiness would happen at the
generation stage, making sure we don't insert objects that are already
mentioned in earlier layers).

-Peff

* Re: [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
@ 2024-08-01 10:29     ` Jeff King
  2024-08-01 20:09       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:38PM -0400, Taylor Blau wrote:

> The function `midx_fanout_add_midx_fanout()` is used to help construct
> the fanout table when generating a MIDX by reusing data from an existing
> MIDX.

I'm not sure I understand the original function enough to know if we're
doing the right thing. But I notice that after your series, we can only
get into midx_fanout_add_midx_fanout() if !ctx->incremental. So is this
code even used for an incremental midx?

Or is it used if we are writing a non-incremental midx, but trying to
reuse data from a chained one?

-Peff

* Re: [PATCH v2 15/19] midx: support reading incremental MIDX chains
  2024-07-17 21:12   ` [PATCH v2 15/19] midx: support reading incremental MIDX chains Taylor Blau
@ 2024-08-01 10:40     ` Jeff King
  2024-08-01 20:35       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:41PM -0400, Taylor Blau wrote:

> Now that the MIDX machinery's internals have been taught to understand
> incremental MIDXs over the previous handful of commits, the MIDX
> machinery itself can begin reading incremental MIDXs.
> 
> (Note that while the on-disk format for incremental MIDXs has been
> defined, the writing end has not been implemented. This will take place
> in the commit after next.)
> 
> The core of this change involves following the order specified in the
> MIDX chain and opening up MIDXs in the chain one-by-one, adding them to
> the previous layer's `->base_midx` pointer at each step.

This makes it sound like reading a chain file of:

  multi-pack-index-$H1.midx
  multi-pack-index-$H2.midx
  multi-pack-index-$H3.midx

will have H1's base_midx pointing to H2. But the design document from
the first patch made me think it went the other way (H1 is the oldest
midx, then H2, then H3). For many things the ordering doesn't matter,
but I'd think the pseudo-pack order would go from the root of the
base_midx walk to the tip. That is, the base_midx pointers go in reverse
chronological order.

Looking at the code, I think it's doing what I expect. Not sure if I'm
mis-reading what you wrote above, or if it's wrong.
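
To make the ordering I have in mind concrete, here is a small sketch. It
assumes the chain file is read top to bottom (oldest layer first) and
uses a hypothetical `struct chain_layer`; it is an illustration of the
pointer direction, not the patch's loading code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an opened MIDX layer. */
struct chain_layer {
	const char *name;          /* e.g. "multi-pack-index-$H1.midx" */
	struct chain_layer *base;  /* the layer opened just before this one */
};

/*
 * Link layers in the order they appear in the chain file. Each newly
 * opened layer points back at the previous one, so the returned tip is
 * the last line of the file and the base_midx-style walk runs in
 * reverse chronological order.
 */
struct chain_layer *link_chain(struct chain_layer *layers, size_t n)
{
	struct chain_layer *tip = NULL;
	size_t i;

	assert(layers != NULL || n == 0);
	for (i = 0; i < n; i++) {
		layers[i].base = tip;
		tip = &layers[i];
	}
	return tip;
}
```

With three lines H1, H2, H3 in the file, the tip is H3, its base is H2,
and H1 has no base.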

> [...]

The code itself all looked reasonable. There are a scary number of spots
where we have to do global/local position conversion. It's hard to know
if you got them all.

-Peff

* Re: [PATCH v2 16/19] midx: implement verification support for incremental MIDXs
  2024-07-17 21:12   ` [PATCH v2 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
@ 2024-08-01 10:41     ` Jeff King
  0 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:44PM -0400, Taylor Blau wrote:

> Teach the verification implementation used by `git multi-pack-index
> verify` to perform verification for incremental MIDX chains by
> independently validating each layer within the chain.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c | 47 ++++++++++++++++++++++++++++++-----------------
>  midx.h |  2 ++
>  2 files changed, 32 insertions(+), 17 deletions(-)

Another one that benefits from "-w". Again, all looked good but I have
to just assume that you touched all of the spots that needed it.

-Peff

* Re: [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2024-07-17 21:12   ` [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2024-08-01 10:46     ` Jeff King
  2024-08-01 20:36       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 10:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:47PM -0400, Taylor Blau wrote:

> Two years ago, commit ff1e653c8e2 (midx: respect
> 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new
> environment variable which caused the test suite to write MIDX bitmaps
> after any 'git repack' invocation.
> 
> At the time, this was done to help flush out any bugs with MIDX bitmaps
> that weren't explicitly covered in the t5326-multi-pack-bitmap.sh
> script.
> 
> Two years later, that flag has served us well and is no longer providing
> meaningful coverage, as the script in t5326 has matured substantially
> and covers many more interesting cases than it did back when ff1e653c8e2
> was originally written.

I do think it could be providing some value still, just because other
scripts may create unusual setups that will exercise the code in
different ways. That said...

> Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
> as it is no longer serving a useful purpose. More importantly, removing
> this variable clears the way for us to introduce a new one to help
> similarly flush out bugs related to incremental MIDX chains.
> 
> Because these incremental MIDX chains are (for now) incompatible with
> MIDX bitmaps, we cannot have both.

...if it is one or the other, I think it is better to test the new code.

And I do think that midx bitmap code is less likely to be exercised in
interesting ways by random parts of the test suite (versus something
like GIT_TEST_DEFAULT_HASH, whose effects are pervasive). So I think
this is a good tradeoff.

>  builtin/repack.c                  | 12 ++----------
>  ci/run-build-and-tests.sh         |  1 -
>  midx.h                            |  2 --
>  t/README                          |  4 ----
>  t/t0410-partial-clone.sh          |  2 --
>  t/t5310-pack-bitmaps.sh           |  4 ----
>  t/t5319-multi-pack-index.sh       |  3 +--
>  t/t5326-multi-pack-bitmaps.sh     |  3 +--
>  t/t5327-multi-pack-bitmaps-rev.sh |  5 ++---
>  t/t7700-repack.sh                 | 21 +++++++--------------
>  10 files changed, 13 insertions(+), 44 deletions(-)

Patch looks good.

-Peff

* Re: [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains
  2024-07-17 21:12   ` [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
@ 2024-08-01 11:07     ` Jeff King
  2024-08-01 20:39       ` Taylor Blau
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 11:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:12:53PM -0400, Taylor Blau wrote:

> The implementation for doing so is relatively straightforward, and boils
> down to a handful of different kinds of changes implemented in this
> patch:
> 
>   - The `compute_sorted_entries()` function is taught to reject objects
>     which appear in any existing MIDX layer.

OK, I think this is one part I was looking for earlier but didn't see.
The implementation looks pretty easy (we can always ask about just the
earlier layers by feeding the base_midx pointer to midx_has_oid(), etc).

>   - Functions like `write_midx_revindex()` are adjusted to write
>     pack_order values which are offset by the number of objects in the
>     base MIDX layer.
> 
>   - The end of `write_midx_internal()` is adjusted to move
>     non-incremental MIDX files when necessary (i.e. when creating an
>     incremental chain with an existing non-incremental MIDX in the
>     repository).
> 
> There are a handful of other changes that are introduced, like new
> functions to clear incremental MIDX files that are unrelated to the
> current chain (using the same "keep_hash" mechanism as in the
> non-incremental case).

That all makes sense. I wondered a bit about selection of packs, size of
incremental, etc. We'd probably want a geometric-ish progression, just
like with packs, to balance cost of generation versus cost of lookups.
But I guess we get that for free to some degree with "repack
--geometric", assuming that our incremental midx is just covering the
new packfiles.

> The tests explicitly exercising the new incremental MIDX feature are
> relatively limited for two reasons:
> 
>   1. Most of the "interesting" behavior is already thoroughly covered in
>      t5319-multi-pack-index.sh, which handles the core logic of reading
>      objects through a MIDX.
> 
>      The new tests in t5334-incremental-multi-pack-index.sh are mostly
>      focused on creating and destroying incremental MIDXs, as well as
>      stitching their results together across layers.

Do you mean here that t5319 will get coverage when
GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL is set? In the long run, I
wonder if we should pull t5319's tests into lib-midx.sh and run them in
incremental and non-incremental modes.

>   2. A new GIT_TEST environment variable is added called
>      "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
>      entire test suite to write incremental MIDXs after repacking when
>      combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.
> 
>      This exercises the long tail of other interesting behavior that is
>      defined implicitly throughout the rest of the CI suite. It is
>      likewise added to the linux-TEST-vars job.

Makes sense.

> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 8f34f05087..be1188e736 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -7,6 +7,9 @@ test_description='git repack works correctly'
>  . "${TEST_DIRECTORY}/lib-midx.sh"
>  . "${TEST_DIRECTORY}/lib-terminal.sh"
>  
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> [...]
> -		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
> +		git repack -Adl --write-bitmap-index 2>err &&

This pulls the GIT_TEST_MULTI_PACK_INDEX=0 out of the individual tests
and sets it for the whole file. Are we losing some coverage for the
other tests? I doubt it's that big a deal either way (and I can
certainly see the argument that t7700, which is concerned with the
details of repacking, should be in control of the details of midx
generation). But I wonder if just setting WRITE_INCREMENTAL=0 would be
enough?

The rest of the changes all looked pretty reasonable, though again, it's
easy to review code you wrote and say "that looks good" but not realize
any gotchas that both of us missed.

-Peff

* Re: [PATCH v2 00/19] midx: incremental multi-pack indexes, part one
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
                     ` (18 preceding siblings ...)
  2024-07-17 21:12   ` [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
@ 2024-08-01 11:14   ` Jeff King
  2024-08-01 20:41     ` Taylor Blau
  19 siblings, 1 reply; 102+ messages in thread
From: Jeff King @ 2024-08-01 11:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Jul 17, 2024 at 05:11:54PM -0400, Taylor Blau wrote:

> This series implements incremental MIDXs, which allow for storing
> a MIDX across multiple layers, each with their own distinct set of
> packs.
> 
> This round is mostly unchanged from the previous since there has not yet
> been substantial review. But it does rebase to current 'master' (which
> is 04f5a52757 (Post 2.46-rc0 batch #2, 2024-07-16), at the time of
> writing).
> 
> Importantly, this rebase moves this topic to be based on an ancestor of
> 0c5a62f14b (midx-write.c: do not read existing MIDX with
> `packs_to_include`, 2024-06-11), which resulted in a non-trivial
> conflict prior to this rebase.
> 
> The rest of the topic is unchanged. I don't expect that we'll see much
> review here for the next couple of weeks while we are in the -rc phase,
> but I figured it would be useful to have it on the list for folks that
> are interested in taking a look.
> 
> Thanks in advance for any review! :-)

I gave it a pretty thorough look. Everything looks good for the most
part. I left a few comments, but mostly just thinking my way through
things.

The trickiest parts were:

  - the confusion between when we want local per-layer positions versus
    global positions within the whole chainfile, or whether functions
    are operating on a single layer versus the whole chain. I mused a
    bit on how we could do it differently, but ultimately I'm not sure
    there are any good solutions.

  - the changes you did make look good, but it's hard to know if there's
    code lurking that still needs to be adjusted for chained midx's. For
    that I think I'd turn more towards testing than code review. I'm not
    sure how much interesting coverage we're getting from the GIT_TEST
    variable, just because the repositories made in most of the tests
    are so trivial.

    I'd love to see the results on a real workload (both a big repo, but
    also how things behave over days or weeks of repository maintenance
    done with incremental midxs). I know that can be hard to do, though.

-Peff

* Re: [PATCH v2 01/19] Documentation: describe incremental MIDX format
  2024-08-01  9:19     ` Jeff King
@ 2024-08-01 18:52       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 18:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 05:19:52AM -0400, Jeff King wrote:
> > +As above, there are no fundamental limitations that stand in the way of
> > +extending the incremental MIDX format to support reachability bitmaps.
> > +The design below specifically takes this into account, and support for
> > +reachability bitmaps will be added in a future patch series. It is
> > +omitted from this series for the same reason as above.
>
> It is nice that you added a bit of a roadmap here about what is
> implemented and what is not, and that the design takes into account
> future directions (especially incremental bitmap generation).
>
> It does feel a little funny to say "this series" in text that will go
> into the repository (i.e., somebody reading the checked out file will
> say "huh? which series?"). I'm not sure how to word it better, except to
> maybe just say "in the future" and "it is omitted for now" (and
> obviously it's a pretty minor point).

s/this series/the current implementation/ ?

Definitely an oversight on my part, thanks for catching. I'll squash it
in.

> > +The `multi-pack-index-$H1.midx` file contains the first layer of the
> > +multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
> > +the second layer of the chain, and so on.
>
> Makes sense. How does the chained multi-pack-index.d interact with a
> singular multi-pack-index? Generally we should not have both at the same
> time, but I'd imagine they both exist for a brief period when moving
> from one to another.
>
> I assume the rules are the same as for commit-graphs, which use the same
> on-disk structure. I can't think of a reason to prefer one over the
> other but this might be a good place to document what does/should
> happen.

The commit-graph code reads the non-chained commit-graph first (see
commit-graph.c::read_commit_graph_one() for exact details, paraphrased
here):

    struct commit_graph *g = load_commit_graph_v1(...);
    if (!g)
            g = load_commit_graph_chain(...);
    return g;

and I matched the same for the MIDX code. I think there are reasonable
arguments for preferring either one over the other, so I think the
easiest thing to do is just throw our hands up and stick with the
convention ;-).

> > +=== Object positions for incremental MIDXs
> > +
> > +In the original multi-pack-index design, we refer to objects via their
> > +lexicographic position (by object IDs) within the repository's singular
> > +multi-pack-index. In the incremental multi-pack-index design, we refer
> > +to objects via their index into a concatenated lexicographic ordering
> > +among each component in the MIDX chain.
>
> How do duplicate objects work here? I guess there aren't any duplicates
> in the midx itself, only in the constituent packfiles. So from the
> perspective of this section, I guess it doesn't matter? And from the
> perspective of bitmaps (where the duplicate issue came up before), it is
> business as usual: the midx revindex gives the bit order, and we'd
> presumably concatenate the individual revindexes in chain order.
>
> (Mostly just thinking out loud; I'm not sure there's much for you to
> answer there).

Right. In a pre-incremental MIDX world, MIDXs contain no duplicate
object entries with respect to themselves. In this new world, the same
is true, with the additional property that MIDXs also contain no
duplicates with respect to their ancestors (when part of a MIDX chain).

> Looking good so far...
>
> -Peff

Thanks,
Taylor

* Re: [PATCH v2 02/19] midx: add new fields for incremental MIDX chains
  2024-08-01  9:21     ` Jeff King
@ 2024-08-01 18:54       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 18:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 05:21:42AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:12:01PM -0400, Taylor Blau wrote:
>
> > The incremental MIDX chain feature is designed around the idea of
> > indexing into a concatenated lexicographic ordering of object IDs
> > present in the MIDX.
> >
> > When given an object position, the MIDX machinery needs to be able to
> > locate both (a) which MIDX layer contains the given object, and (b) at
> > what position *within that MIDX layer* that object appears.
> >
> > To do this, three new fields are added to the `struct multi_pack_index`:
> >
> >   - struct multi_pack_index *base_midx;
> >   - uint32_t num_objects_in_base;
> >   - uint32_t num_packs_in_base;
> >
> > These three fields store the pieces of information suggested by their
> > respective field names. In turn, the `num_objects_in_base` and
> > `num_packs_in_base` fields are used to crawl backwards along the
> > `base_midx` pointer to locate the appropriate position for a given
> > object within the MIDX that contains it.
>
> OK, so base_midx is a back-pointer. I think in theory you could compute
> num_objects_in_base on the fly by doing that crawl yourself, but we'd
> want to be able to do it in constant time, rather than O(# of midx)?
>
> Makes sense.

Yep. As you have seen in the later patches in this series, we end up
needing to read num_objects_in_base and num_packs_in_base quite
frequently, so it's nice to have them precomputed.

We could compute them lazily, but they're easy to build up in
midx.c::add_midx_to_chain(), where we're already checking for overflow,
e.g.:

    if (unsigned_add_overflows(midx_chain->num_packs,
                               midx_chain->num_packs_in_base)) {
            /* ... */
    }
    midx->num_packs_in_base = midx_chain->num_packs +
            midx_chain->num_packs_in_base;

> -Peff

Thanks,
Taylor

* Re: [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  2024-08-01  9:30     ` Jeff King
@ 2024-08-01 18:57       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 18:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 05:30:06AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:12:04PM -0400, Taylor Blau wrote:
>
> > [^1]: As a reminder, this means that the object is identified among the
> >   objects contained in all layers of the incremental MIDX chain, not any
>   particular layer. For example, consider a MIDX chain with two individual
> >   MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
> >   4 objects appears earlier in the chain, then asking for pack "6" would
> >   return the second object in the MIDX with 3 objects.
>
> I think this is "object 6" in the final sentence?

Oops, yes. Thanks for spotting, this was as easy as s/pack/object on the
second to last line in the paragraph quoted above.

> OK, so now this function translates a global position into a local one,
> and then we get the pack id for the local midx/pos, and then turn it
> back into a global pack id.
>
> That all makes sense, but you definitely have to read carefully to make
> sure which positions/ids are global within the chain and which are local
> to a midx.

Exactly.
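
As a rough illustration of that global-to-local translation, here is a
sketch in the spirit of midx_for_object(). The struct and function names
are hypothetical, positions are assumed to be zero-based, and the chain
is assumed to be consistent (the oldest layer has zero objects in its
base):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer. */
struct obj_layer {
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
	struct obj_layer *base;       /* previous layer, or NULL */
};

/*
 * Translate a global position into a layer-local one. On return, *l
 * points at the layer that contains the object.
 */
uint32_t layer_for_object(struct obj_layer **l, uint32_t pos)
{
	assert(pos < (*l)->num_objects + (*l)->num_objects_in_base);

	/* Step back until this layer's range covers pos. */
	while (pos < (*l)->num_objects_in_base)
		*l = (*l)->base;

	return pos - (*l)->num_objects_in_base;
}
```

With a 4-object layer followed by a 3-object layer, zero-based global
position 5 lands in the 3-object layer at local position 1, i.e. its
second object. Calling the helper again with the already-narrowed layer
pointer and the same global position costs only the comparison and the
subtraction, with no further pointer chasing.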

Thanks,
Taylor

* Re: [PATCH v2 04/19] midx: teach `prepare_midx_pack()` about incremental MIDXs
  2024-08-01  9:35     ` Jeff King
@ 2024-08-01 19:00       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 19:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 05:35:50AM -0400, Jeff King wrote:
> OK, I'm adequately prepared for more global/local confusion. :)

Good, since there is definitely more where that came from ;-).

> I guess if you just reused pack_int_id for the local id, the diff would
> be much smaller (this part would remain exactly the same). I dunno which
> is better, but it was a little curious that the two patches differed in
> approach. Probably not worth caring too much about, though.

I meandered a lot about different approaches before I arrived at what
became midx_for_pack() and midx_for_object(). So I think declaring a new
local_pack_int_id was a relic from when perhaps the function returned
void and expected to have a uint32_t* to write the translated pack ID
to.

Later translations make use of the:

     pack_int_id = midx_for_pack(&m, pack_int_id);

pattern when they do not care about the global pack ID, and this one
should as well. I'll update the patch to do that.

Thanks,
Taylor

* Re: [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  2024-08-01  9:38     ` Jeff King
@ 2024-08-01 19:03       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 19:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 05:38:05AM -0400, Jeff King wrote:
> To some degree I think the proof will be in the pudding. If you missed
> any helpers, then the end result is going to crash and burn quite badly
> when used with a chained midx, and we'd see it in the test suite. And
> the nice thing is that most of this is abstracted inside these helpers,
> so we know the set of tricky places is generally limited to the helpers,
> and not arbitrary bits of midx code.

I think back in the old days, I might have considered experimenting with
GitHub's fork of Git to build up this feature before sharing it with the
list.

But experience has taught me that it's far better to share early and
often. Doing so gets you the benefit of having many eyes on the feature,
not just from individuals at GitHub.

Selfishly, it also reduces the pain of having to change some on-disk
format that has already been widely rolled out within GitHub's
infrastructure and similar, but the former motivation is much more
compelling.

In terms of "the pudding" here, I think that marking this feature as
experimental / incomplete is a good way for us to push this forward and
build up some real-world experience with brave users via a tagged
version of Git. Then we can refine it until we are confident it has
graduated the "experimental" phase.

Thanks,
Taylor

* Re: [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  2024-08-01  9:39     ` Jeff King
@ 2024-08-01 19:07       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 19:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 05:39:47AM -0400, Jeff King wrote:
> >  int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
> >  		       struct bitmapped_pack *bp, uint32_t pack_int_id)
> >  {
> > +	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
> > +
>
> Heh, after the last one reused the "n" variable, now we are back to a
> separate local variable. Not wrong, but curious to go back and forth.

This one we care about having both, for a couple of reasons:
prepare_midx_pack() still expects us to have the global pack_int_id, and
just as well for bp->pack_int_id.

We could write this as:

    pack_int_id = midx_for_pack(&m, pack_int_id);
    if (prepare_midx_pack(r, m, pack_int_id + m->num_packs_in_base))
        return -1;

But I found it easier to have a separate local_-prefixed variable to
refer to the MIDX-local pack identifier.

I'll add a short note in the commit message explaining why we took this
approach in this commit.

Thanks,
Taylor

* Re: [PATCH v2 07/19] midx: introduce `bsearch_one_midx()`
  2024-08-01 10:06     ` Jeff King
@ 2024-08-01 19:54       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 19:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:06:35AM -0400, Jeff King wrote:
> One thing that confused me, though, is setting "num". From the "-w"
> diff:
>
>   -       num = m->num_objects;
>   +
>   +               num = m->num_objects + m->num_objects_in_base;
>
>                   if (!num)
>   -               return;
>   +                       continue;
>
> Before we only had one midx, so that was our limit. But now we are
> looking at "num" as a limit in the global size of the chained midx.
> Which feels weird, since we're just considering a single layer here. We
> seem to use "num" in two ways:
>
>   - we return if it's 0 (or now continue to the next layer). But
>     wouldn't we want to do that per-layer? I don't think it will produce
>     wrong answers, but we're less likely to kick in this early return
>     (though it's not clear to me when it would ever kick in really; a
>     zero-length midx?).

This is definitely a bug. We should certainly do something like:

    for (; m; m = m->base_midx) {
            uint32_t num;
            if (!m->num_objects)
                    continue;

            num = m->num_objects + m->num_objects_in_base;
            /* ... */
    }

I'll go ahead and fix this one up locally, which is easy enough to do.

>     So I think it's correct, though it feels like bsearch_one_midx()
>     should still return the position within that midx (and then
>     bsearch_midx() could add back m->num_objects_in_base to get a global
>     position). And then I guess likewise there would need to be a
>     midx-local version of nth_midxed_object_oid().

Like many of the other changes in this series, it's really a matter of
where you put the complexity: either it's in the callers or in the
function itself.

I think here I prefer having bsearch_one_midx() return the global
position, since it is directly usable in other top-level functions
within the MIDX API, like being able to pass it directly to
nth_midxed_object_oid(), etc., below.
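
To make the trade-off concrete, here is a small standalone sketch of a
per-layer search that hands back a global position. It is a toy model
under loud assumptions: `struct layer`, `search_one_layer()` and
`search_chain()` are hypothetical names, and plain `uint32_t` keys stand
in for object IDs. It is not the implementation of bsearch_one_midx().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer. */
struct layer {
	struct layer *base;           /* previous (older) layer, or NULL */
	const uint32_t *keys;         /* sorted keys in this layer only */
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/* Search a single layer; on success, store the *global* position. */
static int search_one_layer(const struct layer *l, uint32_t key,
			    uint32_t *result)
{
	uint32_t lo = 0, hi = l->num_objects;

	assert(result);

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (l->keys[mid] == key) {
			/* Translate layer-local to global here, once. */
			*result = mid + l->num_objects_in_base;
			return 1;
		}
		if (l->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* Search every layer, newest first. */
static int search_chain(const struct layer *l, uint32_t key,
			uint32_t *result)
{
	for (; l; l = l->base)
		if (search_one_layer(l, key, result))
			return 1;
	return 0;
}
```

With this shape the caller never adds `num_objects_in_base` itself, at
the cost of the helper knowing about the layer's place in the chain.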

Thanks,
Taylor

* Re: [PATCH v2 10/19] midx: teach `fill_midx_entry()` about incremental MIDXs
  2024-08-01 10:12     ` Jeff King
@ 2024-08-01 20:01       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:12:15AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:12:25PM -0400, Taylor Blau wrote:
>
> > After that point, we only need to make two special considerations within
> > this function:
> >
> >   - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
> >     is a position in the concatenated lexical order of packs, so we must
> >     ensure that we subtract `m->num_packs_in_base` before accessing the
> >     MIDX-local `packs` array.
> >
> >   - Second, we must avoid translating the `pos` back to a MIDX-local
> >     index, since we use it as an argument to `nth_midxed_offset()` which
> >     expects a position relative to the concatenated lexical order of
> >     objects.
>
> OK. I think this is correct, but this would be another place where we
> could use an nth_midxed_offset_one() function if we had one.

Yeah.

> My thinking was that we'd avoid walking back over the midx chain again.
> But I guess we don't actually do that, because our midx_for_object()
> will have overwritten our "m" variable, as well. So inside
> nth_midxed_offset_one() we'll immediately realize that the global
> position is inside the midx we passed in. A little extra arithmetic, but
> there's no pointer chasing.

Yup, exactly.

Thanks,
Taylor

* Re: [PATCH v2 11/19] midx: remove unused `midx_locate_pack()`
  2024-08-01 10:14     ` Jeff King
@ 2024-08-01 20:01       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:14:54AM -0400, Jeff King wrote:
> Nice description of the history. I wish all patches which said "eh, this
> is unused, let's remove it" went to the same trouble to make sure we
> aren't missing something subtle.

:-), thanks.

> -Peff

Thanks,
Taylor

* Re: [PATCH v2 13/19] midx: teach `midx_preferred_pack()` about incremental MIDXs
  2024-08-01 10:25     ` Jeff King
@ 2024-08-01 20:05       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:25:17AM -0400, Jeff King wrote:
> > diff --git a/midx.c b/midx.c
> > index 0fa8febb9d..d2dbea41e4 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
> >  int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> >  {
> >  	if (m->preferred_pack_idx == -1) {
> > +		uint32_t midx_pos;
> >  		if (load_midx_revindex(m) < 0) {
> >  			m->preferred_pack_idx = -2;
> >  			return -1;
> >  		}
> >
> > -		m->preferred_pack_idx =
> > -			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
> > +		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
> > +
> > +		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
> > +
>
> OK, so rather than looking for the pack of object 0, we're looking for
> the first one in _this_ layer, since the position is global within the
> midx. That makes some sense, but is pack_pos_to_midx() ready for that?
> It looks like it just looks at m->revindex_data. Are we going to be
> generating a revindex for the whole chain? I'd think that each layer
> would have its own revindex (and any trickiness would happen at the
> generation stage, making sure we don't insert objects that are already
> mentioned in earlier layers).

pack_pos_to_midx() is kind of ready, and kind of not.

The way that the pseudo-pack order is constructed within the
midx-write.c code, we will write reverse indexes (within each MIDX layer
itself as a separate chunk) that contain data for each object within
that layer in the expected reverse index format.

But we don't bother writing any reverse indexes for MIDXs which are
incremental at this point in the multi-series plan, since we just bail
if the BITMAP flag is set (saying that it is unsupported at this point).

Arguably we could have just left this hunk / patch out of the series as
a whole. It's this kind of stuff that's really at the boundary between
adjacent "phases" that I think is awkward no matter which way you slice
it.

> -Peff

Thanks,
Taylor

* Re: [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  2024-08-01 10:29     ` Jeff King
@ 2024-08-01 20:09       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:29:06AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:12:38PM -0400, Taylor Blau wrote:
>
> > The function `midx_fanout_add_midx_fanout()` is used to help construct
> > the fanout table when generating a MIDX by reusing data from an existing
> > MIDX.
>
> I'm not sure I understand the original function enough to know if we're
> doing the right thing. But I notice that after your series, we can only
> get into midx_fanout_add_midx_fanout() if !ctx->incremental. So is this
> code even used for an incremental midx?

Originally it was, but after 0c5a62f14b (midx-write.c: do not read
existing MIDX with `packs_to_include`, 2024-06-11) we no longer use this
function in that path. But...

> Or is it used if we are writing a non-incremental midx, but trying to
> reuse data from a chained one?

...we would use it in this one, so I think the patch stands. I added a
note at the end of the commit message to make sure that we don't forget
which paths do and don't reach this function.

Thanks,
Taylor

* Re: [PATCH v2 15/19] midx: support reading incremental MIDX chains
  2024-08-01 10:40     ` Jeff King
@ 2024-08-01 20:35       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:40:26AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:12:41PM -0400, Taylor Blau wrote:
>
> > Now that the MIDX machinery's internals have been taught to understand
> > incremental MIDXs over the previous handful of commits, the MIDX
> > machinery itself can begin reading incremental MIDXs.
> >
> > (Note that while the on-disk format for incremental MIDXs has been
> > defined, the writing end has not been implemented. This will take place
> > in the commit after next.)
> >
> > The core of this change involves following the order specified in the
> > MIDX chain and opening up MIDXs in the chain one-by-one, adding them to
> > the previous layer's `->base_midx` pointer at each step.
>
> This makes it sound like reading a chain file of:
>
>   multi-pack-index-$H1.midx
>   multi-pack-index-$H2.midx
>   multi-pack-index-$H3.midx
>
> will have H1's base_midx pointing to H2. But the design document from
> the first patch made me think it went the other way (H1 is the oldest
> midx, then H2, then H3). For many things the ordering doesn't matter,
> but I'd think the pseudo-pack order would go from the root of the
> base_midx walk to the tip. That is, the base_midx pointers go in reverse
> chronological order.
>
> Looking at the code, I think it's doing what I expect. Not sure if I'm
> mis-reading what you wrote above, or if it's wrong.

The patch message is just plain wrong here. I switched the sentence
beginning with "The core of this change involves [...]" to add "in
reverse" to clarify what's going on here.
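
As a rough illustration of the resulting pointer direction (and not of
how load_multi_pack_index_chain() is actually written), here is a
standalone sketch. `struct layer` and `link_chain()` are hypothetical,
and the layers are assumed to be handed over in chain-file order, oldest
first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer. */
struct layer {
	struct layer *base;           /* previous (older) layer, or NULL */
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/*
 * Link layers given in chain-file order (oldest first) and return
 * the tip (the newest layer), or NULL for an empty chain.
 */
static struct layer *link_chain(struct layer *layers, size_t nr)
{
	struct layer *prev = NULL;
	uint32_t seen = 0;
	size_t i;

	assert(layers || !nr);

	for (i = 0; i < nr; i++) {
		/* Each layer points back at the one listed before it. */
		layers[i].base = prev;
		layers[i].num_objects_in_base = seen;
		seen += layers[i].num_objects;
		prev = &layers[i];
	}
	return prev;
}
```

So for a chain file listing H1, H2, H3, the returned tip is H3, its
base is H2, and H1 has no base, which matches the reverse chronological
walk described in the review.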

> > [...]
>
> The code itself all looked reasonable. There are a scary number of spots
> where we have to do global/local position conversion. It's hard to know
> if you got them all.

Agreed. If you have ideas to make it less scary, do let me know ;-).

Thanks,
Taylor

* Re: [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2024-08-01 10:46     ` Jeff King
@ 2024-08-01 20:36       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 06:46:47AM -0400, Jeff King wrote:
> ...if it is one or the other, I think it is better to test the new code.
>
> And I do think that midx bitmap code is less likely to be exercised in
> interesting ways by random parts of the test suite (versus something
> like GIT_TEST_DEFAULT_HASH, whose effects are pervasive). So I think
> this is a good tradeoff.

Thanks, I appreciate your careful reasoning here.

> Patch looks good.

Thanks also for the review :-).

Thanks,
Taylor

* Re: [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains
  2024-08-01 11:07     ` Jeff King
@ 2024-08-01 20:39       ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 07:07:22AM -0400, Jeff King wrote:
> > The tests explicitly exercising the new incremental MIDX feature are
> > relatively limited for two reasons:
> >
> >   1. Most of the "interesting" behavior is already thoroughly covered in
> >      t5319-multi-pack-index.sh, which handles the core logic of reading
> >      objects through a MIDX.
> >
> >      The new tests in t5334-incremental-multi-pack-index.sh are mostly
> >      focused on creating and destroying incremental MIDXs, as well as
> >      stitching their results together across layers.
>
> Do you mean here that t5319 will get coverage when
> GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL is set? In the long run, I
> wonder if we should pull t5319's tests into lib-midx.sh and run them in
> incremental and non-incremental modes.

I don't know. Part of me thinks that that would be a good idea, but part
of me also thinks that it would be (a) painful (since many of those
tests assume that exactly one MIDX exists in the repository
before/during/after each test), and (b) not particularly useful (since
much of the interesting behavior occurs when multiple MIDXs contain
packs with overlapping objects, see (a)).

Thanks,
Taylor

* Re: [PATCH v2 00/19] midx: incremental multi-pack indexes, part one
  2024-08-01 11:14   ` [PATCH v2 00/19] midx: incremental multi-pack indexes, part one Jeff King
@ 2024-08-01 20:41     ` Taylor Blau
  0 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-01 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Thu, Aug 01, 2024 at 07:14:10AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:11:54PM -0400, Taylor Blau wrote:
>
> > This series implements incremental MIDXs, which allow for storing
> > a MIDX across multiple layers, each with their own distinct set of
> > packs.
> >
> > This round is mostly unchanged from the previous since there has not yet
> > been substantial review. But it does rebase to current 'master' (which
> > is 04f5a52757 (Post 2.46-rc0 batch #2, 2024-07-16), at the time of
> > writing).
> >
> > Importantly, this rebase moves this topic to be based on an ancestor of
> > 0c5a62f14b (midx-write.c: do not read existing MIDX with
> > `packs_to_include`, 2024-06-11), which resulted in a non-trivial
> > conflict prior to this rebase.
> >
> > The rest of the topic is unchanged. I don't expect that we'll see much
> > review here for the next couple of weeks while we are in the -rc phase,
> > but I figured it would be useful to have it on the list for folks that
> > are interested in taking a look.
> >
> > Thanks in advance for any review! :-)
>
> I gave it a pretty thorough look. Everything looks good for the most
> part. I left a few comments, but mostly just thinking my way through
> things.

Thanks very much.

I squashed all of the feedback that I got from your review into a local
copy, which I'll submit as "v3" (probably next week, as I am gone for a
long weekend starting ~tomorrow and would like to leave others some time
to review as well).

>   - the changes you did make look good, but it's hard to know if there's
>     code lurking that still needs to be adjusted for chained midx's. For
>     that I think I'd turn more towards testing than code review. I'm not
>     sure how much interesting coverage we're getting from the GIT_TEST
>     variable, just because the repositories made in most of the tests
>     are so trivial.
>
>     I'd love to see the results on a real workload (both a big repo, but
>     also how things behave over days or weeks of repository maintenance
>     done with incremental midxs). I know that can be hard to do, though.

Yeah, I agree that this is the biggest gap in this series and the
overall plan right now. I have some more detailed comments in [1] that I
think are useful to the overall approach.

It basically boils down to declaring the feature "experimental" and
letting users that are comfortable testing on the bleeding edge help us
iron out any bugs (along with rolling it out at GitHub once all the dust
has settled on this and subsequent parts).

Thanks again for a detailed and helpful review :-).

Thanks,
Taylor

[1]: https://lore.kernel.org/git/ZqvcAQABDIthFUPH@nand.local/

* [PATCH v3 00/19] midx: incremental multi-pack indexes, part one
  2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
                   ` (22 preceding siblings ...)
  2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
@ 2024-08-06 15:36 ` Taylor Blau
  2024-08-06 15:36   ` [PATCH v3 01/19] Documentation: describe incremental MIDX format Taylor Blau
                     ` (19 more replies)
  23 siblings, 20 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

This series implements incremental MIDXs, which allow for storing
a MIDX across multiple layers, each with their own distinct set of
packs.

This round is also similar to the previous one, but is rebased on
current 'master' (406f326d27 (The second batch, 2024-08-01)) and has
been updated in response to review from Peff on the previous round.

As usual, a range-diff is below, but the main changes since last time
are as follows:

  - Documentation improvements to clarify what happens when both an
    incremental and a non-incremental MIDX are present in a
    repository.

  - Commit message typofix on 3/19 to fix an error in one of the
    technical examples.

  - Dropped a custom 'local_pack_int_id' in 4/19 to make the remaining
    diff easier to read.

  - Minor bugfix in 7/19 where we incorrectly terminated the object
    abbreviation disambiguation step for incremental MIDXs.

  - Various additional bits of information in the commit messages to
    explain anything that was subtle.

Thanks in advance for any review! :-)

Taylor Blau (19):
  Documentation: describe incremental MIDX format
  midx: add new fields for incremental MIDX chains
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: implement verification support for incremental MIDXs
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  midx: implement support for writing incremental MIDX chains

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt | 103 +++++
 builtin/multi-pack-index.c                   |   2 +
 builtin/repack.c                             |   8 +-
 ci/run-build-and-tests.sh                    |   2 +-
 midx-write.c                                 | 326 ++++++++++++---
 midx.c                                       | 405 ++++++++++++++++---
 midx.h                                       |  26 +-
 object-name.c                                |  99 ++---
 packfile.c                                   |  21 +-
 packfile.h                                   |   4 +
 t/README                                     |   6 +-
 t/helper/test-read-midx.c                    |  24 +-
 t/lib-bitmap.sh                              |   6 +-
 t/lib-midx.sh                                |  28 ++
 t/t0410-partial-clone.sh                     |   2 -
 t/t5310-pack-bitmaps.sh                      |   4 -
 t/t5313-pack-bounds-checks.sh                |   8 +-
 t/t5319-multi-pack-index.sh                  |  30 +-
 t/t5326-multi-pack-bitmaps.sh                |   4 +-
 t/t5327-multi-pack-bitmaps-rev.sh            |   6 +-
 t/t5332-multi-pack-reuse.sh                  |   2 +
 t/t5334-incremental-multi-pack-index.sh      |  46 +++
 t/t7700-repack.sh                            |  48 +--
 24 files changed, 960 insertions(+), 261 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

Range-diff against v2:
 1:  014588b3ec !  1:  90b21b11ed Documentation: describe incremental MIDX format
    @@ Documentation/technical/multi-pack-index.txt: Design Details
     +extending the incremental MIDX format to support reachability bitmaps.
     +The design below specifically takes this into account, and support for
     +reachability bitmaps will be added in a future patch series. It is
    -+omitted from this series for the same reason as above.
    ++omitted from the current implementation for the same reason as above.
     ++
     +In brief, to support reachability bitmaps with the incremental MIDX
     +feature, the concept of the pseudo-pack order is extended across each
    @@ Documentation/technical/multi-pack-index.txt: Design Details
     +multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
     +the second layer of the chain, and so on.
     +
    ++When both an incremental- and non-incremental MIDX are present, the
    ++non-incremental MIDX is always read first.
    ++
     +=== Object positions for incremental MIDXs
     +
     +In the original multi-pack-index design, we refer to objects via their
 2:  337ebc6de7 =  2:  0d3b19c59f midx: add new fields for incremental MIDX chains
 3:  f449a72877 !  3:  5cd742b677 midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
    @@ Commit message
           objects contained in all layers of the incremental MIDX chain, not any
           particular layer. For example, consider MIDX chain with two individual
           MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
    -      4 objects appears earlier in the chain, then asking for pack "6" would
    +      4 objects appears earlier in the chain, then asking for object 6 would
           return the second object in the MIDX with 3 objects.
     
         [^2]: Building on the previous example, asking for object 6 in a MIDX
 4:  f88569c819 !  4:  372104c73d midx: teach `prepare_midx_pack()` about incremental MIDXs
    @@ midx.c: static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t p
      		die(_("bad pack-int-id: %u (%u total packs)"),
     -		    pack_int_id, m->num_packs);
     +		    pack_int_id, m->num_packs + m->num_packs_in_base);
    - 
    --	if (m->packs[pack_int_id])
    ++
     +	*_m = m;
     +
     +	return pack_int_id - m->num_packs_in_base;
    @@ midx.c: static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t p
     +{
     +	struct strbuf pack_name = STRBUF_INIT;
     +	struct packed_git *p;
    -+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
     +
    -+	if (m->packs[local_pack_int_id])
    ++	pack_int_id = midx_for_pack(&m, pack_int_id);
    + 
    + 	if (m->packs[pack_int_id])
      		return 0;
    - 
    - 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
    --		    m->pack_names[pack_int_id]);
    -+		    m->pack_names[local_pack_int_id]);
    - 
    - 	p = add_packed_git(pack_name.buf, pack_name.len, m->local);
    - 	strbuf_release(&pack_name);
    -@@ midx.c: int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
    - 		return 1;
    - 
    - 	p->multi_pack_index = 1;
    --	m->packs[pack_int_id] = p;
    -+	m->packs[local_pack_int_id] = p;
    - 	install_packed_git(r, p);
    - 	list_add_tail(&p->mru, &r->objects->packed_git_mru);
    - 
 5:  ec57ff4349 =  5:  e68a3ceff9 midx: teach `nth_midxed_object_oid()` about incremental MIDXs
 6:  650b8c8c21 !  6:  ff2d7bc5ca midx: teach `nth_bitmapped_pack()` about incremental MIDXs
    @@ Commit message
         ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
         when accessing the data within that chunk.
     
    +    (Note that both the call to prepare_midx_pack() and the assignment
    +    of bp->pack_int_id care about the global pack_int_id, so avoid
    +    shadowing the given 'pack_int_id' parameter).
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx.c ##
 7:  bfd1dadbf1 !  7:  32c3fceada midx: introduce `bsearch_one_midx()`
    @@ object-name.c: static int match_hash(unsigned len, const unsigned char *a, const
      
     -	if (!num)
     -		return;
    -+		num = m->num_objects + m->num_objects_in_base;
    ++		if (!m->num_objects)
    ++			continue;
      
     -	bsearch_midx(&ds->bin_pfx, m, &first);
    -+		if (!num)
    -+			continue;
    ++		num = m->num_objects + m->num_objects_in_base;
      
     -	/*
     -	 * At this point, "first" is the location of the lowest object
 8:  38bd45bd24 =  8:  16db6c98ce midx: teach `bsearch_midx()` about incremental MIDXs
 9:  342ed56033 =  9:  761c7c59ba midx: teach `nth_midxed_offset()` about incremental MIDXs
10:  2b335c45ae = 10:  8366456d29 midx: teach `fill_midx_entry()` about incremental MIDXs
11:  22de5898f3 = 11:  909d927c47 midx: remove unused `midx_locate_pack()`
12:  fb60f2b022 = 12:  71127601b5 midx: teach `midx_contains_pack()` about incremental MIDXs
13:  38b642d404 = 13:  2f98ebb141 midx: teach `midx_preferred_pack()` about incremental MIDXs
14:  594386da10 ! 14:  550ae2dc93 midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
    @@ Commit message
             MIDX layers when dealing with an incremental MIDX chain by calling
             itself when given a MIDX with a non-NULL `base_midx`.
     
    +    Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
    +    `packs_to_include`, 2024-06-11), we do not use this function with an
    +    existing MIDX (incremental or not) when generating a MIDX with
    +    --stdin-packs, and likewise for incremental MIDXs.
    +
    +    But it is still used when adding the fanout table from an incremental
    +    MIDX when generating a non-incremental MIDX (without --stdin-packs, of
    +    course).
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx-write.c ##
15:  dad130799c ! 15:  9ae1bc415e midx: support reading incremental MIDX chains
    @@ Commit message
         in the commit after next.)
     
         The core of this change involves following the order specified in the
    -    MIDX chain and opening up MIDXs in the chain one-by-one, adding them to
    -    the previous layer's `->base_midx` pointer at each step.
    +    MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
    +    adding them to the previous layer's `->base_midx` pointer at each step.
     
         In order to implement this, the `load_multi_pack_index()` function is
         taught to call a new `load_multi_pack_index_chain()` function if loading
16:  ad976ef413 = 16:  3d4181df51 midx: implement verification support for incremental MIDXs
17:  23912425bf = 17:  3b268f91bf t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
18:  814da1916d = 18:  09d74f8942 t/t5313-pack-bounds-checks.sh: prepare for sub-directories
19:  e2b5961b45 = 19:  5d467d38a8 midx: implement support for writing incremental MIDX chains

base-commit: 406f326d271e0bacecdb00425422c5fa3f314930
-- 
2.46.0.46.g406f326d27.dirty

* [PATCH v3 01/19] Documentation: describe incremental MIDX format
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
@ 2024-08-06 15:36   ` Taylor Blau
  2024-08-06 15:36   ` [PATCH v3 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement incremental multi-pack indexes (MIDXs) over the
next several commits by first describing the relevant prerequisites
(like a new chunk in the MIDX format, the directory structure for
incremental MIDXs, etc.)

The format is described in detail in the patch contents below, but the
high-level description is as follows.

Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and
each `*.midx` within that directory has a single "parent" MIDX, which is
the MIDX layer immediately before it in the MIDX chain. The chain order
resides in a file 'multi-pack-index-chain' in the same directory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt | 103 +++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index f2221d2b44..cc063b30be 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -61,6 +61,109 @@ Design Details
 - The MIDX file format uses a chunk-based approach (similar to the
   commit-graph file) that allows optional data to be added.
 
+Incremental multi-pack indexes
+------------------------------
+
+As repositories grow in size, it becomes more expensive to write a
+multi-pack index (MIDX) that includes all packfiles. To accommodate
+this, the "incremental multi-pack indexes" feature allows for combining
+a "chain" of multi-pack indexes.
+
+Each individual component of the chain need only contain a small number
+of packfiles. Appending to the chain does not invalidate earlier parts
+of the chain, so repositories can control how much time is spent
+updating the MIDX chain by determining the number of packs in each layer
+of the MIDX chain.
+
+=== Design state
+
+At present, the incremental multi-pack indexes feature is missing two
+important components:
+
+  - The ability to rewrite earlier portions of the MIDX chain (i.e., to
+    "compact" some collection of adjacent MIDX layers into a single
+    MIDX). At present the only supported way of shrinking a MIDX chain
+    is to rewrite the entire chain from scratch without the
+    `--incremental` flag.
++
+There are no fundamental limitations that stand in the way of being able
+to implement this feature. It is omitted from the initial implementation
+in order to reduce the complexity, but will be added later.
+
+  - Support for reachability bitmaps. The classic single MIDX
+    implementation does support reachability bitmaps (see the section
+    titled "multi-pack-index reverse indexes" in
+    linkgit:gitformat-pack[5] for more details).
++
+As above, there are no fundamental limitations that stand in the way of
+extending the incremental MIDX format to support reachability bitmaps.
+The design below specifically takes this into account, and support for
+reachability bitmaps will be added in a future patch series. It is
+omitted from the current implementation for the same reason as above.
++
+In brief, to support reachability bitmaps with the incremental MIDX
+feature, the concept of the pseudo-pack order is extended across each
+layer of the incremental MIDX chain to form a concatenated pseudo-pack
+order. This concatenation takes place in the same order as the chain
+itself (in other words, the concatenated pseudo-pack order for a chain
+`{$H1, $H2, $H3}` would be the pseudo-pack order for `$H1`, followed by
+the pseudo-pack order for `$H2`, followed by the pseudo-pack order for
+`$H3`).
++
+The layout will then be extended so that each layer of the incremental
+MIDX chain can write a `*.bitmap`. The objects in each layer's bitmap
+are offset by the number of objects in the previous layers of the chain.
+
+=== File layout
+
+Instead of storing a single `multi-pack-index` file (with an optional
+`.rev` and `.bitmap` extension) in `$GIT_DIR/objects/pack`, incremental
+MIDXs are stored in the following layout:
+
+----
+$GIT_DIR/objects/pack/multi-pack-index.d/
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-chain
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H1.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H2.midx
+$GIT_DIR/objects/pack/multi-pack-index.d/multi-pack-index-$H3.midx
+----
+
+The `multi-pack-index-chain` file contains a list of the incremental
+MIDX files in the chain, in order. The above example shows a chain whose
+`multi-pack-index-chain` file would contain the following lines:
+
+----
+$H1
+$H2
+$H3
+----
+
+The `multi-pack-index-$H1.midx` file contains the first layer of the
+multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
+the second layer of the chain, and so on.
+
+When both an incremental and a non-incremental MIDX are present, the
+non-incremental MIDX is always read first.
+
+=== Object positions for incremental MIDXs
+
+In the original multi-pack-index design, we refer to objects via their
+lexicographic position (by object IDs) within the repository's singular
+multi-pack-index. In the incremental multi-pack-index design, we refer
+to objects via their index into a concatenated lexicographic ordering
+among each component in the MIDX chain.
+
+If `objects_nr()` is a function that returns the number of objects in a
+given MIDX layer, then the index of an object at lexicographic position
+`i` within, say, $H3 is defined as:
+
+----
+objects_nr($H2) + objects_nr($H1) + i
+----
+
+(in the C implementation, this is often computed as `i +
+m->num_objects_in_base`).
+
 Future Work
 -----------
 
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 02/19] midx: add new fields for incremental MIDX chains
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
  2024-08-06 15:36   ` [PATCH v3 01/19] Documentation: describe incremental MIDX format Taylor Blau
@ 2024-08-06 15:36   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
                     ` (17 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:36 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The incremental MIDX chain feature is designed around the idea of
indexing into a concatenated lexicographic ordering of object IDs
present in the MIDX.

When given an object position, the MIDX machinery needs to be able to
locate both (a) which MIDX layer contains the given object, and (b) at
what position *within that MIDX layer* that object appears.

To do this, three new fields are added to the `struct multi_pack_index`:

  - struct multi_pack_index *base_midx;
  - uint32_t num_objects_in_base;
  - uint32_t num_packs_in_base;

`base_midx` points at the previous layer in the chain (or is NULL for
the first layer), while `num_objects_in_base` and `num_packs_in_base`
count the objects and packs, respectively, in all layers before this
one. In turn, the `num_objects_in_base` and `num_packs_in_base` fields
are used to crawl backwards along the `base_midx` pointer to locate the
appropriate position for a given object within the MIDX that contains
it.
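
To make the relationship between the fields concrete, here is a minimal
sketch of how the two counters could be derived when one layer is
stacked on another. `struct layer` is a simplified stand-in for
`struct multi_pack_index`, and `set_base()` is a hypothetical helper;
neither is the code added by this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;	/* previous layer, NULL for the first */
	uint32_t num_objects;		/* objects in this layer only */
	uint32_t num_packs;		/* packs in this layer only */
	uint32_t num_objects_in_base;	/* objects in all earlier layers */
	uint32_t num_packs_in_base;	/* packs in all earlier layers */
};

/* Hypothetical helper: stack `m` on top of `base` and derive the totals. */
static void set_base(struct layer *m, struct layer *base)
{
	m->base_midx = base;
	m->num_objects_in_base = base ?
		base->num_objects + base->num_objects_in_base : 0;
	m->num_packs_in_base = base ?
		base->num_packs + base->num_packs_in_base : 0;
}
```

Because each layer caches the totals of everything beneath it, a lookup
only has to compare a position against one counter per layer while
walking `base_midx`.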

The following commits will update various parts of the MIDX machinery
(as well as their callers from outside of midx.c and midx-write.c) to be
aware and make use of these fields when performing object lookups.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/midx.h b/midx.h
index 8554f2d616..020e49f77c 100644
--- a/midx.h
+++ b/midx.h
@@ -63,6 +63,10 @@ struct multi_pack_index {
 	const unsigned char *chunk_revindex;
 	size_t chunk_revindex_len;
 
+	struct multi_pack_index *base_midx;
+	uint32_t num_objects_in_base;
+	uint32_t num_packs_in_base;
+
 	const char **pack_names;
 	struct packed_git **packs;
 	char object_dir[FLEX_ARRAY];
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
  2024-08-06 15:36   ` [PATCH v3 01/19] Documentation: describe incremental MIDX format Taylor Blau
  2024-08-06 15:36   ` [PATCH v3 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
                     ` (16 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `nth_midxed_pack_int_id()` takes in an object position in
MIDX lexicographic order and returns an identifier of the pack from
which that object was selected in the MIDX.

Currently, the given object position is an index into the lexicographic
order of objects in a single MIDX. Change this position to instead refer
into the concatenated lexicographic order of all MIDXs in a MIDX chain.

This has two visible effects within the implementation of
`nth_midxed_pack_int_id()`:

  - First, the given position is now an index into the concatenated
    lexicographic order of all MIDXs in the order in which they appear
    in the MIDX chain.

  - Second, the pack ID returned from this function is now also in the
    concatenated order of packs among all layers of the MIDX chain in
    the same order that they appear in the MIDX chain.

To do this, introduce the first of two general purpose helpers, this one
being `midx_for_object()`. `midx_for_object()` takes a double pointer to
a `struct multi_pack_index` as well as an object `pos` in terms of the
entire MIDX chain[^1].

The function chases down the '->base_midx' field until it finds the MIDX
layer within the chain that contains the given object. It then:

  - modifies the double pointer to point to the containing MIDX, instead
    of the tip of the chain, and

  - returns the MIDX-local position[^2] at which the given object can be
    found.

Use this function within `nth_midxed_pack_int_id()` so that the `pos` it
expects is now relative to the entire MIDX chain, and that it returns
the appropriate pack position for that object.

[^1]: As a reminder, this means that the object is identified among the
  objects contained in all layers of the incremental MIDX chain, not any
  particular layer. For example, consider a MIDX chain with two individual
  MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
  4 objects appears earlier in the chain, then asking for object 6 would
  return the second object in the MIDX with 3 objects.

[^2]: Building on the previous example, asking for object 6 in a MIDX
  chain with (4, 3) objects, respectively, would set the double pointer
  to point at the MIDX containing three objects, and would return an
  index to the second object within that MIDX.
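
The (4, 3) example from the footnotes can be worked through with a small
sketch. It assumes positions are zero-based, so the sixth object overall
is position 5. `struct layer` and `layer_for_object()` are illustrative
names; the walk mirrors `midx_for_object()` but reports failure with a
return value rather than calling BUG() or die().

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * On success, point `*_m` at the layer containing chain-wide position
 * `pos`, store the layer-local position in `*local`, and return 0.
 * Return -1 (leaving `*_m` untouched) if `pos` is out of range.
 */
static int layer_for_object(struct layer **_m, uint32_t pos, uint32_t *local)
{
	struct layer *m = *_m;

	while (m && pos < m->num_objects_in_base)
		m = m->base_midx;
	if (!m || pos >= m->num_objects + m->num_objects_in_base)
		return -1;

	*_m = m;
	*local = pos - m->num_objects_in_base;
	return 0;
}
```

With a first layer of 4 objects and a second of 3, position 5 resolves
to local position 1 in the second layer, and position 3 resolves to
local position 3 in the first.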

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 3992b05465..39d358da20 100644
--- a/midx.c
+++ b/midx.c
@@ -242,6 +242,23 @@ void close_midx(struct multi_pack_index *m)
 	free(m);
 }
 
+static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
+{
+	struct multi_pack_index *m = *_m;
+	while (m && pos < m->num_objects_in_base)
+		m = m->base_midx;
+
+	if (!m)
+		BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
+
+	if (pos >= m->num_objects + m->num_objects_in_base)
+		die(_("invalid MIDX object position, MIDX is likely corrupt"));
+
+	*_m = m;
+
+	return pos - m->num_objects_in_base;
+}
+
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
 {
 	struct strbuf pack_name = STRBUF_INIT;
@@ -334,8 +351,10 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
-	return get_be32(m->chunk_object_offsets +
-			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
+	pos = midx_for_object(&m, pos);
+
+	return m->num_packs_in_base + get_be32(m->chunk_object_offsets +
+					       (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
 }
 
 int fill_midx_entry(struct repository *r,
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 04/19] midx: teach `prepare_midx_pack()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (2 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
                     ` (15 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `prepare_midx_pack()` is part of the midx.h API and
loads the pack identified by the MIDX-local 'pack_int_id'. This patch
prepares that function to be aware of an incremental MIDX world.

To do this, introduce the second of the two general purpose helpers
mentioned in the previous commit. This commit introduces
`midx_for_pack()`, which is the pack-specific analog of
`midx_for_object()`, and works in the same fashion.

Like `midx_for_object()`, this function chases down the '->base_midx'
field until it finds the MIDX layer within the chain that contains the
given pack.

Use this function within `prepare_midx_pack()` so that the `pack_int_id`
it expects is now relative to the entire MIDX chain, and that it
prepares the given pack in the appropriate MIDX.
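
A minimal sketch of the effect on a caller, assuming a chain whose
layers hold 2 and 3 packs: the chain-wide pack ID is translated first,
and only then used to index the owning layer's array. The `loaded` array
stands in for the real `packs` array, and `prepare_pack()` is a
hypothetical simplification, not the function in this patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKS 8	/* arbitrary bound for this sketch */

/* Simplified stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	uint32_t num_packs;
	uint32_t num_packs_in_base;
	int loaded[MAX_PACKS];	/* stand-in for the `packs` array */
};

/*
 * Mark the pack with chain-wide ID `pack_int_id` as loaded in the layer
 * that owns it. Returns 0 on success, -1 for an out-of-range ID.
 */
static int prepare_pack(struct layer *m, uint32_t pack_int_id)
{
	while (m && pack_int_id < m->num_packs_in_base)
		m = m->base_midx;
	if (!m || pack_int_id >= m->num_packs + m->num_packs_in_base)
		return -1;

	/* Index with the layer-local ID, never the chain-wide one. */
	m->loaded[pack_int_id - m->num_packs_in_base] = 1;
	return 0;
}
```

Here chain-wide ID 1 lands in the first layer at index 1, while ID 4
lands in the second layer at index 2.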

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index 39d358da20..07b3981a7a 100644
--- a/midx.c
+++ b/midx.c
@@ -259,14 +259,32 @@ static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t pos)
 	return pos - m->num_objects_in_base;
 }
 
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
+static uint32_t midx_for_pack(struct multi_pack_index **_m,
+			      uint32_t pack_int_id)
 {
-	struct strbuf pack_name = STRBUF_INIT;
-	struct packed_git *p;
+	struct multi_pack_index *m = *_m;
+	while (m && pack_int_id < m->num_packs_in_base)
+		m = m->base_midx;
 
-	if (pack_int_id >= m->num_packs)
+	if (!m)
+		BUG("NULL multi-pack-index for pack ID: %"PRIu32, pack_int_id);
+
+	if (pack_int_id >= m->num_packs + m->num_packs_in_base)
 		die(_("bad pack-int-id: %u (%u total packs)"),
-		    pack_int_id, m->num_packs);
+		    pack_int_id, m->num_packs + m->num_packs_in_base);
+
+	*_m = m;
+
+	return pack_int_id - m->num_packs_in_base;
+}
+
+int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
+		      uint32_t pack_int_id)
+{
+	struct strbuf pack_name = STRBUF_INIT;
+	struct packed_git *p;
+
+	pack_int_id = midx_for_pack(&m, pack_int_id);
 
 	if (m->packs[pack_int_id])
 		return 0;
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 05/19] midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (3 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
                     ` (14 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `nth_midxed_object_oid()` returns the object ID for a given
object position in the MIDX lexicographic order.

Teach this function to instead operate over the concatenated
lexicographic order defined in an earlier step so that it is able to be
used with incremental MIDXs.

To do this, we need to both (a) adjust the bounds check for the given
'n', and (b) record the MIDX-local position after chasing the
`->base_midx` pointer to find the MIDX which contains that object.
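
The two steps can be sketched as below. Object IDs are modeled as plain
`uint32_t` values for brevity, and `struct layer` and `nth_id()` are
illustrative names rather than the real API. The sketch also assumes the
chain is internally consistent, that is, a layer with a non-zero
`num_objects_in_base` always has a base.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	const uint32_t *ids;	/* sorted stand-ins for object IDs */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/* Return the ID at chain-wide position `n`, or NULL if out of range. */
static const uint32_t *nth_id(struct layer *m, uint32_t n)
{
	/* (a) bounds check against the whole chain, not just the tip */
	if (n >= m->num_objects + m->num_objects_in_base)
		return NULL;

	/* (b) find the containing layer and the layer-local position */
	while (n < m->num_objects_in_base)
		m = m->base_midx;
	return &m->ids[n - m->num_objects_in_base];
}
```

Note that the chain-wide order is a concatenation of per-layer sorted
orders, so consecutive positions are only sorted within a single layer.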

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 07b3981a7a..64a051cca1 100644
--- a/midx.c
+++ b/midx.c
@@ -338,9 +338,11 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
 {
-	if (n >= m->num_objects)
+	if (n >= m->num_objects + m->num_objects_in_base)
 		return NULL;
 
+	n = midx_for_object(&m, n);
+
 	oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n),
 		the_repository->hash_algo);
 	return oid;
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 06/19] midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (4 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
                     ` (13 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as in previous commits, teach the function
`nth_bitmapped_pack()` about incremental MIDXs by translating the given
`pack_int_id` from the concatenated lexical order to a MIDX-local
lexical position.

When accessing the containing MIDX's array of packs, use the local pack
ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
when accessing the data within that chunk.

(Note that both the call to prepare_midx_pack() and the assignment of
bp->pack_int_id care about the global pack_int_id, so avoid shadowing
the given 'pack_int_id' parameter).
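
The split between the two IDs can be illustrated with a sketch that
reads one record from a per-layer table. It assumes each record is two
32-bit big-endian integers (position, then count), which is what the
offsets used in the diff below suggest. The names
`bitmapped_pack_info`, `read_be32()` and `read_record()` are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RECORD_WIDTH (2 * sizeof(uint32_t))	/* assumed record size */

struct bitmapped_pack_info {
	uint32_t bitmap_pos;
	uint32_t bitmap_nr;
	uint32_t pack_int_id;
};

/* Decode a 32-bit big-endian integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * `chunk` is one layer's record table and `num_packs_in_base` counts the
 * packs in earlier layers. The record is read with the layer-local ID,
 * while the chain-wide ID is what gets reported back to the caller.
 */
static void read_record(const unsigned char *chunk, uint32_t num_packs_in_base,
			uint32_t pack_int_id, struct bitmapped_pack_info *bp)
{
	uint32_t local_pack_int_id = pack_int_id - num_packs_in_base;
	const unsigned char *rec = chunk + RECORD_WIDTH * local_pack_int_id;

	bp->bitmap_pos = read_be32(rec);
	bp->bitmap_nr = read_be32(rec + sizeof(uint32_t));
	bp->pack_int_id = pack_int_id;
}
```

Keeping a separate `local_pack_int_id` variable, rather than
overwriting the parameter, is what lets the last assignment still see
the chain-wide value.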

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/midx.c b/midx.c
index 64a051cca1..25350152f1 100644
--- a/midx.c
+++ b/midx.c
@@ -311,17 +311,19 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id)
 {
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+
 	if (!m->chunk_bitmapped_packs)
 		return error(_("MIDX does not contain the BTMP chunk"));
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return error(_("could not load bitmapped pack %"PRIu32), pack_int_id);
 
-	bp->p = m->packs[pack_int_id];
+	bp->p = m->packs[local_pack_int_id];
 	bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs +
-				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id);
+				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id);
 	bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs +
-				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id +
+				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
 				 sizeof(uint32_t));
 	bp->pack_int_id = pack_int_id;
 
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 07/19] midx: introduce `bsearch_one_midx()`
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (5 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
                     ` (12 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).

While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.

One such case is the object abbreviation code, which cares about object
IDs near to where we'd expect to find a match in a MIDX. There, we want
to look at the nearby matches in each layer of the MIDX chain, not just
a single one.

Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.

At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.
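
As a sketch of what a single-layer search looks like from the caller's
side, the function below searches one layer and reports a chain-wide
position. Object IDs are modeled as `uint32_t`, and `struct layer` and
`search_one_layer()` are illustrative names. Like the search it stands
in for, it reports the insertion point when there is no exact match,
which is what the abbreviation code relies on to inspect neighbors.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	const uint32_t *ids;	/* sorted stand-ins for object IDs */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

/*
 * Search one layer only. Returns 1 on an exact match, 0 otherwise. In
 * both cases `*result` is a chain-wide position: that of the match, or
 * of the slot where `id` would be inserted within this layer.
 */
static int search_one_layer(uint32_t id, const struct layer *m,
			    uint32_t *result)
{
	uint32_t lo = 0, hi = m->num_objects;
	int found = 0;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (m->ids[mid] == id) {
			lo = mid;
			found = 1;
			break;
		}
		if (m->ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	*result = lo + m->num_objects_in_base;
	return found;
}
```

A caller that wants neighbors in every layer would loop over
`base_midx` itself and call this once per layer, as the object-name.c
changes below do with `bsearch_one_midx()`.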

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c        | 17 +++++++--
 midx.h        |  5 ++-
 object-name.c | 99 +++++++++++++++++++++++++++------------------------
 3 files changed, 71 insertions(+), 50 deletions(-)

diff --git a/midx.c b/midx.c
index 25350152f1..bd6e3f26c9 100644
--- a/midx.c
+++ b/midx.c
@@ -330,10 +330,21 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 	return 0;
 }
 
-int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result)
+int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
+		     uint32_t *result)
 {
-	return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
-			    the_hash_algo->rawsz, result);
+	int ret = bsearch_hash(oid->hash, m->chunk_oid_fanout,
+			       m->chunk_oid_lookup, the_hash_algo->rawsz,
+			       result);
+	if (result)
+		*result += m->num_objects_in_base;
+	return ret;
+}
+
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
+		 uint32_t *result)
+{
+		return bsearch_one_midx(oid, m, result);
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/midx.h b/midx.h
index 020e49f77c..46c53d69ff 100644
--- a/midx.h
+++ b/midx.h
@@ -90,7 +90,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
-int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
+		     uint32_t *result);
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
+		 uint32_t *result);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/object-name.c b/object-name.c
index 527b853ac4..739d46f9cf 100644
--- a/object-name.c
+++ b/object-name.c
@@ -134,28 +134,32 @@ static int match_hash(unsigned len, const unsigned char *a, const unsigned char
 static void unique_in_midx(struct multi_pack_index *m,
 			   struct disambiguate_state *ds)
 {
-	uint32_t num, i, first = 0;
-	const struct object_id *current = NULL;
-	int len = ds->len > ds->repo->hash_algo->hexsz ?
-		ds->repo->hash_algo->hexsz : ds->len;
-	num = m->num_objects;
+	for (; m; m = m->base_midx) {
+		uint32_t num, i, first = 0;
+		const struct object_id *current = NULL;
+		int len = ds->len > ds->repo->hash_algo->hexsz ?
+			ds->repo->hash_algo->hexsz : ds->len;
 
-	if (!num)
-		return;
+		if (!m->num_objects)
+			continue;
 
-	bsearch_midx(&ds->bin_pfx, m, &first);
+		num = m->num_objects + m->num_objects_in_base;
 
-	/*
-	 * At this point, "first" is the location of the lowest object
-	 * with an object name that could match "bin_pfx".  See if we have
-	 * 0, 1 or more objects that actually match(es).
-	 */
-	for (i = first; i < num && !ds->ambiguous; i++) {
-		struct object_id oid;
-		current = nth_midxed_object_oid(&oid, m, i);
-		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
-			break;
-		update_candidates(ds, current);
+		bsearch_one_midx(&ds->bin_pfx, m, &first);
+
+		/*
+		 * At this point, "first" is the location of the lowest
+		 * object with an object name that could match
+		 * "bin_pfx".  See if we have 0, 1 or more objects that
+		 * actually match(es).
+		 */
+		for (i = first; i < num && !ds->ambiguous; i++) {
+			struct object_id oid;
+			current = nth_midxed_object_oid(&oid, m, i);
+			if (!match_hash(len, ds->bin_pfx.hash, current->hash))
+				break;
+			update_candidates(ds, current);
+		}
 	}
 }
 
@@ -708,37 +712,40 @@ static int repo_extend_abbrev_len(struct repository *r UNUSED,
 static void find_abbrev_len_for_midx(struct multi_pack_index *m,
 				     struct min_abbrev_data *mad)
 {
-	int match = 0;
-	uint32_t num, first = 0;
-	struct object_id oid;
-	const struct object_id *mad_oid;
+	for (; m; m = m->base_midx) {
+		int match = 0;
+		uint32_t num, first = 0;
+		struct object_id oid;
+		const struct object_id *mad_oid;
 
-	if (!m->num_objects)
-		return;
+		if (!m->num_objects)
+			continue;
 
-	num = m->num_objects;
-	mad_oid = mad->oid;
-	match = bsearch_midx(mad_oid, m, &first);
+		num = m->num_objects + m->num_objects_in_base;
+		mad_oid = mad->oid;
+		match = bsearch_one_midx(mad_oid, m, &first);
 
-	/*
-	 * first is now the position in the packfile where we would insert
-	 * mad->hash if it does not exist (or the position of mad->hash if
-	 * it does exist). Hence, we consider a maximum of two objects
-	 * nearby for the abbreviation length.
-	 */
-	mad->init_len = 0;
-	if (!match) {
-		if (nth_midxed_object_oid(&oid, m, first))
-			extend_abbrev_len(&oid, mad);
-	} else if (first < num - 1) {
-		if (nth_midxed_object_oid(&oid, m, first + 1))
-			extend_abbrev_len(&oid, mad);
+		/*
+		 * first is now the position in the packfile where we
+		 * would insert mad->hash if it does not exist (or the
+		 * position of mad->hash if it does exist). Hence, we
+		 * consider a maximum of two objects nearby for the
+		 * abbreviation length.
+		 */
+		mad->init_len = 0;
+		if (!match) {
+			if (nth_midxed_object_oid(&oid, m, first))
+				extend_abbrev_len(&oid, mad);
+		} else if (first < num - 1) {
+			if (nth_midxed_object_oid(&oid, m, first + 1))
+				extend_abbrev_len(&oid, mad);
+		}
+		if (first > 0) {
+			if (nth_midxed_object_oid(&oid, m, first - 1))
+				extend_abbrev_len(&oid, mad);
+		}
+		mad->init_len = mad->cur_len;
 	}
-	if (first > 0) {
-		if (nth_midxed_object_oid(&oid, m, first - 1))
-			extend_abbrev_len(&oid, mad);
-	}
-	mad->init_len = mad->cur_len;
 }
 
 static void find_abbrev_len_for_pack(struct packed_git *p,
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 08/19] midx: teach `bsearch_midx()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (6 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
                     ` (11 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the special-case callers of `bsearch_midx()` have been dealt
with, teach `bsearch_midx()` to handle incremental MIDX chains.

The incremental MIDX-aware version of `bsearch_midx()` works by
repeatedly searching for a given OID in each layer along the
`->base_midx` pointer, stopping either when an exact match is found, or
the end of the chain is reached.
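
The chain walk can be sketched as follows, again with `uint32_t`
stand-ins for object IDs and illustrative names (`struct layer`,
`search_chain()`). For brevity this version uses the C library's
bsearch() per layer and leaves `*result` untouched on a miss, which
differs from the real function, where each per-layer search writes to
`result`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for `struct multi_pack_index`. */
struct layer {
	struct layer *base_midx;
	const uint32_t *ids;	/* sorted stand-ins for object IDs */
	uint32_t num_objects;
	uint32_t num_objects_in_base;
};

static int cmp_id(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Walk from the tip towards the first layer and stop at the first exact
 * match, reporting its chain-wide position. Returns 1 if found, else 0.
 */
static int search_chain(uint32_t id, const struct layer *m, uint32_t *result)
{
	for (; m; m = m->base_midx) {
		const uint32_t *hit = bsearch(&id, m->ids, m->num_objects,
					      sizeof(*m->ids), cmp_id);
		if (hit) {
			*result = (uint32_t)(hit - m->ids) +
				  m->num_objects_in_base;
			return 1;
		}
	}
	return 0;
}
```

The cost is one binary search per layer in the worst case, so lookups
get slower as the chain grows longer.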

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index bd6e3f26c9..83857cbd1e 100644
--- a/midx.c
+++ b/midx.c
@@ -344,7 +344,10 @@ int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 		 uint32_t *result)
 {
-		return bsearch_one_midx(oid, m, result);
+	for (; m; m = m->base_midx)
+		if (bsearch_one_midx(oid, m, result))
+			return 1;
+	return 0;
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 09/19] midx: teach `nth_midxed_offset()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (7 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
                     ` (10 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as in previous commits, teach the function
`nth_midxed_offset()` about incremental MIDXs.

The given object `pos` is used to find the containing MIDX, and
translated back into a MIDX-local position by assigning the return value
of `midx_for_object()` to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/midx.c b/midx.c
index 83857cbd1e..346e58dec7 100644
--- a/midx.c
+++ b/midx.c
@@ -369,6 +369,8 @@ off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	const unsigned char *offset_data;
 	uint32_t offset32;
 
+	pos = midx_for_object(&m, pos);
+
 	offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH;
 	offset32 = get_be32(offset_data + sizeof(uint32_t));
 
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 10/19] midx: teach `fill_midx_entry()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (8 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
                     ` (9 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In a similar fashion as previous commits, teach the `fill_midx_entry()`
function to work in an incremental MIDX-aware fashion.

This function, unlike others which accept an index into either the
lexical order of objects or packs, takes in an object_id, and attempts
to fill a caller-provided 'struct pack_entry' with the remaining pieces
of information about that object from the MIDX.

The function uses `bsearch_midx()` which fills out the frame-local 'pos'
variable, recording the given object_id's lexical position within the
MIDX chain, if found (if no matching object ID was found, we'll return
immediately without filling out the `pack_entry` structure).

Once given that position, we jump back through the `->base_midx` pointer
to ensure that our `m` points at the MIDX layer which contains the given
object_id (and not an ancestor or descendant of it in the chain). Note
that we can drop the bounds check "if (pos >= m->num_objects)" because
`midx_for_object()` performs this check for us.

After that point, we only need to make two special considerations within
this function:

  - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
    is a position in the concatenated lexical order of packs, so we must
    ensure that we subtract `m->num_packs_in_base` before accessing the
    MIDX-local `packs` array.

  - Second, we must avoid translating the `pos` back to a MIDX-local
    index, since we use it as an argument to `nth_midxed_offset()` which
    expects a position relative to the concatenated lexical order of
    objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 346e58dec7..5e4e6f9b65 100644
--- a/midx.c
+++ b/midx.c
@@ -407,14 +407,12 @@ int fill_midx_entry(struct repository *r,
 	if (!bsearch_midx(oid, m, &pos))
 		return 0;
 
-	if (pos >= m->num_objects)
-		return 0;
-
+	midx_for_object(&m, pos);
 	pack_int_id = nth_midxed_pack_int_id(m, pos);
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
-	p = m->packs[pack_int_id];
+	p = m->packs[pack_int_id - m->num_packs_in_base];
 
 	/*
 	* We are about to tell the caller where they can locate the
-- 
2.46.0.46.g406f326d27.dirty


* [PATCH v3 11/19] midx: remove unused `midx_locate_pack()`
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (9 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
                     ` (8 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Commit 307d75bbe6 (midx: implement `midx_locate_pack()`, 2023-12-14)
introduced `midx_locate_pack()`, which was described at the time as a
complement to the function `midx_contains_pack()` which allowed
callers to determine where in the MIDX lexical order a pack appeared, as
opposed to whether or not it was simply contained.

307d75bbe6 suggests that future patches would be added which would
introduce callers for this new function, but none ever were, meaning the
function has gone unused since its introduction.

Clean this up by in effect reverting 307d75bbe6, which removes the
unused function and inlines its definition back into
`midx_contains_pack()`.

(Looking back through the list archives when 307d75bbe6 was written,
this was in preparation for this[1] patch from back when we had the
concept of "disjoint" packs while developing multi-pack verbatim reuse.
That concept was abandoned before the series was merged, but I never
dropped what would become 307d75bbe6 from the series, leading to the
state prior to this commit).

[1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 13 ++-----------
 midx.h |  2 --
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/midx.c b/midx.c
index 5e4e6f9b65..50f131e59a 100644
--- a/midx.c
+++ b/midx.c
@@ -466,8 +466,7 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
-int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
-		     uint32_t *pos)
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
@@ -478,11 +477,8 @@ int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
 
 		current = m->pack_names[mid];
 		cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
-		if (!cmp) {
-			if (pos)
-				*pos = mid;
+		if (!cmp)
 			return 1;
-		}
 		if (cmp > 0) {
 			first = mid + 1;
 			continue;
@@ -493,11 +489,6 @@ int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
 	return 0;
 }
 
-int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
-{
-	return midx_locate_pack(m, idx_or_pack_name, NULL);
-}
-
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
diff --git a/midx.h b/midx.h
index 46c53d69ff..86af7dfc5e 100644
--- a/midx.h
+++ b/midx.h
@@ -102,8 +102,6 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
 int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m);
 int midx_contains_pack(struct multi_pack_index *m,
 		       const char *idx_or_pack_name);
-int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
-		     uint32_t *pos);
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (10 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
                     ` (7 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle
has been cleaned up, teach the former about how to operate in an
incremental MIDX-aware world in a similar fashion as in previous
commits.

Instead of using either of the two `midx_for_object()` or
`midx_for_pack()` helpers, this function is split into two: one that
determines whether a pack is contained in a single MIDX, and another
which calls the former in a loop over all MIDXs.

This approach does not require that we change any of the implementation
in what is now `midx_contains_pack_1()` as it still operates over a
single MIDX.
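
The shape of that split can be sketched in isolation. This is a
simplified model rather than the code in this patch: `toy_midx` and the
`toy_` functions are hypothetical, and names are compared with plain
strcmp() instead of `cmp_idx_or_pack_name()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for one MIDX layer. */
struct toy_midx {
	struct toy_midx *base_midx;  /* next-older layer, or NULL */
	const char **pack_names;     /* this layer's packs, sorted by strcmp() */
	uint32_t num_packs;
};

/* Binary search over the pack names of a single layer. */
static int toy_contains_pack_1(const struct toy_midx *m, const char *name)
{
	uint32_t first = 0, last = m->num_packs;

	while (first < last) {
		uint32_t mid = first + (last - first) / 2;
		int cmp = strcmp(name, m->pack_names[mid]);

		if (!cmp)
			return 1;
		if (cmp > 0)
			first = mid + 1;
		else
			last = mid;
	}
	return 0;
}

/* Walk from the newest layer to the oldest, stopping at the first hit. */
static int toy_contains_pack(const struct toy_midx *m, const char *name)
{
	assert(name != NULL);
	for (; m; m = m->base_midx)
		if (toy_contains_pack_1(m, name))
			return 1;
	return 0;
}
```

Because each layer keeps its own sorted list of names, the per-layer
search stays a plain binary search and the chain walk only adds a loop
around it.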

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 50f131e59a..454c27b673 100644
--- a/midx.c
+++ b/midx.c
@@ -466,7 +466,8 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 	return strcmp(idx_or_pack_name, idx_name);
 }
 
-int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
+static int midx_contains_pack_1(struct multi_pack_index *m,
+				const char *idx_or_pack_name)
 {
 	uint32_t first = 0, last = m->num_packs;
 
@@ -489,6 +490,14 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 	return 0;
 }
 
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
+{
+	for (; m; m = m->base_midx)
+		if (midx_contains_pack_1(m, idx_or_pack_name))
+			return 1;
+	return 0;
+}
+
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 13/19] midx: teach `midx_preferred_pack()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (11 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `midx_preferred_pack()` determines the identity of the
preferred pack: the unique pack within the MIDX that is used as a
tie-breaker when choosing which pack should represent an object that
appears in multiple packs within the MIDX.

Historically we have said that the MIDX's preferred pack has the unique
property that all objects from that pack are represented in the MIDX.
But that isn't quite true: a more precise statement would be that all
objects from that pack *which appear in the MIDX* are selected from that
pack.

This helps us extend the concept of preferred packs across a MIDX chain,
where some object(s) in the preferred pack may appear in other packs
in an earlier MIDX layer, in which case those object(s) will not appear
in a subsequent MIDX layer from either the preferred pack or any other
pack.

Extend the concept of preferred packs by using the pack which represents
the object at the first position in MIDX pseudo-pack order belonging to
the current MIDX layer (i.e., at position 'm->num_objects_in_base').
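
A minimal sketch of that rule, assuming a toy layer that stores the pack
ID for each of its own pseudo-pack positions (the `toy_` names and the
`pack_at_pseudo_pos` array are hypothetical; the real code goes through
the reverse index instead). The -1 and -2 sentinels mirror the caching
convention of `preferred_pack_idx`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one MIDX layer. */
struct toy_layer {
	uint32_t num_objects;          /* objects stored in this layer */
	uint32_t num_objects_in_base;  /* objects in all older layers */
	/*
	 * Pack ID of the object at each pseudo-pack position of this
	 * layer, indexed from 0. NULL models a missing reverse index.
	 */
	const uint32_t *pack_at_pseudo_pos;
	int preferred_pack_idx;        /* -1: not computed, -2: unavailable */
};

/* Pack ID of the object at chain-wide pseudo-pack position `pos`. */
static uint32_t toy_pack_at(const struct toy_layer *l, uint32_t pos)
{
	assert(pos >= l->num_objects_in_base);
	assert(pos - l->num_objects_in_base < l->num_objects);
	return l->pack_at_pseudo_pos[pos - l->num_objects_in_base];
}

/* Returns 0 and fills *pack_int_id on success, -1 if it cannot be found. */
static int toy_preferred_pack(struct toy_layer *l, uint32_t *pack_int_id)
{
	if (l->preferred_pack_idx == -1) {
		if (!l->pack_at_pseudo_pos || !l->num_objects) {
			l->preferred_pack_idx = -2;
			return -1;
		}
		/* The first position owned by this layer, not position 0. */
		l->preferred_pack_idx =
			(int)toy_pack_at(l, l->num_objects_in_base);
	} else if (l->preferred_pack_idx == -2) {
		return -1;
	}

	*pack_int_id = (uint32_t)l->preferred_pack_idx;
	return 0;
}
```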

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 454c27b673..918349576f 100644
--- a/midx.c
+++ b/midx.c
@@ -501,13 +501,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
+		uint32_t midx_pos;
 		if (load_midx_revindex(m) < 0) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
 
-		m->preferred_pack_idx =
-			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
+
+		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
+
 	} else if (m->preferred_pack_idx == -2)
 		return -1; /* no revindex */
 
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 14/19] midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (12 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 15/19] midx: support reading incremental MIDX chains Taylor Blau
                     ` (5 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The function `midx_fanout_add_midx_fanout()` is used to help construct
the fanout table when generating a MIDX by reusing data from an existing
MIDX.

Prepare this function to work with incremental MIDXs by making a few
changes:

  - Object lookups need to start after the objects belonging to earlier
    MIDX layers (i.e., at position `m->num_objects_in_base` instead of
    position 0).

  - Likewise, the bounds checks need to end at position
    `m->num_objects + m->num_objects_in_base` instead of
    `m->num_objects`.

  - Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier
    MIDX layers when dealing with an incremental MIDX chain by calling
    itself when given a MIDX with a non-NULL `base_midx`.
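
The three adjustments above can be sketched on a toy model. Everything
named `toy_` here is hypothetical, and the fanout table is kept in host
byte order so no ntohl() is needed; the counting helper only stands in
for the real function, which adds entries to the fanout being built:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one MIDX layer. */
struct toy_layer {
	struct toy_layer *base_midx;   /* next-older layer, or NULL */
	uint32_t num_objects_in_base;  /* objects in all older layers */
	uint32_t fanout[256];          /* cumulative counts for this layer */
};

struct toy_range {
	uint32_t start, end;           /* chain-wide positions, end exclusive */
};

/* Chain-wide positions of this layer's objects whose first byte is cur_fanout. */
static struct toy_range toy_fanout_range(const struct toy_layer *l,
					 unsigned cur_fanout)
{
	struct toy_range r;

	assert(cur_fanout < 256);
	r.start = l->num_objects_in_base;
	if (cur_fanout)
		r.start += l->fanout[cur_fanout - 1];
	r.end = l->num_objects_in_base + l->fanout[cur_fanout];
	return r;
}

/* Count matching objects across the chain, visiting older layers first. */
static uint32_t toy_count_in_bucket(const struct toy_layer *l,
				    unsigned cur_fanout)
{
	uint32_t n = 0;
	struct toy_range r;

	if (l->base_midx)
		n += toy_count_in_bucket(l->base_midx, cur_fanout);

	r = toy_fanout_range(l, cur_fanout);
	return n + (r.end - r.start);
}
```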

Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
`packs_to_include`, 2024-06-11), we do not use this function with an
existing MIDX (incremental or not) when generating a MIDX with
--stdin-packs, and likewise for incremental MIDXs.

But it is still used when adding the fanout table from an incremental
MIDX when generating a non-incremental MIDX (without --stdin-packs, of
course).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index a77ee73c68..0accbdbb04 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -196,7 +196,7 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 				      struct pack_midx_entry *e,
 				      uint32_t pos)
 {
-	if (pos >= m->num_objects)
+	if (pos >= m->num_objects + m->num_objects_in_base)
 		return 1;
 
 	nth_midxed_object_oid(&e->oid, m, pos);
@@ -247,12 +247,16 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					uint32_t cur_fanout,
 					int preferred_pack)
 {
-	uint32_t start = 0, end;
+	uint32_t start = m->num_objects_in_base, end;
 	uint32_t cur_object;
 
+	if (m->base_midx)
+		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
+					    preferred_pack);
+
 	if (cur_fanout)
-		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
 
 	for (cur_object = start; cur_object < end; cur_object++) {
 		if ((preferred_pack > -1) &&
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 15/19] midx: support reading incremental MIDX chains
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (13 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:37   ` [PATCH v3 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
                     ` (4 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the MIDX machinery's internals have been taught to understand
incremental MIDXs over the previous handful of commits, the MIDX
machinery itself can begin reading incremental MIDXs.

(Note that while the on-disk format for incremental MIDXs has been
defined, the writing end has not been implemented. This will take place
in the commit after next.)

The core of this change involves following the order specified in the
MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
adding them to the previous layer's `->base_midx` pointer at each step.

In order to implement this, the `load_multi_pack_index()` function is
taught to call a new `load_multi_pack_index_chain()` function if loading
a non-incremental MIDX failed via `load_multi_pack_index_one()`.

When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in
the file one-by-one and dispatches calls to
`load_multi_pack_index_one()` to read each layer of the MIDX chain. When
a layer is successfully read, it is added to the MIDX chain by calling
`add_midx_to_chain()` which validates the contents of the `BASE` chunk,
performs some bounds checks on the number of combined packs and objects,
and attaches the new MIDX by assigning its `base_midx` pointer to the
existing part of the chain.
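
The bookkeeping done while stacking layers can be modeled on its own.
This is a sketch under simplifying assumptions, not the loader itself:
the `toy_` names are hypothetical, layers are already in memory rather
than read from disk, and only the overflow checks and base counts are
covered:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one MIDX layer. */
struct toy_midx {
	struct toy_midx *base_midx;
	uint32_t num_packs, num_objects;
	uint32_t num_packs_in_base, num_objects_in_base;
};

/* True if a + b does not fit in a uint32_t. */
static int toy_add_overflows(uint32_t a, uint32_t b)
{
	return b > UINT32_MAX - a;
}

/* Attach `midx` on top of `chain`. Returns 1 on success, 0 on overflow. */
static int toy_add_to_chain(struct toy_midx *midx, struct toy_midx *chain)
{
	assert(midx != NULL);
	if (chain) {
		if (toy_add_overflows(chain->num_packs,
				      chain->num_packs_in_base) ||
		    toy_add_overflows(chain->num_objects,
				      chain->num_objects_in_base))
			return 0;
		midx->num_packs_in_base =
			chain->num_packs + chain->num_packs_in_base;
		midx->num_objects_in_base =
			chain->num_objects + chain->num_objects_in_base;
	}
	midx->base_midx = chain;
	return 1;
}

/*
 * Stack layers in the order given, each on top of the previous one.
 * Returns the newest layer that was attached, which is a partial chain
 * if a layer had to be rejected.
 */
static struct toy_midx *toy_build_chain(struct toy_midx *layers, size_t n)
{
	struct toy_midx *chain = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!toy_add_to_chain(&layers[i], chain))
			break;
		chain = &layers[i];
	}
	return chain;
}
```

Each layer only records the totals of everything beneath it, so
attaching a new layer never requires touching the older ones.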

As a supplement to this, introduce a new mode in the test-read-midx
test-tool which allows us to read the information for a specific MIDX in
the chain by specifying its trailing checksum via the command-line
arguments like so:

    $ test-tool read-midx .git/objects [checksum]

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                    | 184 +++++++++++++++++++++++++++++++++++---
 midx.h                    |   7 ++
 packfile.c                |   5 +-
 t/helper/test-read-midx.c |  24 +++--
 4 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index 918349576f..54c06cbb86 100644
--- a/midx.c
+++ b/midx.c
@@ -91,7 +91,9 @@ static int midx_read_object_offsets(const unsigned char *chunk_start,
 
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 
-struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
+static struct multi_pack_index *load_multi_pack_index_one(const char *object_dir,
+							  const char *midx_name,
+							  int local)
 {
 	struct multi_pack_index *m = NULL;
 	int fd;
@@ -99,31 +101,26 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	size_t midx_size;
 	void *midx_map = NULL;
 	uint32_t hash_version;
-	struct strbuf midx_name = STRBUF_INIT;
 	uint32_t i;
 	const char *cur_pack_name;
 	struct chunkfile *cf = NULL;
 
-	get_midx_filename(&midx_name, object_dir);
-
-	fd = git_open(midx_name.buf);
+	fd = git_open(midx_name);
 
 	if (fd < 0)
 		goto cleanup_fail;
 	if (fstat(fd, &st)) {
-		error_errno(_("failed to read %s"), midx_name.buf);
+		error_errno(_("failed to read %s"), midx_name);
 		goto cleanup_fail;
 	}
 
 	midx_size = xsize_t(st.st_size);
 
 	if (midx_size < MIDX_MIN_SIZE) {
-		error(_("multi-pack-index file %s is too small"), midx_name.buf);
+		error(_("multi-pack-index file %s is too small"), midx_name);
 		goto cleanup_fail;
 	}
 
-	strbuf_release(&midx_name);
-
 	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	close(fd);
 
@@ -213,7 +210,6 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 cleanup_fail:
 	free(m);
-	strbuf_release(&midx_name);
 	free_chunkfile(cf);
 	if (midx_map)
 		munmap(midx_map, midx_size);
@@ -222,6 +218,173 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	return NULL;
 }
 
+void get_midx_chain_dirname(struct strbuf *buf, const char *object_dir)
+{
+	strbuf_addf(buf, "%s/pack/multi-pack-index.d", object_dir);
+}
+
+void get_midx_chain_filename(struct strbuf *buf, const char *object_dir)
+{
+	get_midx_chain_dirname(buf, object_dir);
+	strbuf_addstr(buf, "/multi-pack-index-chain");
+}
+
+void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
+				 const unsigned char *hash, const char *ext)
+{
+	get_midx_chain_dirname(buf, object_dir);
+	strbuf_addf(buf, "/multi-pack-index-%s.%s", hash_to_hex(hash), ext);
+}
+
+static int open_multi_pack_index_chain(const char *chain_file,
+				       int *fd, struct stat *st)
+{
+	*fd = git_open(chain_file);
+	if (*fd < 0)
+		return 0;
+	if (fstat(*fd, st)) {
+		close(*fd);
+		return 0;
+	}
+	if (st->st_size < the_hash_algo->hexsz) {
+		close(*fd);
+		if (!st->st_size) {
+			/* treat empty files the same as missing */
+			errno = ENOENT;
+		} else {
+			warning(_("multi-pack-index chain file too small"));
+			errno = EINVAL;
+		}
+		return 0;
+	}
+	return 1;
+}
+
+static int add_midx_to_chain(struct multi_pack_index *midx,
+			     struct multi_pack_index *midx_chain,
+			     struct object_id *oids,
+			     int n)
+{
+	if (midx_chain) {
+		if (unsigned_add_overflows(midx_chain->num_packs,
+					   midx_chain->num_packs_in_base)) {
+			warning(_("pack count in base MIDX too high: %"PRIuMAX),
+				(uintmax_t)midx_chain->num_packs_in_base);
+			return 0;
+		}
+		if (unsigned_add_overflows(midx_chain->num_objects,
+					   midx_chain->num_objects_in_base)) {
+			warning(_("object count in base MIDX too high: %"PRIuMAX),
+				(uintmax_t)midx_chain->num_objects_in_base);
+			return 0;
+		}
+		midx->num_packs_in_base = midx_chain->num_packs +
+			midx_chain->num_packs_in_base;
+		midx->num_objects_in_base = midx_chain->num_objects +
+			midx_chain->num_objects_in_base;
+	}
+
+	midx->base_midx = midx_chain;
+	midx->has_chain = 1;
+
+	return 1;
+}
+
+static struct multi_pack_index *load_midx_chain_fd_st(const char *object_dir,
+						      int local,
+						      int fd, struct stat *st,
+						      int *incomplete_chain)
+{
+	struct multi_pack_index *midx_chain = NULL;
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id *layers = NULL;
+	int valid = 1;
+	uint32_t i, count;
+	FILE *fp = xfdopen(fd, "r");
+
+	count = st->st_size / (the_hash_algo->hexsz + 1);
+	CALLOC_ARRAY(layers, count);
+
+	for (i = 0; i < count; i++) {
+		struct multi_pack_index *m;
+
+		if (strbuf_getline_lf(&buf, fp) == EOF)
+			break;
+
+		if (get_oid_hex(buf.buf, &layers[i])) {
+			warning(_("invalid multi-pack-index chain: line '%s' "
+				  "not a hash"),
+				buf.buf);
+			valid = 0;
+			break;
+		}
+
+		valid = 0;
+
+		strbuf_reset(&buf);
+		get_split_midx_filename_ext(&buf, object_dir, layers[i].hash,
+					    MIDX_EXT_MIDX);
+		m = load_multi_pack_index_one(object_dir, buf.buf, local);
+
+		if (m) {
+			if (add_midx_to_chain(m, midx_chain, layers, i)) {
+				midx_chain = m;
+				valid = 1;
+			} else {
+				close_midx(m);
+			}
+		}
+		if (!valid) {
+			warning(_("unable to find all multi-pack index files"));
+			break;
+		}
+	}
+
+	free(layers);
+	fclose(fp);
+	strbuf_release(&buf);
+
+	*incomplete_chain = !valid;
+	return midx_chain;
+}
+
+static struct multi_pack_index *load_multi_pack_index_chain(const char *object_dir,
+							    int local)
+{
+	struct strbuf chain_file = STRBUF_INIT;
+	struct stat st;
+	int fd;
+	struct multi_pack_index *m = NULL;
+
+	get_midx_chain_filename(&chain_file, object_dir);
+	if (open_multi_pack_index_chain(chain_file.buf, &fd, &st)) {
+		int incomplete;
+		/* ownership of fd is taken over by load function */
+		m = load_midx_chain_fd_st(object_dir, local, fd, &st,
+					  &incomplete);
+	}
+
+	strbuf_release(&chain_file);
+	return m;
+}
+
+struct multi_pack_index *load_multi_pack_index(const char *object_dir,
+					       int local)
+{
+	struct strbuf midx_name = STRBUF_INIT;
+	struct multi_pack_index *m;
+
+	get_midx_filename(&midx_name, object_dir);
+
+	m = load_multi_pack_index_one(object_dir, midx_name.buf, local);
+	if (!m)
+		m = load_multi_pack_index_chain(object_dir, local);
+
+	strbuf_release(&midx_name);
+
+	return m;
+}
+
 void close_midx(struct multi_pack_index *m)
 {
 	uint32_t i;
@@ -230,6 +393,7 @@ void close_midx(struct multi_pack_index *m)
 		return;
 
 	close_midx(m->next);
+	close_midx(m->base_midx);
 
 	munmap((unsigned char *)m->data, m->data_len);
 
diff --git a/midx.h b/midx.h
index 86af7dfc5e..94de16a8c4 100644
--- a/midx.h
+++ b/midx.h
@@ -24,6 +24,7 @@ struct bitmapped_pack;
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
+#define MIDX_CHUNKID_BASE 0x42415345 /* "BASE" */
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
@@ -50,6 +51,7 @@ struct multi_pack_index {
 	int preferred_pack_idx;
 
 	int local;
+	int has_chain;
 
 	const unsigned char *chunk_pack_names;
 	size_t chunk_pack_names_len;
@@ -80,11 +82,16 @@ struct multi_pack_index {
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
+#define MIDX_EXT_MIDX "midx"
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
 void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
 			   const unsigned char *hash, const char *ext);
+void get_midx_chain_dirname(struct strbuf *buf, const char *object_dir);
+void get_midx_chain_filename(struct strbuf *buf, const char *object_dir);
+void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
+				 const unsigned char *hash, const char *ext);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
diff --git a/packfile.c b/packfile.c
index 813584646f..1eb18e3041 100644
--- a/packfile.c
+++ b/packfile.c
@@ -880,7 +880,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!report_garbage)
 		return;
 
-	if (!strcmp(file_name, "multi-pack-index"))
+	if (!strcmp(file_name, "multi-pack-index") ||
+	    !strcmp(file_name, "multi-pack-index.d"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
 	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
@@ -1064,7 +1065,7 @@ struct packed_git *get_all_packs(struct repository *r)
 	prepare_packed_git(r);
 	for (m = r->objects->multi_pack_index; m; m = m->next) {
 		uint32_t i;
-		for (i = 0; i < m->num_packs; i++)
+		for (i = 0; i < m->num_packs + m->num_packs_in_base; i++)
 			prepare_midx_pack(r, m, i);
 	}
 
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 83effc2b5f..69757e94fc 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -9,8 +9,10 @@
 #include "packfile.h"
 #include "setup.h"
 #include "gettext.h"
+#include "pack-revindex.h"
 
-static int read_midx_file(const char *object_dir, int show_objects)
+static int read_midx_file(const char *object_dir, const char *checksum,
+			  int show_objects)
 {
 	uint32_t i;
 	struct multi_pack_index *m;
@@ -21,6 +23,13 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	if (!m)
 		return 1;
 
+	if (checksum) {
+		while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+			m = m->base_midx;
+		if (!m)
+			return 1;
+	}
+
 	printf("header: %08x %d %d %d %d\n",
 	       m->signature,
 	       m->version,
@@ -54,7 +63,8 @@ static int read_midx_file(const char *object_dir, int show_objects)
 		struct pack_entry e;
 
 		for (i = 0; i < m->num_objects; i++) {
-			nth_midxed_object_oid(&oid, m, i);
+			nth_midxed_object_oid(&oid, m,
+					      i + m->num_objects_in_base);
 			fill_midx_entry(the_repository, &oid, &e, m);
 
 			printf("%s %"PRIu64"\t%s\n",
@@ -111,7 +121,7 @@ static int read_midx_bitmapped_packs(const char *object_dir)
 	if (!midx)
 		return 1;
 
-	for (i = 0; i < midx->num_packs; i++) {
+	for (i = 0; i < midx->num_packs + midx->num_packs_in_base; i++) {
 		if (nth_bitmapped_pack(the_repository, midx, &pack, i) < 0)
 			return 1;
 
@@ -127,16 +137,16 @@ static int read_midx_bitmapped_packs(const char *object_dir)
 
 int cmd__read_midx(int argc, const char **argv)
 {
-	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir>");
+	if (!(argc == 2 || argc == 3 || argc == 4))
+		usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] <object-dir> <checksum>");
 
 	if (!strcmp(argv[1], "--show-objects"))
-		return read_midx_file(argv[2], 1);
+		return read_midx_file(argv[2], argv[3], 1);
 	else if (!strcmp(argv[1], "--checksum"))
 		return read_midx_checksum(argv[2]);
 	else if (!strcmp(argv[1], "--preferred-pack"))
 		return read_midx_preferred_pack(argv[2]);
 	else if (!strcmp(argv[1], "--bitmap"))
 		return read_midx_bitmapped_packs(argv[2]);
-	return read_midx_file(argv[1], 0);
+	return read_midx_file(argv[1], argv[2], 0);
 }
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 16/19] midx: implement verification support for incremental MIDXs
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (14 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 15/19] midx: support reading incremental MIDX chains Taylor Blau
@ 2024-08-06 15:37   ` Taylor Blau
  2024-08-06 15:38   ` [PATCH v3 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                     ` (3 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Teach the verification implementation used by `git multi-pack-index
verify` to perform verification for incremental MIDX chains by
independently validating each layer within the chain.
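
One reading of "independently validating each layer", sketched for the
OID-order check only. The `toy_` names and the four-byte OIDs are
hypothetical simplifications; the point is that ordering is checked
within a layer and never across the boundary between two layers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_OID_LEN 4

/* Toy stand-in for one MIDX layer. */
struct toy_layer {
	struct toy_layer *base_midx;               /* next-older layer */
	uint32_t num_objects;
	const unsigned char (*oids)[TOY_OID_LEN];  /* this layer's OIDs only */
};

/* Number of adjacent pairs in one layer that are not strictly ascending. */
static uint32_t toy_verify_layer_order(const struct toy_layer *l)
{
	uint32_t i, bad = 0;

	assert(l != NULL);
	/* `i + 1 < n` rather than `i < n - 1`: safe for an empty layer. */
	for (i = 0; i + 1 < l->num_objects; i++)
		if (memcmp(l->oids[i], l->oids[i + 1], TOY_OID_LEN) >= 0)
			bad++;
	return bad;
}

/* Run the per-layer check on every layer, newest to oldest. */
static uint32_t toy_verify_chain_order(const struct toy_layer *tip)
{
	const struct toy_layer *curr;
	uint32_t bad = 0;

	for (curr = tip; curr; curr = curr->base_midx)
		bad += toy_verify_layer_order(curr);
	return bad;
}
```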

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 47 ++++++++++++++++++++++++++++++-----------------
 midx.h |  2 ++
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/midx.c b/midx.c
index 54c06cbb86..a53d65702d 100644
--- a/midx.c
+++ b/midx.c
@@ -470,6 +470,13 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	return 0;
 }
 
+struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
+				   uint32_t pack_int_id)
+{
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+	return m->packs[local_pack_int_id];
+}
+
 #define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
 
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
@@ -818,6 +825,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	uint32_t i;
 	struct progress *progress = NULL;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	struct multi_pack_index *curr;
 	verify_midx_error = 0;
 
 	if (!m) {
@@ -840,8 +848,8 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_delayed_progress(_("Looking for referenced packfiles"),
-					  m->num_packs);
-	for (i = 0; i < m->num_packs; i++) {
+						  m->num_packs + m->num_packs_in_base);
+	for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
 		if (prepare_midx_pack(r, m, i))
 			midx_report("failed to load pack in position %d", i);
 
@@ -861,17 +869,20 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	if (flags & MIDX_PROGRESS)
 		progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"),
 						 m->num_objects - 1);
-	for (i = 0; i < m->num_objects - 1; i++) {
-		struct object_id oid1, oid2;
 
-		nth_midxed_object_oid(&oid1, m, i);
-		nth_midxed_object_oid(&oid2, m, i + 1);
+	for (curr = m; curr; curr = curr->base_midx) {
+		for (i = 0; i < curr->num_objects - 1; i++) {
+			struct object_id oid1, oid2;
 
-		if (oidcmp(&oid1, &oid2) >= 0)
-			midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
-				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+			nth_midxed_object_oid(&oid1, curr, curr->num_objects_in_base + i);
+			nth_midxed_object_oid(&oid2, curr, curr->num_objects_in_base + i + 1);
 
-		midx_display_sparse_progress(progress, i + 1);
+			if (oidcmp(&oid1, &oid2) >= 0)
+				midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
+					    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+
+			midx_display_sparse_progress(progress, i + 1);
+		}
 	}
 	stop_progress(&progress);
 
@@ -881,8 +892,8 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	 * each of the objects and only require 1 packfile to be open at a
 	 * time.
 	 */
-	ALLOC_ARRAY(pairs, m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
+	ALLOC_ARRAY(pairs, m->num_objects + m->num_objects_in_base);
+	for (i = 0; i < m->num_objects + m->num_objects_in_base; i++) {
 		pairs[i].pos = i;
 		pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i);
 	}
@@ -896,16 +907,18 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
-	for (i = 0; i < m->num_objects; i++) {
+	for (i = 0; i < m->num_objects + m->num_objects_in_base; i++) {
 		struct object_id oid;
 		struct pack_entry e;
 		off_t m_offset, p_offset;
 
 		if (i > 0 && pairs[i-1].pack_int_id != pairs[i].pack_int_id &&
-		    m->packs[pairs[i-1].pack_int_id])
-		{
-			close_pack_fd(m->packs[pairs[i-1].pack_int_id]);
-			close_pack_index(m->packs[pairs[i-1].pack_int_id]);
+		    nth_midxed_pack(m, pairs[i-1].pack_int_id)) {
+			uint32_t pack_int_id = pairs[i-1].pack_int_id;
+			struct packed_git *p = nth_midxed_pack(m, pack_int_id);
+
+			close_pack_fd(p);
+			close_pack_index(p);
 		}
 
 		nth_midxed_object_oid(&oid, m, pairs[i].pos);
diff --git a/midx.h b/midx.h
index 94de16a8c4..9d30935589 100644
--- a/midx.h
+++ b/midx.h
@@ -95,6 +95,8 @@ void get_split_midx_filename_ext(struct strbuf *buf, const char *object_dir,
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
+struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
+				   uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
 int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (15 preceding siblings ...)
  2024-08-06 15:37   ` [PATCH v3 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
@ 2024-08-06 15:38   ` Taylor Blau
  2024-08-06 15:38   ` [PATCH v3 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
                     ` (2 subsequent siblings)
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:38 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Nearly three years ago, commit ff1e653c8e2 (midx: respect
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new
environment variable which caused the test suite to write MIDX bitmaps
after any 'git repack' invocation.

At the time, this was done to help flush out any bugs with MIDX bitmaps
that weren't explicitly covered in the t5326-multi-pack-bitmap.sh
script.

Since then, that flag has served us well and is no longer providing
meaningful coverage, as the script in t5326 has matured substantially
and covers many more interesting cases than it did back when ff1e653c8e2
was originally written.

Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
as it is no longer serving a useful purpose. More importantly, removing
this variable clears the way for us to introduce a new one to help
similarly flush out bugs related to incremental MIDX chains.

Because these incremental MIDX chains are (for now) incompatible with
MIDX bitmaps, we cannot have both.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c                  | 12 ++----------
 ci/run-build-and-tests.sh         |  1 -
 midx.h                            |  2 --
 t/README                          |  4 ----
 t/t0410-partial-clone.sh          |  2 --
 t/t5310-pack-bitmaps.sh           |  4 ----
 t/t5319-multi-pack-index.sh       |  3 +--
 t/t5326-multi-pack-bitmaps.sh     |  3 +--
 t/t5327-multi-pack-bitmaps-rev.sh |  5 ++---
 t/t7700-repack.sh                 | 21 +++++++--------------
 10 files changed, 13 insertions(+), 44 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index f0317fa94a..8499bf0e12 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1217,10 +1217,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!write_midx &&
 		    (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
 			write_bitmaps = 0;
-	} else if (write_bitmaps &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
-		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
-		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
@@ -1518,12 +1514,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (run_update_server_info)
 		update_server_info(0);
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
-		unsigned flags = 0;
-		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
-			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
-		write_midx_file(get_object_directory(), NULL, NULL, flags);
-	}
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
+		write_midx_file(get_object_directory(), NULL, NULL, 0);
 
 cleanup:
 	string_list_clear(&names, 1);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 98dda42045..e6fd68630c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,7 +25,6 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
-	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
diff --git a/midx.h b/midx.h
index 9d30935589..3714cad2cc 100644
--- a/midx.h
+++ b/midx.h
@@ -29,8 +29,6 @@ struct bitmapped_pack;
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
-#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
-	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index 724ee58195..3cee0c05e2 100644
--- a/t/README
+++ b/t/README
@@ -445,10 +445,6 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
-'--bitmap' option on all invocations of 'git multi-pack-index write',
-and ignores pack-objects' '--write-bitmap-index'.
-
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 2c30c86e7b..34bdb3ab1f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -5,8 +5,6 @@ test_description='partial clone'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
 
-# missing promisor objects cause repacks which write bitmaps to fail
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 # When enabled, some commands will write commit-graphs. This causes fsck
 # to fail when delete_object() is called because fsck will attempt to
 # verify the out-of-sync commit graph.
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d7fd71360e..a6de7c5764 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -5,10 +5,6 @@ test_description='exercise basic bitmap functionality'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
-# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
-# their place.
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
-
 # Likewise, allow individual tests to control whether or not they use
 # the boundary-based traversal.
 sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ace5ac3b61..8c54fc0655 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -600,8 +600,7 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writeBitmaps=true repack -ad &&
+	git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 916da389b6..1cb3e3ff08 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -4,10 +4,9 @@ test_description='exercise basic multi-pack bitmap functionality'
 . ./test-lib.sh
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 
-# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# We'll be writing our own MIDX, so avoid getting confused by the
 # automatic ones.
 GIT_TEST_MULTI_PACK_INDEX=0
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
 # This test exercise multi-pack bitmap functionality where the object order is
 # stored and read from a special chunk within the MIDX, so use the default
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index e65e311cd7..23db949c20 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -5,10 +5,9 @@ test_description='exercise basic multi-pack bitmap functionality (.rev files)'
 . ./test-lib.sh
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 
-# We'll be writing our own midx and bitmaps, so avoid getting confused by the
-# automatic ones.
+# We'll be writing our own MIDX, so avoid getting confused by the automatic
+# ones.
 GIT_TEST_MULTI_PACK_INDEX=0
-GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
 # Unlike t5326, this test exercise multi-pack bitmap functionality where the
 # object order is stored in a separate .rev file.
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 127efe99f8..8f34f05087 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -70,14 +70,13 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
+	git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writebitmaps=true repack -Adl &&
+	git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -284,8 +283,7 @@ test_expect_success 'repacking fails when missing .pack actually means missing o
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
 	rm -f bare.git/objects/pack/*.bitmap &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad &&
+	git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -296,8 +294,7 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -308,8 +305,7 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad 2>stderr &&
+	git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -320,8 +316,7 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -ad 2>stderr &&
+	git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -342,9 +337,7 @@ test_expect_success 'repacking with a filter works' '
 '
 
 test_expect_success '--filter fails with --write-bitmap-index' '
-	test_must_fail \
-		env GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
-		git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
+	test_must_fail git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
 '
 
 test_expect_success 'repacking with two filters works' '
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (16 preceding siblings ...)
  2024-08-06 15:38   ` [PATCH v3 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2024-08-06 15:38   ` Taylor Blau
  2024-08-06 15:38   ` [PATCH v3 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
  2024-08-12 14:27   ` [PATCH v3 00/19] midx: incremental multi-pack indexes, part one Jeff King
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:38 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare for sub-directories to appear in $GIT_DIR/objects/pack by
making the copy, remove, and chmod invocations operate recursively.

This prepares us for the new $GIT_DIR/objects/pack/multi-pack-index.d
directory which will be added in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5313-pack-bounds-checks.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5313-pack-bounds-checks.sh b/t/t5313-pack-bounds-checks.sh
index ceaa6700a2..86fc73f9fb 100755
--- a/t/t5313-pack-bounds-checks.sh
+++ b/t/t5313-pack-bounds-checks.sh
@@ -7,11 +7,11 @@ TEST_PASSES_SANITIZE_LEAK=true
 
 clear_base () {
 	test_when_finished 'restore_base' &&
-	rm -f $base
+	rm -r -f $base
 }
 
 restore_base () {
-	cp base-backup/* .git/objects/pack/
+	cp -r base-backup/* .git/objects/pack/
 }
 
 do_pack () {
@@ -64,9 +64,9 @@ test_expect_success 'set up base packfile and variables' '
 	git commit -m base &&
 	git repack -ad &&
 	base=$(echo .git/objects/pack/*) &&
-	chmod +w $base &&
+	chmod -R +w $base &&
 	mkdir base-backup &&
-	cp $base base-backup/ &&
+	cp -r $base base-backup/ &&
 	object=$(git rev-parse HEAD:file)
 '
 
-- 
2.46.0.46.g406f326d27.dirty



* [PATCH v3 19/19] midx: implement support for writing incremental MIDX chains
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (17 preceding siblings ...)
  2024-08-06 15:38   ` [PATCH v3 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
@ 2024-08-06 15:38   ` Taylor Blau
  2024-08-12 14:27   ` [PATCH v3 00/19] midx: incremental multi-pack indexes, part one Jeff King
  19 siblings, 0 replies; 102+ messages in thread
From: Taylor Blau @ 2024-08-06 15:38 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the rest of the MIDX subsystem and relevant callers have been
taught how to read and process incremental MIDX chains, let's finally
update the implementation in `write_midx_internal()` to be able to
write incremental MIDX chains.

This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:

    $ git multi-pack-index write --incremental

The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:

  - The `compute_sorted_entries()` function is taught to reject objects
    which appear in any existing MIDX layer.

  - Functions like `write_midx_revindex()` are adjusted to write
    pack_order values which are offset by the number of objects in the
    base MIDX layer.

  - The end of `write_midx_internal()` is adjusted to move
    non-incremental MIDX files when necessary (i.e. when creating an
    incremental chain with an existing non-incremental MIDX in the
    repository).
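
As a rough illustration of the first two items, here is a minimal,
self-contained sketch. It is not the code in this patch: the
`toy_layer` struct and every `toy_*` function are hypothetical
stand-ins, 32-bit integers play the role of object IDs, and the object
count of the base is computed by walking the chain rather than read
from `num_objects` and `num_objects_in_base`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MIDX layer (not git's struct). */
struct toy_layer {
	const uint32_t *oids;         /* toy object IDs, sorted ascending */
	size_t nr;                    /* number of objects in this layer */
	const struct toy_layer *base; /* next-older layer, or NULL */
};

/* Return 1 if `oid` appears in `layer` or in any layer beneath it. */
int toy_chain_has_oid(const struct toy_layer *layer, uint32_t oid)
{
	for (; layer; layer = layer->base) {
		size_t lo = 0, hi = layer->nr;

		/* Binary search within this layer's sorted IDs. */
		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;

			if (layer->oids[mid] == oid)
				return 1;
			if (layer->oids[mid] < oid)
				lo = mid + 1;
			else
				hi = mid;
		}
	}
	return 0;
}

/* Count the objects in `layer` and in every layer beneath it. */
uint32_t toy_chain_num_objects(const struct toy_layer *layer)
{
	uint32_t total = 0;

	for (; layer; layer = layer->base)
		total += (uint32_t)layer->nr;
	return total;
}

/*
 * Copy into `out` only those candidates which are absent from the
 * existing chain, preserving their order. Returns the number kept.
 */
size_t toy_filter_new(const struct toy_layer *base,
		      const uint32_t *candidates, size_t nr,
		      uint32_t *out)
{
	size_t i, kept = 0;

	assert(candidates && out);
	for (i = 0; i < nr; i++)
		if (!toy_chain_has_oid(base, candidates[i]))
			out[kept++] = candidates[i];
	return kept;
}

/* Translate a position within the new layer into a chain-wide one. */
uint32_t toy_global_pos(const struct toy_layer *base, uint32_t local_pos)
{
	return local_pos + toy_chain_num_objects(base);
}
```

With no base layer, `toy_global_pos()` returns its input unchanged,
which mirrors the non-incremental case where the offset is zero.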

There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).
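
The list of hashes to keep is built by walking the existing chain from
its newest layer down to its oldest, while storing the hashes
oldest-first with the newly written layer last. A minimal sketch of
that ordering follows; the `toy_midx` struct and both functions are
hypothetical names used only for illustration, not git's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MIDX layer: a hash and a base pointer. */
struct toy_midx {
	const char *hash;            /* hex checksum of this layer */
	const struct toy_midx *base; /* next-older layer, or NULL */
};

/* Count the layers reachable from `tip`, including `tip` itself. */
size_t toy_count_layers(const struct toy_midx *tip)
{
	size_t n = 0;

	for (; tip; tip = tip->base)
		n++;
	return n;
}

/*
 * Fill `keep` with the hashes of every existing layer (oldest first)
 * followed by `new_hash`. `keep` must have room for one more entry
 * than there are existing layers. Returns the number of entries.
 */
size_t toy_fill_keep_hashes(const struct toy_midx *tip,
			    const char *new_hash,
			    const char **keep)
{
	size_t before = toy_count_layers(tip);
	size_t i;

	assert(keep && new_hash);

	/* The new layer always goes last. */
	keep[before] = new_hash;

	/* Walk newest to oldest, filling slots from the back. */
	for (i = 0; i < before; i++) {
		keep[before - i - 1] = tip->hash;
		tip = tip->base;
	}
	return before + 1;
}
```

Any layer file whose hash is missing from the resulting list is
unrelated to the current chain and is a candidate for removal.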

The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:

  1. Most of the "interesting" behavior is already thoroughly covered in
     t5319-multi-pack-index.sh, which handles the core logic of reading
     objects through a MIDX.

     The new tests in t5334-incremental-multi-pack-index.sh are mostly
     focused on creating and destroying incremental MIDXs, as well as
     stitching their results together across layers.

  2. A new GIT_TEST environment variable is added called
     "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
     entire test suite to write incremental MIDXs after repacking when
     combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.

     This exercises the long tail of other interesting behavior that is
     defined implicitly throughout the rest of the CI suite. It is
     likewise added to the linux-TEST-vars job.
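
How the two variables combine after a repack can be sketched as
follows. This is an illustration only: `toy_env_bool()` is a much
simplified stand-in for git's boolean parsing, the flag value is
arbitrary, and the strings are passed in as arguments rather than
being read from the environment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative value only; not taken from midx.h. */
#define TOY_MIDX_WRITE_INCREMENTAL (1u << 0)

/*
 * Simplified boolean parsing: an unset (NULL) value yields `def`,
 * "1", "true", "yes" and "on" are true, anything else is false.
 */
int toy_env_bool(const char *value, int def)
{
	if (!value)
		return def;
	return !strcmp(value, "1") || !strcmp(value, "true") ||
	       !strcmp(value, "yes") || !strcmp(value, "on");
}

/*
 * Decide whether a MIDX should be written after repacking and with
 * which flags. `midx_env` and `incr_env` hold the values of
 * GIT_TEST_MULTI_PACK_INDEX and
 * GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL (NULL when unset).
 * Returns 1 if a MIDX should be written, 0 otherwise.
 */
int toy_repack_midx_flags(const char *midx_env, const char *incr_env,
			  unsigned *flags)
{
	assert(flags);
	*flags = 0;

	/* The incremental variable has no effect on its own. */
	if (!toy_env_bool(midx_env, 0))
		return 0;

	if (toy_env_bool(incr_env, 0))
		*flags |= TOY_MIDX_WRITE_INCREMENTAL;
	return 1;
}
```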

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt  |  11 +-
 builtin/multi-pack-index.c              |   2 +
 builtin/repack.c                        |   8 +-
 ci/run-build-and-tests.sh               |   1 +
 midx-write.c                            | 314 ++++++++++++++++++++----
 midx.c                                  |  62 ++++-
 midx.h                                  |   4 +
 packfile.c                              |  16 +-
 packfile.h                              |   4 +
 t/README                                |   4 +
 t/lib-bitmap.sh                         |   6 +-
 t/lib-midx.sh                           |  28 +++
 t/t5319-multi-pack-index.sh             |  27 +-
 t/t5326-multi-pack-bitmaps.sh           |   1 +
 t/t5327-multi-pack-bitmaps-rev.sh       |   1 +
 t/t5332-multi-pack-reuse.sh             |   2 +
 t/t5334-incremental-multi-pack-index.sh |  46 ++++
 t/t7700-repack.sh                       |  27 +-
 18 files changed, 460 insertions(+), 104 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 3696506eb3..631d5c7d15 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -64,6 +64,12 @@ The file given at `<path>` is expected to be readable, and can contain
 duplicates. (If a given OID is given more than once, it is marked as
 preferred if at least one instance of it begins with the special `+`
 marker).
+
+	--incremental::
+		Write an incremental MIDX file containing only objects
+		and packs not present in an existing MIDX layer.
+		Migrates non-incremental MIDXs to incremental ones when
+		necessary. Incompatible with `--bitmap`.
 --
 
 verify::
@@ -74,6 +80,8 @@ expire::
 	have no objects referenced by the MIDX (with the exception of
 	`.keep` packs and cruft packs). Rewrite the MIDX file afterward
 	to remove all references to these pack-files.
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
@@ -95,7 +103,8 @@ repack::
 +
 If `repack.packKeptObjects` is `false`, then any pack-files with an
 associated `.keep` file will not be selected for the batch to repack.
-
++
+NOTE: this mode is incompatible with incremental MIDX files.
 
 EXAMPLES
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 9cf1a32d65..8805cbbeb3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -129,6 +129,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BIT(0, "progress", &opts.flags,
 			N_("force progress reporting"), MIDX_PROGRESS),
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			 N_("write multi-pack index containing only given indexes")),
 		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
diff --git a/builtin/repack.c b/builtin/repack.c
index 8499bf0e12..7608430a37 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1514,8 +1514,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (run_update_server_info)
 		update_server_info(0);
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0))
+			flags |= MIDX_WRITE_INCREMENTAL;
+		write_midx_file(get_object_directory(), NULL, NULL, flags);
+	}
 
 cleanup:
 	string_list_clear(&names, 1);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index e6fd68630c..2e28d02b20 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -25,6 +25,7 @@ linux-TEST-vars)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_NO_WRITE_REV_INDEX=1
 	export GIT_TEST_CHECKOUT_WORKERS=2
diff --git a/midx-write.c b/midx-write.c
index 0accbdbb04..e74b3d82fa 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -17,6 +17,8 @@
 #include "refs.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "path.h"
+#include "pack-revindex.h"
 
 #define PACK_EXPIRED UINT_MAX
 #define BITMAP_POS_UNKNOWN (~((uint32_t)0))
@@ -25,7 +27,11 @@
 
 extern int midx_checksum_valid(struct multi_pack_index *m);
 extern void clear_midx_files_ext(const char *object_dir, const char *ext,
-				 unsigned char *keep_hash);
+				 const char *keep_hash);
+extern void clear_incremental_midx_files_ext(const char *object_dir,
+					     const char *ext,
+					     const char **keep_hashes,
+					     uint32_t hashes_nr);
 extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 				const char *idx_name);
 
@@ -86,6 +92,7 @@ struct write_midx_context {
 	size_t nr;
 	size_t alloc;
 	struct multi_pack_index *m;
+	struct multi_pack_index *base_midx;
 	struct progress *progress;
 	unsigned pack_paths_checked;
 
@@ -99,6 +106,9 @@ struct write_midx_context {
 
 	int preferred_pack_idx;
 
+	int incremental;
+	uint32_t num_multi_pack_indexes_before;
+
 	struct string_list *to_include;
 };
 
@@ -122,6 +132,9 @@ static int should_include_pack(const struct write_midx_context *ctx,
 	 */
 	if (ctx->m && midx_contains_pack(ctx->m, file_name))
 		return 0;
+	else if (ctx->base_midx && midx_contains_pack(ctx->base_midx,
+						      file_name))
+		return 0;
 	else if (ctx->to_include &&
 		 !string_list_has_string(ctx->to_include, file_name))
 		return 0;
@@ -338,7 +351,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (ctx->m)
+		if (ctx->m && !ctx->incremental)
 			midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
 						    ctx->preferred_pack_idx);
 
@@ -364,6 +377,10 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
 			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
 						&fanout.entries[cur_object].oid))
 				continue;
+			if (ctx->incremental && ctx->base_midx &&
+			    midx_has_oid(ctx->base_midx,
+					 &fanout.entries[cur_object].oid))
+				continue;
 
 			ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
 				   alloc_objects);
@@ -547,10 +564,16 @@ static int write_midx_revindex(struct hashfile *f,
 			       void *data)
 {
 	struct write_midx_context *ctx = data;
-	uint32_t i;
+	uint32_t i, nr_base;
+
+	if (ctx->incremental && ctx->base_midx)
+		nr_base = ctx->base_midx->num_objects +
+			ctx->base_midx->num_objects_in_base;
+	else
+		nr_base = 0;
 
 	for (i = 0; i < ctx->entries_nr; i++)
-		hashwrite_be32(f, ctx->pack_order[i]);
+		hashwrite_be32(f, ctx->pack_order[i] + nr_base);
 
 	return 0;
 }
@@ -579,12 +602,18 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
 static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 {
 	struct midx_pack_order_data *data;
-	uint32_t *pack_order;
+	uint32_t *pack_order, base_objects = 0;
 	uint32_t i;
 
 	trace2_region_enter("midx", "midx_pack_order", the_repository);
 
+	if (ctx->incremental && ctx->base_midx)
+		base_objects = ctx->base_midx->num_objects +
+			ctx->base_midx->num_objects_in_base;
+
+	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	ALLOC_ARRAY(data, ctx->entries_nr);
+
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[i];
 		data[i].nr = i;
@@ -596,12 +625,11 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
 
 	QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
 
-	ALLOC_ARRAY(pack_order, ctx->entries_nr);
 	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *e = &ctx->entries[data[i].nr];
 		struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
 		if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
-			pack->bitmap_pos = i;
+			pack->bitmap_pos = i + base_objects;
 		pack->bitmap_nr++;
 		pack_order[i] = data[i].nr;
 	}
@@ -649,7 +677,8 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 	prepare_packing_data(the_repository, pdata);
 
 	for (i = 0; i < ctx->entries_nr; i++) {
-		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		uint32_t pos = ctx->pack_order[i];
+		struct pack_midx_entry *from = &ctx->entries[pos];
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
 		oe_set_in_pack(pdata, to,
@@ -897,37 +926,130 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 static int fill_packs_from_midx(struct write_midx_context *ctx,
 				const char *preferred_pack_name, uint32_t flags)
 {
-	uint32_t i;
+	struct multi_pack_index *m;
 
-	for (i = 0; i < ctx->m->num_packs; i++) {
-		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+	for (m = ctx->m; m; m = m->base_midx) {
+		uint32_t i;
+
+		for (i = 0; i < m->num_packs; i++) {
+			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
-		if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
 			/*
 			 * If generating a reverse index, need to have
 			 * packed_git's loaded to compare their
 			 * mtimes and object count.
 			 *
-			 *
 			 * If a preferred pack is specified, need to
 			 * have packed_git's loaded to ensure the chosen
 			 * preferred pack has a non-zero object count.
 			 */
-			if (prepare_midx_pack(the_repository, ctx->m, i))
-				return error(_("could not load pack"));
+			if (flags & MIDX_WRITE_REV_INDEX ||
+			    preferred_pack_name) {
+				if (prepare_midx_pack(the_repository, m,
+						      m->num_packs_in_base + i)) {
+					error(_("could not load pack"));
+					return 1;
+				}
 
-			if (open_pack_index(ctx->m->packs[i]))
-				die(_("could not open index for %s"),
-				    ctx->m->packs[i]->pack_name);
+				if (open_pack_index(m->packs[i]))
+					die(_("could not open index for %s"),
+					    m->packs[i]->pack_name);
+			}
+
+			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
+				       m->pack_names[i],
+				       m->num_packs_in_base + i);
 		}
-
-		fill_pack_info(&ctx->info[ctx->nr++], ctx->m->packs[i],
-			       ctx->m->pack_names[i], i);
 	}
-
 	return 0;
 }
 
+static struct {
+	const char *non_split;
+	const char *split;
+} midx_exts[] = {
+	{NULL, MIDX_EXT_MIDX},
+	{MIDX_EXT_BITMAP, MIDX_EXT_BITMAP},
+	{MIDX_EXT_REV, MIDX_EXT_REV},
+};
+
+static int link_midx_to_chain(struct multi_pack_index *m)
+{
+	struct strbuf from = STRBUF_INIT;
+	struct strbuf to = STRBUF_INIT;
+	int ret = 0;
+	size_t i;
+
+	if (!m || m->has_chain) {
+		/*
+		 * Either no MIDX previously existed, or it was already
+		 * part of a MIDX chain. In both cases, we have nothing
+		 * to link, so return early.
+		 */
+		goto done;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
+		const unsigned char *hash = get_midx_checksum(m);
+
+		get_midx_filename_ext(&from, m->object_dir, hash,
+				      midx_exts[i].non_split);
+		get_split_midx_filename_ext(&to, m->object_dir, hash,
+					    midx_exts[i].split);
+
+		if (link(from.buf, to.buf) < 0 && errno != ENOENT) {
+			ret = error_errno(_("unable to link '%s' to '%s'"),
+					  from.buf, to.buf);
+			goto done;
+		}
+
+		strbuf_reset(&from);
+		strbuf_reset(&to);
+	}
+
+done:
+	strbuf_release(&from);
+	strbuf_release(&to);
+	return ret;
+}
+
+static void clear_midx_files(const char *object_dir,
+			     const char **hashes,
+			     uint32_t hashes_nr,
+			     unsigned incremental)
+{
+	/*
+	 * if incremental:
+	 *   - remove all non-incremental MIDX files
+	 *   - remove any incremental MIDX files not in the current one
+	 *
+	 * if non-incremental:
+	 *   - remove all incremental MIDX files
+	 *   - remove any non-incremental MIDX files not matching the current
+	 *     hash
+	 */
+	struct strbuf buf = STRBUF_INIT;
+	const char *exts[] = { MIDX_EXT_BITMAP, MIDX_EXT_REV, MIDX_EXT_MIDX };
+	uint32_t i, j;
+
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		clear_incremental_midx_files_ext(object_dir, exts[i],
+						 hashes, hashes_nr);
+		for (j = 0; j < hashes_nr; j++)
+			clear_midx_files_ext(object_dir, exts[i], hashes[j]);
+	}
+
+	if (incremental)
+		get_midx_filename(&buf, object_dir);
+	else
+		get_midx_chain_filename(&buf, object_dir);
+
+	if (unlink(buf.buf) && errno != ENOENT)
+		die_errno(_("failed to clear multi-pack-index at %s"), buf.buf);
+
+	strbuf_release(&buf);
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
@@ -940,42 +1062,66 @@ static int write_midx_internal(const char *object_dir,
 	uint32_t i, start_pack;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
+	struct tempfile *incr;
 	struct write_midx_context ctx = { 0 };
 	int bitmapped_packs_concat_len = 0;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
+	const char **keep_hashes = NULL;
 	struct chunkfile *cf;
 
 	trace2_region_enter("midx", "write_midx_internal", the_repository);
 
-	get_midx_filename(&midx_name, object_dir);
+	ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+	if (ctx.incremental && (flags & MIDX_WRITE_BITMAP))
+		die(_("cannot write incremental MIDX with bitmap"));
+
+	if (ctx.incremental)
+		strbuf_addf(&midx_name,
+			    "%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
+			    object_dir);
+	else
+		get_midx_filename(&midx_name, object_dir);
 	if (safe_create_leading_directories(midx_name.buf))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name.buf);
 
-	if (!packs_to_include) {
-		/*
-		 * Only reference an existing MIDX when not filtering which
-		 * packs to include, since all packs and objects are copied
-		 * blindly from an existing MIDX if one is present.
-		 */
-		ctx.m = lookup_multi_pack_index(the_repository, object_dir);
-	}
+	if (!packs_to_include || ctx.incremental) {
+		struct multi_pack_index *m = lookup_multi_pack_index(the_repository,
+								     object_dir);
+		if (m && !midx_checksum_valid(m)) {
+			warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+			m = NULL;
+		}
 
-	if (ctx.m && !midx_checksum_valid(ctx.m)) {
-		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
-		ctx.m = NULL;
+		if (m) {
+			/*
+			 * Only reference an existing MIDX when not filtering
+			 * which packs to include, since all packs and objects
+			 * are copied blindly from an existing MIDX if one is
+			 * present.
+			 */
+			if (ctx.incremental)
+				ctx.base_midx = m;
+			else if (!packs_to_include)
+				ctx.m = m;
+		}
 	}
 
 	ctx.nr = 0;
-	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+	ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
 	ctx.info = NULL;
 	ALLOC_ARRAY(ctx.info, ctx.alloc);
 
-	if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
-					  flags) < 0) {
-		result = 1;
+	if (ctx.incremental) {
+		struct multi_pack_index *m = ctx.base_midx;
+		while (m) {
+			ctx.num_multi_pack_indexes_before++;
+			m = m->base_midx;
+		}
+	} else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+						 flags) < 0) {
 		goto cleanup;
 	}
 
@@ -992,7 +1138,8 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
+	if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
+	    !ctx.incremental &&
 	    !(packs_to_include || packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
 		int bitmap_exists;
@@ -1008,12 +1155,14 @@ static int write_midx_internal(const char *object_dir,
 			 * corresponding bitmap (or one wasn't requested).
 			 */
 			if (!want_bitmap)
-				clear_midx_files_ext(object_dir, ".bitmap",
-						     NULL);
+				clear_midx_files_ext(object_dir, "bitmap", NULL);
 			goto cleanup;
 		}
 	}
 
+	if (ctx.incremental && !ctx.nr)
+		goto cleanup; /* nothing to do */
+
 	if (preferred_pack_name) {
 		ctx.preferred_pack_idx = -1;
 
@@ -1159,8 +1308,30 @@ static int write_midx_internal(const char *object_dir,
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
 
-	hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
-	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+	if (ctx.incremental) {
+		struct strbuf lock_name = STRBUF_INIT;
+
+		get_midx_chain_filename(&lock_name, object_dir);
+		hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
+		strbuf_release(&lock_name);
+
+		incr = mks_tempfile_m(midx_name.buf, 0444);
+		if (!incr) {
+			error(_("unable to create temporary MIDX layer"));
+			return -1;
+		}
+
+		if (adjust_shared_perm(get_tempfile_path(incr))) {
+			error(_("unable to adjust shared permissions for '%s'"),
+			      get_tempfile_path(incr));
+			return -1;
+		}
+
+		f = hashfd(get_tempfile_fd(incr), get_tempfile_path(incr));
+	} else {
+		hold_lock_file_for_update(&lk, midx_name.buf, LOCK_DIE_ON_ERROR);
+		f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
+	}
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1253,14 +1424,55 @@ static int write_midx_internal(const char *object_dir,
 	 * have been freed in the previous if block.
 	 */
 
-	if (ctx.m)
+	CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+
+	if (ctx.incremental) {
+		FILE *chainf = fdopen_lock_file(&lk, "w");
+		struct strbuf final_midx_name = STRBUF_INIT;
+		struct multi_pack_index *m = ctx.base_midx;
+
+		if (!chainf) {
+			error_errno(_("unable to open multi-pack-index chain file"));
+			return -1;
+		}
+
+		if (link_midx_to_chain(ctx.base_midx) < 0)
+			return -1;
+
+		get_split_midx_filename_ext(&final_midx_name, object_dir,
+					    midx_hash, MIDX_EXT_MIDX);
+
+		if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
+			error_errno(_("unable to rename new multi-pack-index layer"));
+			return -1;
+		}
+
+		keep_hashes[ctx.num_multi_pack_indexes_before] =
+			xstrdup(hash_to_hex(midx_hash));
+
+		for (i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+			uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+			keep_hashes[j] = xstrdup(hash_to_hex(get_midx_checksum(m)));
+			m = m->base_midx;
+		}
+
+		for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+			fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
+	} else {
+		keep_hashes[ctx.num_multi_pack_indexes_before] =
+			xstrdup(hash_to_hex(midx_hash));
+	}
+
+	if (ctx.m || ctx.base_midx)
 		close_object_store(the_repository->objects);
 
 	if (commit_lock_file(&lk) < 0)
 		die_errno(_("could not write multi-pack-index"));
 
-	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+	clear_midx_files(object_dir, keep_hashes,
+			 ctx.num_multi_pack_indexes_before + 1,
+			 ctx.incremental);
 
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
@@ -1275,6 +1487,11 @@ static int write_midx_internal(const char *object_dir,
 	free(ctx.entries);
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
+	if (keep_hashes) {
+		for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+			free((char *)keep_hashes[i]);
+		free(keep_hashes);
+	}
 	strbuf_release(&midx_name);
 
 	trace2_region_leave("midx", "write_midx_internal", the_repository);
@@ -1311,6 +1528,9 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	if (!m)
 		return 0;
 
+	if (m->base_midx)
+		die(_("cannot expire packs from an incremental multi-pack-index"));
+
 	CALLOC_ARRAY(count, m->num_packs);
 
 	if (flags & MIDX_PROGRESS)
@@ -1485,6 +1705,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 
 	if (!m)
 		return 0;
+	if (m->base_midx)
+		die(_("cannot repack an incremental multi-pack-index"));
 
 	CALLOC_ARRAY(include_pack, m->num_packs);
 
diff --git a/midx.c b/midx.c
index a53d65702d..c867b2b6c2 100644
--- a/midx.c
+++ b/midx.c
@@ -16,7 +16,10 @@
 
 int midx_checksum_valid(struct multi_pack_index *m);
 void clear_midx_files_ext(const char *object_dir, const char *ext,
-			  unsigned char *keep_hash);
+			  const char *keep_hash);
+void clear_incremental_midx_files_ext(const char *object_dir, const char *ext,
+				      char **keep_hashes,
+				      uint32_t hashes_nr);
 int cmp_idx_or_pack_name(const char *idx_or_pack_name,
 			 const char *idx_name);
 
@@ -521,6 +524,11 @@ int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 	return 0;
 }
 
+int midx_has_oid(struct multi_pack_index *m, const struct object_id *oid)
+{
+	return bsearch_midx(oid, m, NULL);
+}
+
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
@@ -723,7 +731,8 @@ int midx_checksum_valid(struct multi_pack_index *m)
 }
 
 struct clear_midx_data {
-	char *keep;
+	char **keep;
+	uint32_t keep_nr;
 	const char *ext;
 };
 
@@ -731,32 +740,63 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
 				const char *file_name, void *_data)
 {
 	struct clear_midx_data *data = _data;
+	uint32_t i;
 
 	if (!(starts_with(file_name, "multi-pack-index-") &&
 	      ends_with(file_name, data->ext)))
 		return;
-	if (data->keep && !strcmp(data->keep, file_name))
-		return;
-
+	for (i = 0; i < data->keep_nr; i++) {
+		if (!strcmp(data->keep[i], file_name))
+			return;
+	}
 	if (unlink(full_path))
 		die_errno(_("failed to remove %s"), full_path);
 }
 
 void clear_midx_files_ext(const char *object_dir, const char *ext,
-			  unsigned char *keep_hash)
+			  const char *keep_hash)
 {
 	struct clear_midx_data data;
 	memset(&data, 0, sizeof(struct clear_midx_data));
 
-	if (keep_hash)
-		data.keep = xstrfmt("multi-pack-index-%s%s",
-				    hash_to_hex(keep_hash), ext);
+	if (keep_hash) {
+		ALLOC_ARRAY(data.keep, 1);
+
+		data.keep[0] = xstrfmt("multi-pack-index-%s.%s", keep_hash, ext);
+		data.keep_nr = 1;
+	}
 	data.ext = ext;
 
 	for_each_file_in_pack_dir(object_dir,
 				  clear_midx_file_ext,
 				  &data);
 
+	if (keep_hash)
+		free(data.keep[0]);
+	free(data.keep);
+}
+
+void clear_incremental_midx_files_ext(const char *object_dir, const char *ext,
+				      char **keep_hashes,
+				      uint32_t hashes_nr)
+{
+	struct clear_midx_data data;
+	uint32_t i;
+
+	memset(&data, 0, sizeof(struct clear_midx_data));
+
+	ALLOC_ARRAY(data.keep, hashes_nr);
+	for (i = 0; i < hashes_nr; i++)
+		data.keep[i] = xstrfmt("multi-pack-index-%s.%s", keep_hashes[i],
+				       ext);
+	data.keep_nr = hashes_nr;
+	data.ext = ext;
+
+	for_each_file_in_pack_subdir(object_dir, "multi-pack-index.d",
+				     clear_midx_file_ext, &data);
+
+	for (i = 0; i < hashes_nr; i++)
+		free(data.keep[i]);
 	free(data.keep);
 }
 
@@ -774,8 +814,8 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx.buf))
 		die(_("failed to clear multi-pack-index at %s"), midx.buf);
 
-	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
-	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
+	clear_midx_files_ext(r->objects->odb->path, MIDX_EXT_BITMAP, NULL);
+	clear_midx_files_ext(r->objects->odb->path, MIDX_EXT_REV, NULL);
 
 	strbuf_release(&midx);
 }
diff --git a/midx.h b/midx.h
index 3714cad2cc..42d4f8d149 100644
--- a/midx.h
+++ b/midx.h
@@ -29,6 +29,8 @@ struct bitmapped_pack;
 #define MIDX_LARGE_OFFSET_NEEDED 0x80000000
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
@@ -77,6 +79,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
+#define MIDX_WRITE_INCREMENTAL (1 << 5)
 
 #define MIDX_EXT_REV "rev"
 #define MIDX_EXT_BITMAP "bitmap"
@@ -101,6 +104,7 @@ int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
 		     uint32_t *result);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m,
 		 uint32_t *result);
+int midx_has_oid(struct multi_pack_index *m, const struct object_id *oid);
 off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
 uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
diff --git a/packfile.c b/packfile.c
index 1eb18e3041..cf12a539ea 100644
--- a/packfile.c
+++ b/packfile.c
@@ -815,9 +815,10 @@ static void report_pack_garbage(struct string_list *list)
 	report_helper(list, seen_bits, first, list->nr);
 }
 
-void for_each_file_in_pack_dir(const char *objdir,
-			       each_file_in_pack_dir_fn fn,
-			       void *data)
+void for_each_file_in_pack_subdir(const char *objdir,
+				  const char *subdir,
+				  each_file_in_pack_dir_fn fn,
+				  void *data)
 {
 	struct strbuf path = STRBUF_INIT;
 	size_t dirnamelen;
@@ -826,6 +827,8 @@ void for_each_file_in_pack_dir(const char *objdir,
 
 	strbuf_addstr(&path, objdir);
 	strbuf_addstr(&path, "/pack");
+	if (subdir)
+		strbuf_addf(&path, "/%s", subdir);
 	dir = opendir(path.buf);
 	if (!dir) {
 		if (errno != ENOENT)
@@ -847,6 +850,13 @@ void for_each_file_in_pack_dir(const char *objdir,
 	strbuf_release(&path);
 }
 
+void for_each_file_in_pack_dir(const char *objdir,
+			       each_file_in_pack_dir_fn fn,
+			       void *data)
+{
+	for_each_file_in_pack_subdir(objdir, NULL, fn, data);
+}
+
 struct prepare_pack_data {
 	struct repository *r;
 	struct string_list *garbage;
diff --git a/packfile.h b/packfile.h
index eb18ec15db..0f78658229 100644
--- a/packfile.h
+++ b/packfile.h
@@ -55,6 +55,10 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
 
 typedef void each_file_in_pack_dir_fn(const char *full_path, size_t full_path_len,
 				      const char *file_name, void *data);
+void for_each_file_in_pack_subdir(const char *objdir,
+				  const char *subdir,
+				  each_file_in_pack_dir_fn fn,
+				  void *data);
 void for_each_file_in_pack_dir(const char *objdir,
 			       each_file_in_pack_dir_fn fn,
 			       void *data);
diff --git a/t/README b/t/README
index 3cee0c05e2..44c02d8129 100644
--- a/t/README
+++ b/t/README
@@ -445,6 +445,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=<boolean>, when true, sets
+the '--incremental' option on all invocations of 'git multi-pack-index
+write'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index f595937094..62aa6744a6 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,6 +1,8 @@
 # Helpers for scripts testing bitmap functionality; see t5310 for
 # example usage.
 
+. "$TEST_DIRECTORY"/lib-midx.sh
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 
@@ -264,10 +266,6 @@ have_delta () {
 	test_cmp expect actual
 }
 
-midx_checksum () {
-	test-tool read-midx --checksum "$1"
-}
-
 # midx_pack_source <obj>
 midx_pack_source () {
 	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
diff --git a/t/lib-midx.sh b/t/lib-midx.sh
index 1261994744..e38c609604 100644
--- a/t/lib-midx.sh
+++ b/t/lib-midx.sh
@@ -6,3 +6,31 @@ test_midx_consistent () {
 	test_cmp expect actual &&
 	git multi-pack-index --object-dir=$1 verify
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "$1"
+}
+
+midx_git_two_modes () {
+	git -c core.multiPackIndex=false $1 >expect &&
+	git -c core.multiPackIndex=true $1 >actual &&
+	if [ "$2" = "sorted" ]
+	then
+		sort <expect >expect.sorted &&
+		mv expect.sorted expect &&
+		sort <actual >actual.sorted &&
+		mv actual.sorted actual
+	fi &&
+	test_cmp expect actual
+}
+
+compare_results_with_midx () {
+	MSG=$1
+	test_expect_success "check normal git operations: $MSG" '
+		midx_git_two_modes "rev-list --objects --all" &&
+		midx_git_two_modes "log --raw" &&
+		midx_git_two_modes "count-objects --verbose" &&
+		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
+		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
+	'
+}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 8c54fc0655..ce1b58c732 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,8 +3,11 @@
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-chunk.sh
+. "$TEST_DIRECTORY"/lib-midx.sh
 
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 objdir=.git/objects
 
 HASH_LEN=$(test_oid rawsz)
@@ -107,30 +110,6 @@ test_expect_success 'write midx with one v1 pack' '
 	midx_read_expect 1 18 4 $objdir
 '
 
-midx_git_two_modes () {
-	git -c core.multiPackIndex=false $1 >expect &&
-	git -c core.multiPackIndex=true $1 >actual &&
-	if [ "$2" = "sorted" ]
-	then
-		sort <expect >expect.sorted &&
-		mv expect.sorted expect &&
-		sort <actual >actual.sorted &&
-		mv actual.sorted actual
-	fi &&
-	test_cmp expect actual
-}
-
-compare_results_with_midx () {
-	MSG=$1
-	test_expect_success "check normal git operations: $MSG" '
-		midx_git_two_modes "rev-list --objects --all" &&
-		midx_git_two_modes "log --raw" &&
-		midx_git_two_modes "count-objects --verbose" &&
-		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
-		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
-	'
-}
-
 test_expect_success 'write midx with one v2 pack' '
 	git pack-objects --index-version=2,0x40 $objdir/pack/test <obj-list &&
 	git multi-pack-index --object-dir=$objdir write &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 1cb3e3ff08..832b92619c 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -7,6 +7,7 @@ test_description='exercise basic multi-pack bitmap functionality'
 # We'll be writing our own MIDX, so avoid getting confused by the
 # automatic ones.
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 
 # This test exercise multi-pack bitmap functionality where the object order is
 # stored and read from a special chunk within the MIDX, so use the default
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index 23db949c20..9cac03a94b 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -8,6 +8,7 @@ test_description='exercise basic multi-pack bitmap functionality (.rev files)'
 # We'll be writing our own MIDX, so avoid getting confused by the automatic
 # ones.
 GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 
 # Unlike t5326, this test exercise multi-pack bitmap functionality where the
 # object order is stored in a separate .rev file.
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index ed823f37bc..941e73d354 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -6,6 +6,8 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
 objdir=.git/objects
 packdir=$objdir/pack
 
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
new file mode 100755
index 0000000000..c3b08acc73
--- /dev/null
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='incremental multi-pack-index'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-midx.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+export GIT_TEST_MULTI_PACK_INDEX
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+test_expect_success 'convert non-incremental MIDX to incremental' '
+	test_commit base &&
+	git repack -ad &&
+	git multi-pack-index write &&
+
+	test_path_is_file $packdir/multi-pack-index &&
+	old_hash="$(midx_checksum $objdir)" &&
+
+	test_commit other &&
+	git repack -d &&
+	git multi-pack-index write --incremental &&
+
+	test_path_is_missing $packdir/multi-pack-index &&
+	test_path_is_file $midx_chain &&
+	test_line_count = 2 $midx_chain &&
+	grep $old_hash $midx_chain
+'
+
+compare_results_with_midx 'incremental MIDX'
+
+test_expect_success 'convert incremental to non-incremental' '
+	test_commit squash &&
+	git repack -d &&
+	git multi-pack-index write &&
+
+	test_path_is_file $packdir/multi-pack-index &&
+	test_dir_is_empty $midxdir
+'
+
+compare_results_with_midx 'non-incremental MIDX conversion'
+
+test_done
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 8f34f05087..be1188e736 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -7,6 +7,9 @@ test_description='git repack works correctly'
 . "${TEST_DIRECTORY}/lib-midx.sh"
 . "${TEST_DIRECTORY}/lib-terminal.sh"
 
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
 commit_and_pack () {
 	test_commit "$@" 1>&2 &&
 	incrpackid=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
@@ -117,7 +120,7 @@ test_expect_success '--local disables writing bitmaps when connected to alternat
 	(
 		cd member &&
 		test_commit "object" &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
+		git repack -Adl --write-bitmap-index 2>err &&
 		cat >expect <<-EOF &&
 		warning: disabling bitmap writing, as some objects are not being packed
 		EOF
@@ -533,11 +536,11 @@ test_expect_success 'setup for --write-midx tests' '
 test_expect_success '--write-midx unchanged' '
 	(
 		cd midx &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack &&
+		git repack &&
 		test_path_is_missing $midx &&
 		test_path_is_missing $midx-*.bitmap &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+		git repack --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -550,7 +553,7 @@ test_expect_success '--write-midx with a new pack' '
 		cd midx &&
 		test_commit loose &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
+		git repack --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -561,7 +564,7 @@ test_expect_success '--write-midx with a new pack' '
 test_expect_success '--write-midx with -b' '
 	(
 		cd midx &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -mb &&
+		git repack -mb &&
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-*.bitmap &&
@@ -574,7 +577,7 @@ test_expect_success '--write-midx with -d' '
 		cd midx &&
 		test_commit repack &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad --write-midx &&
+		git repack -Ad --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-*.bitmap &&
@@ -587,21 +590,21 @@ test_expect_success 'cleans up MIDX when appropriate' '
 		cd midx &&
 
 		test_commit repack-2 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
+		git repack -Adb --write-midx &&
 
 		checksum=$(midx_checksum $objdir) &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$checksum.bitmap &&
 
 		test_commit repack-3 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
+		git repack -Adb --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-$checksum.bitmap &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
 		test_commit repack-4 &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
+		git repack -Adb &&
 
 		find $objdir/pack -type f -name "multi-pack-index*" >files &&
 		test_must_be_empty files
@@ -622,7 +625,6 @@ test_expect_success '--write-midx with preferred bitmap tips' '
 		git log --format="create refs/tags/%s/%s %H" HEAD >refs &&
 		git update-ref --stdin <refs &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 \
 		git repack --write-midx --write-bitmap-index &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
@@ -714,13 +716,13 @@ test_expect_success '--write-midx removes stale pack-based bitmaps' '
 	(
 		cd repo &&
 		test_commit base &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&
+		git repack -Ab &&
 
 		pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
 		test_path_is_file "$pack_bitmap" &&
 
 		test_commit tip &&
-		GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&
+		git repack -bm &&
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
@@ -743,7 +745,6 @@ test_expect_success '--write-midx with --pack-kept-objects' '
 		keep="$objdir/pack/pack-$one.keep" &&
 		touch "$keep" &&
 
-		GIT_TEST_MULTI_PACK_INDEX=0 \
 		git repack --write-midx --write-bitmap-index --geometric=2 -d \
 			--pack-kept-objects &&
 
-- 
2.46.0.46.g406f326d27.dirty
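
The chain-file ordering done in the last write_midx_internal() hunk
above can be hard to follow inside the diff, so here is a minimal
standalone sketch of the same idea. It is not the patch's code:
`struct chain_layer`, `order_chain()` and `write_chain()` are
hypothetical names used only for illustration. It assumes each layer
knows its hex checksum and points at the next-older layer, and that the
chain file lists one checksum per line, oldest layer first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a MIDX layer; not git's struct multi_pack_index. */
struct chain_layer {
	const char *hex_checksum;        /* hex checksum that names the layer */
	const struct chain_layer *base;  /* next-older layer, NULL at the bottom */
};

/*
 * Fill out[0..nr_before] with checksums ordered oldest-first.
 * `tip` is the newest existing layer, `nr_before` is the number of
 * existing layers, and `new_hex` is the checksum of the layer being
 * added, which always lands in the last slot.
 */
void order_chain(const struct chain_layer *tip, size_t nr_before,
		 const char *new_hex, const char **out)
{
	size_t i;

	out[nr_before] = new_hex;
	for (i = 0; i < nr_before; i++) {
		assert(tip); /* the chain must really hold nr_before layers */
		/* Walking tip -> base visits newest first, so fill backwards. */
		out[nr_before - i - 1] = tip->hex_checksum;
		tip = tip->base;
	}
}

/* Write one checksum per line, oldest layer first. */
void write_chain(FILE *fp, const char **hashes, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		fprintf(fp, "%s\n", hashes[i]);
}
```

With two existing layers and one new layer, the caller passes an array
of three slots; the new checksum ends up on the last line, which matches
the test above expecting two lines in the chain file after the first
incremental write on top of a converted non-incremental MIDX.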


* Re: [PATCH v3 00/19] midx: incremental multi-pack indexes, part one
  2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
                     ` (18 preceding siblings ...)
  2024-08-06 15:38   ` [PATCH v3 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
@ 2024-08-12 14:27   ` Jeff King
  19 siblings, 0 replies; 102+ messages in thread
From: Jeff King @ 2024-08-12 14:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Tue, Aug 06, 2024 at 11:36:36AM -0400, Taylor Blau wrote:

> As usual, a range-diff is below, but the main changes since last time
> are as follows:
> 
>   - Documentation improvements to clarify what happens when both an
>     incremental and a non-incremental MIDX are present in a
>     repository.
> 
>   - Commit message typofix on 3/19 to fix an error in one of the
>     technical examples.
> 
>   - Dropped a custom 'local_pack_int_id' in 4/19 to make the remaining
>     diff easier to read.
> 
>   - Minor bugfix in 7/19 where we incorrectly terminated the object
>     abbreviation disambiguation step for incremental MIDXs.
> 
>   - Various additional bits of information in the commit message to
>     explain anything that was subtle.

This all looks good to me.

I read over your responses to my previous review, but as you answered
all of my questions I didn't respond to each individually. :)

I looked over the changes in this iteration and they address all the
small points I brought up. In the bigger picture, I do think there are
probably still issues lurking around the global/local pack and object
position ids. But where you have the series now seems like a good
cut-off point:

  - I suspect pack_pos_to_midx() might need to be adjusted. But it's not
    an issue that can be triggered until we support incremental
    midx bitmaps. And that should definitely go in its own series (and
    preparing is kind of pointless because we don't know what the
    correct interface will be until mid-way through that topic).

  - I won't be surprised if there is some global/local bug that shakes
    out in the long run. But I don't have a clever way of preventing it
    or avoiding the need to deal with the distinction. So I think the
    best path is forward, and to let the shaking commence.

    The most important thing is that the new on-disk files are sane,
    since those are hard to walk back later. A simple bug in the code
    can always be fixed. Likewise, the changes here are unlikely to
    create a bug for anybody using a single unchained midx. So even if
    there is a bug, it will only be triggerable for people using the
    experimental mode.

-Peff
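
For readers unfamiliar with the global/local distinction mentioned
above, the following is a rough sketch of how a chain-wide ("global")
pack id might be mapped to a layer and a position within that layer
("local" id). It is an illustration under assumptions, not the code
from this series: `struct pack_layer` and `resolve_pack()` are
hypothetical, and the field names are only modeled on the kind of
per-layer counters the series describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer record; field names are illustrative only. */
struct pack_layer {
	uint32_t num_packs;              /* packs stored in this layer */
	uint32_t num_packs_in_base;      /* packs in all older layers combined */
	const struct pack_layer *base;   /* next-older layer, or NULL */
};

/*
 * Translate a global pack id into the layer that owns it and the
 * pack's local id within that layer. Returns NULL when the id is out
 * of range for the chain ending at `tip`.
 */
const struct pack_layer *resolve_pack(const struct pack_layer *tip,
				      uint32_t global_id, uint32_t *local_id)
{
	const struct pack_layer *l = tip;

	assert(local_id);
	if (!l || global_id >= l->num_packs_in_base + l->num_packs)
		return NULL;
	/* Ids below num_packs_in_base belong to some older layer. */
	while (l && global_id < l->num_packs_in_base)
		l = l->base;
	if (!l)
		return NULL;
	*local_id = global_id - l->num_packs_in_base;
	return l;
}
```

The class of bug being discussed is the one where a caller hands a
local id to a function expecting a global one (or the reverse). With a
single unchained MIDX the two are identical, which is why such a
mistake would only show up for chains with more than one layer.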



Thread overview: 102+ messages
2024-06-06 23:04 [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
2024-06-06 23:04 ` [PATCH 01/19] Documentation: describe incremental MIDX format Taylor Blau
2024-06-06 23:04 ` [PATCH 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
2024-06-06 23:04 ` [PATCH 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
2024-06-06 23:04 ` [PATCH 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
2024-06-06 23:04 ` [PATCH 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
2024-06-06 23:04 ` [PATCH 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
2024-06-06 23:04 ` [PATCH 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
2024-06-06 23:04 ` [PATCH 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
2024-06-06 23:04 ` [PATCH 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
2024-06-06 23:04 ` [PATCH 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
2024-06-06 23:04 ` [PATCH 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
2024-06-06 23:05 ` [PATCH 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
2024-06-06 23:05 ` [PATCH 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
2024-06-06 23:05 ` [PATCH 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
2024-06-06 23:05 ` [PATCH 15/19] midx: support reading incremental MIDX chains Taylor Blau
2024-06-06 23:05 ` [PATCH 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
2024-06-06 23:05 ` [PATCH 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2024-06-06 23:05 ` [PATCH 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
2024-06-06 23:05 ` [PATCH 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
2024-06-06 23:06 ` [PATCH 00/19] midx: incremental multi-pack indexes, part one Taylor Blau
2024-06-07 18:33   ` Junio C Hamano
2024-06-07 20:29     ` Taylor Blau
2024-06-07 17:55 ` Junio C Hamano
2024-06-07 20:31   ` Taylor Blau
2024-06-25 23:21 ` Junio C Hamano
2024-06-26  0:44   ` Elijah Newren
2024-07-17 21:11 ` [PATCH v2 " Taylor Blau
2024-07-17 21:11   ` [PATCH v2 01/19] Documentation: describe incremental MIDX format Taylor Blau
2024-08-01  9:19     ` Jeff King
2024-08-01 18:52       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
2024-08-01  9:21     ` Jeff King
2024-08-01 18:54       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
2024-08-01  9:30     ` Jeff King
2024-08-01 18:57       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
2024-08-01  9:35     ` Jeff King
2024-08-01 19:00       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
2024-08-01  9:38     ` Jeff King
2024-08-01 19:03       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
2024-08-01  9:39     ` Jeff King
2024-08-01 19:07       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
2024-08-01 10:06     ` Jeff King
2024-08-01 19:54       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
2024-08-01 10:07     ` Jeff King
2024-07-17 21:12   ` [PATCH v2 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
2024-08-01 10:08     ` Jeff King
2024-07-17 21:12   ` [PATCH v2 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
2024-08-01 10:12     ` Jeff King
2024-08-01 20:01       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
2024-08-01 10:14     ` Jeff King
2024-08-01 20:01       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
2024-08-01 10:17     ` Jeff King
2024-07-17 21:12   ` [PATCH v2 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
2024-08-01 10:25     ` Jeff King
2024-08-01 20:05       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
2024-08-01 10:29     ` Jeff King
2024-08-01 20:09       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 15/19] midx: support reading incremental MIDX chains Taylor Blau
2024-08-01 10:40     ` Jeff King
2024-08-01 20:35       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
2024-08-01 10:41     ` Jeff King
2024-07-17 21:12   ` [PATCH v2 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2024-08-01 10:46     ` Jeff King
2024-08-01 20:36       ` Taylor Blau
2024-07-17 21:12   ` [PATCH v2 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
2024-07-17 21:12   ` [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
2024-08-01 11:07     ` Jeff King
2024-08-01 20:39       ` Taylor Blau
2024-08-01 11:14   ` [PATCH v2 00/19] midx: incremental multi-pack indexes, part one Jeff King
2024-08-01 20:41     ` Taylor Blau
2024-08-06 15:36 ` [PATCH v3 " Taylor Blau
2024-08-06 15:36   ` [PATCH v3 01/19] Documentation: describe incremental MIDX format Taylor Blau
2024-08-06 15:36   ` [PATCH v3 02/19] midx: add new fields for incremental MIDX chains Taylor Blau
2024-08-06 15:37   ` [PATCH v3 03/19] midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs Taylor Blau
2024-08-06 15:37   ` [PATCH v3 04/19] midx: teach `prepare_midx_pack()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 05/19] midx: teach `nth_midxed_object_oid()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 06/19] midx: teach `nth_bitmapped_pack()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 07/19] midx: introduce `bsearch_one_midx()` Taylor Blau
2024-08-06 15:37   ` [PATCH v3 08/19] midx: teach `bsearch_midx()` about incremental MIDXs Taylor Blau
2024-08-06 15:37   ` [PATCH v3 09/19] midx: teach `nth_midxed_offset()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 10/19] midx: teach `fill_midx_entry()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 11/19] midx: remove unused `midx_locate_pack()` Taylor Blau
2024-08-06 15:37   ` [PATCH v3 12/19] midx: teach `midx_contains_pack()` about incremental MIDXs Taylor Blau
2024-08-06 15:37   ` [PATCH v3 13/19] midx: teach `midx_preferred_pack()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 14/19] midx: teach `midx_fanout_add_midx_fanout()` " Taylor Blau
2024-08-06 15:37   ` [PATCH v3 15/19] midx: support reading incremental MIDX chains Taylor Blau
2024-08-06 15:37   ` [PATCH v3 16/19] midx: implement verification support for incremental MIDXs Taylor Blau
2024-08-06 15:38   ` [PATCH v3 17/19] t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2024-08-06 15:38   ` [PATCH v3 18/19] t/t5313-pack-bounds-checks.sh: prepare for sub-directories Taylor Blau
2024-08-06 15:38   ` [PATCH v3 19/19] midx: implement support for writing incremental MIDX chains Taylor Blau
2024-08-12 14:27   ` [PATCH v3 00/19] midx: incremental multi-pack indexes, part one Jeff King
