* [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
` (15 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
To make clear that the fucntion `get_midx_checksum()` does not do
anything to modify its argument, mark the MIDX pointer as const.
The following commit will rename this function altogether to make clear
that it returns the raw bytes of the checksum, not a hex-encoded copy of
it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx.c | 2 +-
midx.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/midx.c b/midx.c
index 24e1e721754..6c01f0fa522 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,7 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
int cmp_idx_or_pack_name(const char *idx_or_pack_name,
const char *idx_name);
-const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
{
return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
}
diff --git a/midx.h b/midx.h
index 6e54d73503d..7c7e0b59121 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,7 @@ struct multi_pack_index {
#define MIDX_EXT_BITMAP "bitmap"
#define MIDX_EXT_MIDX "midx"
-const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
void get_midx_filename(struct odb_source *source, struct strbuf *out);
void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
const unsigned char *hash, const char *ext);
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
@ 2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:41 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:01PM -0500, Taylor Blau wrote:
> To make clear that the fucntion `get_midx_checksum()` does not do
Nit: s/fucntion/functino
> anything to modify its argument, mark the MIDX pointer as const.
>
> The following commit will rename this function altogether to make clear
> that it returns the raw bytes of the checksum, not a hex-encoded copy of
> it.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const
2025-12-08 18:26 ` Patrick Steinhardt
@ 2025-12-09 1:41 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 1:41 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:26:45PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:01PM -0500, Taylor Blau wrote:
> > To make clear that the fucntion `get_midx_checksum()` does not do
>
> Nit: s/fucntion/functino
s/functino/function, but otherwise ;-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:25 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
` (14 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
When trying to print out, say, the hexadecimal representation of a
MIDX's hash, our code will do something like:
hash_to_hex_algop(get_midx_checksum(m),
m->source->odb->repo->hash_algo);
, which is both cumbersome and repetitive. In fact, all but a handful of
callers to `get_midx_checksum()` do exactly the above. Reduce the
repetitive nature of calling `get_midx_checksum()` by having it return a
pointer into a static buffer containing the above result.
For the handful of callers that do need to compare the raw bytes and
don't want to deal with an encoded copy (e.g., because they are passing
it to hasheq() or similar), introduce `get_midx_hash()` which returns
the raw bytes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 8 +++-----
midx.c | 8 +++++++-
midx.h | 3 ++-
pack-bitmap.c | 9 ++++-----
pack-revindex.c | 4 ++--
t/helper/test-read-midx.c | 4 ++--
6 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 23e61cb0001..73d24fabbc6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -955,7 +955,7 @@ static int link_midx_to_chain(struct multi_pack_index *m)
}
for (i = 0; i < ARRAY_SIZE(midx_exts); i++) {
- const unsigned char *hash = get_midx_checksum(m);
+ const unsigned char *hash = get_midx_hash(m);
get_midx_filename_ext(m->source, &from,
hash, midx_exts[i].non_split);
@@ -1086,8 +1086,7 @@ static int write_midx_internal(struct odb_source *source,
while (m) {
if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
error(_("could not load reverse index for MIDX %s"),
- hash_to_hex_algop(get_midx_checksum(m),
- m->source->odb->repo->hash_algo));
+ get_midx_checksum(m));
goto cleanup;
}
ctx.num_multi_pack_indexes_before++;
@@ -1445,8 +1444,7 @@ static int write_midx_internal(struct odb_source *source,
for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
- keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
- r->hash_algo));
+ keep_hashes[j] = xstrdup(get_midx_checksum(m));
m = m->base_midx;
}
diff --git a/midx.c b/midx.c
index 6c01f0fa522..f9b11de9ca9 100644
--- a/midx.c
+++ b/midx.c
@@ -24,7 +24,13 @@ void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext
int cmp_idx_or_pack_name(const char *idx_or_pack_name,
const char *idx_name);
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m)
+const char *get_midx_checksum(const struct multi_pack_index *m)
+{
+ return hash_to_hex_algop(get_midx_hash(m),
+ m->source->odb->repo->hash_algo);
+}
+
+const unsigned char *get_midx_hash(const struct multi_pack_index *m)
{
return m->data + m->data_len - m->source->odb->repo->hash_algo->rawsz;
}
diff --git a/midx.h b/midx.h
index 7c7e0b59121..e188ffeb578 100644
--- a/midx.h
+++ b/midx.h
@@ -85,7 +85,8 @@ struct multi_pack_index {
#define MIDX_EXT_BITMAP "bitmap"
#define MIDX_EXT_MIDX "midx"
-const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
+const char *get_midx_checksum(const struct multi_pack_index *m) /* static buffer */;
+const unsigned char *get_midx_hash(const struct multi_pack_index *m);
void get_midx_filename(struct odb_source *source, struct strbuf *out);
void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
const unsigned char *hash, const char *ext);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1d..f466ed2ddcb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -441,11 +441,11 @@ char *midx_bitmap_filename(struct multi_pack_index *midx)
struct strbuf buf = STRBUF_INIT;
if (midx->has_chain)
get_split_midx_filename_ext(midx->source, &buf,
- get_midx_checksum(midx),
+ get_midx_hash(midx),
MIDX_EXT_BITMAP);
else
get_midx_filename_ext(midx->source, &buf,
- get_midx_checksum(midx),
+ get_midx_hash(midx),
MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
@@ -502,7 +502,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
if (load_bitmap_header(bitmap_git) < 0)
goto cleanup;
- if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum,
+ if (!hasheq(get_midx_hash(bitmap_git->midx), bitmap_git->checksum,
bitmap_repo(bitmap_git)->hash_algo)) {
error(_("checksum doesn't match in MIDX and bitmap"));
goto cleanup;
@@ -2820,8 +2820,7 @@ void test_bitmap_walk(struct rev_info *revs)
if (bitmap_is_midx(found))
fprintf_ln(stderr, "Located via MIDX '%s'.",
- hash_to_hex_algop(get_midx_checksum(found->midx),
- revs->repo->hash_algo));
+ get_midx_checksum(found->midx));
else
fprintf_ln(stderr, "Located via pack '%s'.",
hash_to_hex_algop(found->pack->hash,
diff --git a/pack-revindex.c b/pack-revindex.c
index d0791cc4938..016195ceb93 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -390,11 +390,11 @@ int load_midx_revindex(struct multi_pack_index *m)
if (m->has_chain)
get_split_midx_filename_ext(m->source, &revindex_name,
- get_midx_checksum(m),
+ get_midx_hash(m),
MIDX_EXT_REV);
else
get_midx_filename_ext(m->source, &revindex_name,
- get_midx_checksum(m),
+ get_midx_hash(m),
MIDX_EXT_REV);
ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 6de5d1665af..dee603b3cd0 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -34,7 +34,7 @@ static int read_midx_file(const char *object_dir, const char *checksum,
return 1;
if (checksum) {
- while (m && strcmp(hash_to_hex(get_midx_checksum(m)), checksum))
+ while (m && strcmp(get_midx_checksum(m), checksum))
m = m->base_midx;
if (!m)
return 1;
@@ -94,7 +94,7 @@ static int read_midx_checksum(const char *object_dir)
m = setup_midx(object_dir);
if (!m)
return 1;
- printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+ printf("%s\n", get_midx_checksum(m));
close_midx(m);
return 0;
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
@ 2025-12-08 18:25 ` Patrick Steinhardt
2025-12-09 1:42 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:25 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:04PM -0500, Taylor Blau wrote:
> When trying to print out, say, the hexadecimal representation of a
> MIDX's hash, our code will do something like:
>
> hash_to_hex_algop(get_midx_checksum(m),
> m->source->odb->repo->hash_algo);
>
> , which is both cumbersome and repetitive. In fact, all but a handful of
> callers to `get_midx_checksum()` do exactly the above. Reduce the
> repetitive nature of calling `get_midx_checksum()` by having it return a
> pointer into a static buffer containing the above result.
>
> For the handful of callers that do need to compare the raw bytes and
> don't want to deal with an encoded copy (e.g., because they are passing
> it to hasheq() or similar), introduce `get_midx_hash()` which returns
> the raw bytes.
This is a welcome change indeed.
> diff --git a/midx.h b/midx.h
> index 7c7e0b59121..e188ffeb578 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -85,7 +85,8 @@ struct multi_pack_index {
> #define MIDX_EXT_BITMAP "bitmap"
> #define MIDX_EXT_MIDX "midx"
>
> -const unsigned char *get_midx_checksum(const struct multi_pack_index *m);
> +const char *get_midx_checksum(const struct multi_pack_index *m) /* static buffer */;
> +const unsigned char *get_midx_hash(const struct multi_pack_index *m);
> void get_midx_filename(struct odb_source *source, struct strbuf *out);
> void get_midx_filename_ext(struct odb_source *source, struct strbuf *out,
> const unsigned char *hash, const char *ext);
If I didn't have the context of this patch series I would be wondering
what the actual difference between `get_midx_checksum()` and
`get_midx_hash()` is. The way the functions are named seems to rather
indicate that we talk about two different kinds of hashes, rather than
two different ways to encode them.
Would it maybe be preferable to call them `get_midx_checksum()` and
`get_midx_checksum_hex()`? While at it, we could go even further and
rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
conform to our modern best practices.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
2025-12-08 18:25 ` Patrick Steinhardt
@ 2025-12-09 1:42 ` Taylor Blau
2025-12-09 1:50 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 1:42 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:25:05PM +0100, Patrick Steinhardt wrote:
> If I didn't have the context of this patch series I would be wondering
> what the actual difference between `get_midx_checksum()` and
> `get_midx_hash()` is. The way the functions are named seems to rather
> indicate that we talk about two different kinds of hashes, rather than
> two different ways to encode them.
>
> Would it maybe be preferable to call them `get_midx_checksum()` and
> `get_midx_checksum_hex()`? While at it, we could go even further and
> rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
> conform to our modern best practices.
Yeah, I think those are both reasonable suggestions; I'll apply those
locally, thanks!
> Patrick
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
2025-12-09 1:42 ` Taylor Blau
@ 2025-12-09 1:50 ` Taylor Blau
2025-12-09 6:27 ` Patrick Steinhardt
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 1:50 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 08:42:23PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:25:05PM +0100, Patrick Steinhardt wrote:
> > If I didn't have the context of this patch series I would be wondering
> > what the actual difference between `get_midx_checksum()` and
> > `get_midx_hash()` is. The way the functions are named seems to rather
> > indicate that we talk about two different kinds of hashes, rather than
> > two different ways to encode them.
> >
> > Would it maybe be preferable to call them `get_midx_checksum()` and
> > `get_midx_checksum_hex()`? While at it, we could go even further and
> > rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
> > conform to our modern best practices.
>
> Yeah, I think those are both reasonable suggestions; I'll apply those
> locally, thanks!
Hmm. Upon further thinking, I wonder which function should be named
which.
I think the _checksum() variant suggests that it returns the non-hex
encoded form, while the _hex() variant suggests the opposite.
Unfortunately, the latter is both more commonly used and more characters
to type ;-).
I wonder if there are shorter names available. Perhaps
midx_get_checksum() and midx_get_checksum_raw()?
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()`
2025-12-09 1:50 ` Taylor Blau
@ 2025-12-09 6:27 ` Patrick Steinhardt
0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-09 6:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 08:50:08PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 08:42:23PM -0500, Taylor Blau wrote:
> > On Mon, Dec 08, 2025 at 07:25:05PM +0100, Patrick Steinhardt wrote:
> > > If I didn't have the context of this patch series I would be wondering
> > > what the actual difference between `get_midx_checksum()` and
> > > `get_midx_hash()` is. The way the functions are named seems to rather
> > > indicate that we talk about two different kinds of hashes, rather than
> > > two different ways to encode them.
> > >
> > > Would it maybe be preferable to call them `get_midx_checksum()` and
> > > `get_midx_checksum_hex()`? While at it, we could go even further and
> > > rename them to `midx_get_checksum()` and `midx_get_checksum_hex()` to
> > > conform to our modern best practices.
> >
> > Yeah, I think those are both reasonable suggestions; I'll apply those
> > locally, thanks!
>
> Hmm. Upon further thinking, I wonder which function should be named
> which.
>
> I think the _checksum() variant suggests that it returns the non-hex
> encoded form, while the _hex() variant suggests the opposite.
> Unfortunately, the latter is both more commonly used and more characters
> to type ;-).
>
> I wonder if there are shorter names available. Perhaps
> midx_get_checksum() and midx_get_checksum_raw()?
It's only four more characters to type the `_hex()` variant, and it is
in line with the interfaces we've got in "hex.h". So personally I'd
still prefer to go with `_hex()`.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
` (13 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
All multi-pack-index sub-commands (write, verify, repack, and expire)
support a '--progress' command-line option, despite not listing it as
one of the common options in `common_opts`.
As a result each sub-command declares its own `OPT_BIT()` for a
"--progress" command-line option. Centralize this within the
`common_opts` to avoid re-declaring it in each sub-command.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 2 ++
builtin/multi-pack-index.c | 10 ++--------
2 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 2f642697e9e..a4550e28bed 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -18,6 +18,8 @@ Write or verify a multi-pack-index (MIDX) file.
OPTIONS
-------
+The following command-line options are applicable to all sub-commands:
+
--object-dir=<dir>::
Use given directory for the location of Git objects. We check
`<dir>/packs/multi-pack-index` for the current MIDX file, and
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5f364aa816b..ca98d4c3ba3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -84,6 +84,8 @@ static struct option common_opts[] = {
N_("directory"),
N_("object directory containing set of packfile and pack-index pairs"),
parse_object_dir),
+ OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"),
+ MIDX_PROGRESS),
OPT_END(),
};
@@ -138,8 +140,6 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
N_("pack for reuse when computing a multi-pack bitmap")),
OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
- OPT_BIT(0, "progress", &opts.flags,
- N_("force progress reporting"), MIDX_PROGRESS),
OPT_BIT(0, "incremental", &opts.flags,
N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
@@ -200,8 +200,6 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv,
{
struct option *options;
static struct option builtin_multi_pack_index_verify_options[] = {
- OPT_BIT(0, "progress", &opts.flags,
- N_("force progress reporting"), MIDX_PROGRESS),
OPT_END(),
};
struct odb_source *source;
@@ -231,8 +229,6 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv,
{
struct option *options;
static struct option builtin_multi_pack_index_expire_options[] = {
- OPT_BIT(0, "progress", &opts.flags,
- N_("force progress reporting"), MIDX_PROGRESS),
OPT_END(),
};
struct odb_source *source;
@@ -264,8 +260,6 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv,
static struct option builtin_multi_pack_index_repack_options[] = {
OPT_UNSIGNED(0, "batch-size", &opts.batch_size,
N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
- OPT_BIT(0, "progress", &opts.flags,
- N_("force progress reporting"), MIDX_PROGRESS),
OPT_END(),
};
struct odb_source *source;
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (2 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
` (12 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Since fcb2205b774 (midx: implement support for writing incremental MIDX
chains, 2024-08-06), the command-line options '--incremental' and
'--bitmap' were declared to be incompatible with one another when
running 'git multi-pack-index write'.
However, since 27afc272c49 (midx: implement writing incremental MIDX
bitmaps, 2025-03-20), that incompatibility no longer exists, despite the
documentation saying so. Correct this by removing the stale reference to
their incompatibility.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a4550e28bed..a502819fc38 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -75,7 +75,7 @@ marker).
Write an incremental MIDX file containing only objects
and packs not present in an existing MIDX layer.
Migrates non-incremental MIDXs to incremental ones when
- necessary. Incompatible with `--bitmap`.
+ necessary.
--
verify::
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (3 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
` (11 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Since c39fffc1c90 (tests: start asserting that *.txt SYNOPSIS matches -h
output, 2022-10-13), the manual page for 'git multi-pack-index' has a
SYNOPSIS section which differs from 'git multi-pack-index -h'.
Correct this while also documenting additional options accepted by the
'write' sub-command.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 7 ++++++-
builtin/multi-pack-index.c | 5 +++--
t/t0450/adoc-help-mismatches | 1 -
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a502819fc38..164cf1f2291 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -9,7 +9,12 @@ git-multi-pack-index - Write and verify multi-pack-indexes
SYNOPSIS
--------
[verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]bitmap] <sub-command>
+'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
+ [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
+ [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] verify
+'git multi-pack-index' [<options>] expire
+'git multi-pack-index' [<options>] repack [--batch-size=<size>]
DESCRIPTION
-----------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index ca98d4c3ba3..c0c6c1760c0 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -13,8 +13,9 @@
#include "repository.h"
#define BUILTIN_MIDX_WRITE_USAGE \
- N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \
- "[--refs-snapshot=<path>]")
+ N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]\n" \
+ " [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
+ " [--refs-snapshot=<path>]")
#define BUILTIN_MIDX_VERIFY_USAGE \
N_("git multi-pack-index [<options>] verify")
diff --git a/t/t0450/adoc-help-mismatches b/t/t0450/adoc-help-mismatches
index 8ee2d3f7c81..e8d6c13ccd0 100644
--- a/t/t0450/adoc-help-mismatches
+++ b/t/t0450/adoc-help-mismatches
@@ -33,7 +33,6 @@ merge
merge-file
merge-index
merge-one-file
-multi-pack-index
name-rev
notes
push
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (4 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
` (10 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Commit d4bf1d88b90 (multi-pack-index: verify missing pack, 2018-09-13)
adds a new test to the MIDX test script to test how we handle missing
packs.
While the commit itself describes the test as "verify missing pack[s]",
the test itself is actually called "verify packnames out of order",
despite that not being what it tests.
Likely this was a copy-and-paste of the test immediately above it of the
same name. Correct this by renaming the test to match the commit
message.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/t5319-multi-pack-index.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 93f319a4b29..ca020091dda 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -455,7 +455,7 @@ test_expect_success 'verify packnames out of order' '
"pack names out of order"
'
-test_expect_success 'verify packnames out of order' '
+test_expect_success 'verify missing pack' '
corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
"failed to load pack"
'
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (5 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
` (9 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In midx_pack_order(), we compute for each bitampped pack the first bit
to correspond to an object in that pack, along with how many bits were
assigned to object(s) in that pack.
Initially, each bitmap_nr value is set to zero, and each bitmap_pos
value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
that there are no packs who have an unknown bit position but a somehow
non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
midx-write.c).
Once the pack order is fully determined, midx_pack_order() sets the
bitmap_pos field for any bitmapped packs to zero if they are still
listed as BITMAP_POS_UNKNOWN.
However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
This is fine for existing cases, since the only time the
`ctx->pack_perm` array holds a value outside of the addressable range of
`ctx->info` is when there are expired packs, which only occurs via 'git
multi-pack-index expire', which does not support writing MIDX bitmaps.
As a result, the range of ctx->pack_perm covers all values in [0,
`ctx->nr`), so enumerating in this order isn't an issue.
A future change necessary for compaction will complicate this further by
introducing a wrapper around the `ctx->pack_perm` array, which turns the
given `pack_int_id` into one that is relative to the lower end of the
compaction range. As a result, indexing into `ctx->pack_perm` through
this helper, say, with "0" will produce a crash when the lower end of
the compaction range has >0 pack(s) in its base layer, since the
subtraction will wrap around the 32-bit unsigned range, resulting in an
uninitialized read.
But the process is completely unnecessary in the first place: we are
enumerating all values of `ctx->info`, and there is no reason to process
them in a different order than they appear in memory. Index `ctx->info`
directly to reflect that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/midx-write.c b/midx-write.c
index 73d24fabbc6..c30f6a70d37 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
pack_order[i] = data[i].nr;
}
for (i = 0; i < ctx->nr; i++) {
- struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+ struct pack_info *pack = &ctx->info[i];
if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
pack->bitmap_pos = 0;
}
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:59 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:19PM -0500, Taylor Blau wrote:
> In midx_pack_order(), we compute for each bitampped pack the first bit
s/bitampped/bitmapped/
> to correspond to an object in that pack, along with how many bits were
> assigned to object(s) in that pack.
>
> Initially, each bitmap_nr value is set to zero, and each bitmap_pos
I assume `bitmap_nr` is the number of bits, whereas `bitmap_pos` is the
position of the first bit?
> value is set to the sentinel BITMAP_POS_UNKNOWN. This is done to ensure
> that there are no packs who have an unknown bit position but a somehow
> non-zero number of objects (cf. `write_midx_bitmapped_packs()` in
> midx-write.c).
>
> Once the pack order is fully determined, midx_pack_order() sets the
> bitmap_pos field for any bitmapped packs to zero if they are still
> listed as BITMAP_POS_UNKNOWN.
If so, it feels somewhat weird that we'd set the `bitmap_pos` to zero.
But I guess it doesn't matter anyway, as I assume that the `bitmap_nr`
would be zero anyway?
Anyway, reading on.
> However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
Which is the "permutation between pack-int-ids from the previous
multi-pack-index to the new one we are writing"'. So it's basically
tracking which new packs correspond to the old packs.
> This is fine for existing cases, since the only time the
> `ctx->pack_perm` array holds a value outside of the addressable range of
> `ctx->info` is when there are expired packs, which only occurs via 'git
> multi-pack-index expire', which does not support writing MIDX bitmaps.
> As a result, the range of ctx->pack_perm covers all values in [0,
> `ctx->nr`), so enumerating in this order isn't an issue.
>
> A future change necessary for compaction will complicate this further by
> introducing a wrapper around the `ctx->pack_perm` array, which turns the
> given `pack_int_id` into one that is relative to the lower end of the
> compaction range. As a result, indexing into `ctx->pack_perm` through
> this helper, say, with "0" will produce a crash when the lower end of
> the compaction range has >0 pack(s) in its base layer, since the
> subtraction will wrap around the 32-bit unsigned range, resulting in an
> uninitialized read.
>
> But the process is completely unnecessary in the first place: we are
> enumerating all values of `ctx->info`, and there is no reason to process
> them in a different order than they appear in memory. Index `ctx->info`
> directly to reflect that.
Fair. We do initialize the permutations like this:
ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired) {
dropped_packs++;
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
} else {
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
}
}
So obviously, the permutation will only ever be different in case we've
got at least one dropped pack, and that only happens when we expire any
packs. So the explanation matches.
Of course it may be a bit more fragile now if we ever added a caller
of this function that _does_ expire data. But we don't have any, so that
enters the territory of overthinking things.
> diff --git a/midx-write.c b/midx-write.c
> index 73d24fabbc6..c30f6a70d37 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
> pack_order[i] = data[i].nr;
> }
> for (i = 0; i < ctx->nr; i++) {
> - struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
> + struct pack_info *pack = &ctx->info[i];
> if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
> pack->bitmap_pos = 0;
> }
The change looks simple enough.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
2025-12-08 18:26 ` Patrick Steinhardt
@ 2025-12-09 1:59 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 1:59 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:26:27PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:19PM -0500, Taylor Blau wrote:
> > In midx_pack_order(), we compute for each bitampped pack the first bit
>
> s/bitampped/bitmapped/
Ugh. My "bitamp" typo strikes again, thanks for spotting!
> > to correspond to an object in that pack, along with how many bits were
> > assigned to object(s) in that pack.
> >
> > Initially, each bitmap_nr value is set to zero, and each bitmap_pos
>
> I assume `bitmap_nr` is the number of bits, whereas `bitmap_pos` is the
> position of the first bit?
That's right!
> > However, we enumerate the bitmapped packs in order of `ctx->pack_perm`.
>
> Which is the "permutation between pack-int-ids from the previous
> multi-pack-index to the new one we are writing"'. So it's basically
> tracking which new packs correspond to the old packs.
Ditto.
> So obviously, the permutation will only ever be different in case we've
> got at least one dropped pack, and that only happens when we expire any
> packs. So the explanation matches.
>
> Of course it may be a bit more fragile now if we ever added a caller
> of this function that _does_ expire data. But we don't have any, so that
> enters the territory of overthinking things.
I think that with incremental MIDXs we will never have such a caller
without a mechanism to tombstone objects in existing packs, but
definitely worth calling out.
> > diff --git a/midx-write.c b/midx-write.c
> > index 73d24fabbc6..c30f6a70d37 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -637,7 +637,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
> > pack_order[i] = data[i].nr;
> > }
> > for (i = 0; i < ctx->nr; i++) {
> > - struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
> > + struct pack_info *pack = &ctx->info[i];
> > if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
> > pack->bitmap_pos = 0;
> > }
>
> The change looks simple enough.
Yeah, I almost wonder if the commit message was more harmful than not.
The main points that I wanted to get across were:
- Ultimately we want to enumerate a list, and there's no reason to do
that in a permuted order.
- Iterating in that permuted order is fine today because the array of
values in ctx->pack_perm are always addressable indices into
ctx->info.
- That won't be the case in the future when we are combining packs from
MIDX layers that have a non-zero m->num_packs_in_base, so adjusting
the implementation now prevents us from running into that pitfall in
such a future.
Let me know if you think that I should adjust the commit message here.
It's hard to know whether something resembling the above is better or
worse than the current version of the commit message from a reviewer's
perspective, so I'm happy to do whatever you think is cleaner ;-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts`
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (6 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
` (8 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
In the MIDX writing code, there are four functions which perform some
sort of MIDX write operation. They are:
- write_midx_file()
- write_midx_file_only()
- expire_midx_packs()
- midx_repack()
All of these functions are thin wrappers over `write_midx_internal()`,
which implements the bulk of these routines. As a result, the
`write_midx_internal()` function takes six arguments.
Future commits in this series will want to add additional arguments, and
in general this function's signature will be the union of parameters
among *all* possible ways to write a MIDX.
Instead of adding yet more arguments to this function to support MIDX
compaction, introduce a `struct write_midx_opts`, which has the same
struct members as `write_midx_internal()`'s arguments.
Adding additional fields to the `write_midx_opts` struct is preferable
to adding additional arguments to `write_midx_internal()`. This is
because the callers below all zero-initialize the struct, so each time
we add a new piece of information, we do not have to pass the zero value
for it in all other call-sites that do not care about it.
For now, no functional changes are included in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 129 ++++++++++++++++++++++++++++++---------------------
1 file changed, 77 insertions(+), 52 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index c30f6a70d37..b262631ae45 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1014,14 +1014,20 @@ static void clear_midx_files(struct odb_source *source,
strbuf_release(&buf);
}
-static int write_midx_internal(struct odb_source *source,
- struct string_list *packs_to_include,
- struct string_list *packs_to_drop,
- const char *preferred_pack_name,
- const char *refs_snapshot,
- unsigned flags)
+struct write_midx_opts {
+ struct odb_source *source;
+
+ struct string_list *packs_to_include;
+ struct string_list *packs_to_drop;
+
+ const char *preferred_pack_name;
+ const char *refs_snapshot;
+ unsigned flags;
+};
+
+static int write_midx_internal(struct write_midx_opts *opts)
{
- struct repository *r = source->odb->repo;
+ struct repository *r = opts->source->odb->repo;
struct strbuf midx_name = STRBUF_INIT;
unsigned char midx_hash[GIT_MAX_RAWSZ];
uint32_t start_pack;
@@ -1041,22 +1047,22 @@ static int write_midx_internal(struct odb_source *source,
trace2_region_enter("midx", "write_midx_internal", r);
ctx.repo = r;
- ctx.source = source;
+ ctx.source = opts->source;
- ctx.incremental = !!(flags & MIDX_WRITE_INCREMENTAL);
+ ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
if (ctx.incremental)
strbuf_addf(&midx_name,
"%s/pack/multi-pack-index.d/tmp_midx_XXXXXX",
- source->path);
+ opts->source->path);
else
- get_midx_filename(source, &midx_name);
+ get_midx_filename(opts->source, &midx_name);
if (safe_create_leading_directories(r, midx_name.buf))
die_errno(_("unable to create leading directories of %s"),
midx_name.buf);
- if (!packs_to_include || ctx.incremental) {
- struct multi_pack_index *m = get_multi_pack_index(source);
+ if (!opts->packs_to_include || ctx.incremental) {
+ struct multi_pack_index *m = get_multi_pack_index(opts->source);
if (m && !midx_checksum_valid(m)) {
warning(_("ignoring existing multi-pack-index; checksum mismatch"));
m = NULL;
@@ -1071,7 +1077,7 @@ static int write_midx_internal(struct odb_source *source,
*/
if (ctx.incremental)
ctx.base_midx = m;
- else if (!packs_to_include)
+ else if (!opts->packs_to_include)
ctx.m = m;
}
}
@@ -1084,7 +1090,7 @@ static int write_midx_internal(struct odb_source *source,
if (ctx.incremental) {
struct multi_pack_index *m = ctx.base_midx;
while (m) {
- if (flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
+ if (opts->flags & MIDX_WRITE_BITMAP && load_midx_revindex(m)) {
error(_("could not load reverse index for MIDX %s"),
get_midx_checksum(m));
goto cleanup;
@@ -1099,23 +1105,23 @@ static int write_midx_internal(struct odb_source *source,
start_pack = ctx.nr;
ctx.pack_paths_checked = 0;
- if (flags & MIDX_PROGRESS)
+ if (opts->flags & MIDX_PROGRESS)
ctx.progress = start_delayed_progress(r,
_("Adding packfiles to multi-pack-index"), 0);
else
ctx.progress = NULL;
- ctx.to_include = packs_to_include;
+ ctx.to_include = opts->packs_to_include;
- for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx);
+ for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
stop_progress(&ctx.progress);
if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
!ctx.incremental &&
- !(packs_to_include || packs_to_drop)) {
+ !(opts->packs_to_include || opts->packs_to_drop)) {
struct bitmap_index *bitmap_git;
int bitmap_exists;
- int want_bitmap = flags & MIDX_WRITE_BITMAP;
+ int want_bitmap = opts->flags & MIDX_WRITE_BITMAP;
bitmap_git = prepare_midx_bitmap_git(ctx.m);
bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
@@ -1127,7 +1133,8 @@ static int write_midx_internal(struct odb_source *source,
* corresponding bitmap (or one wasn't requested).
*/
if (!want_bitmap)
- clear_midx_files_ext(source, "bitmap", NULL);
+ clear_midx_files_ext(opts->source, "bitmap",
+ NULL);
result = 0;
goto cleanup;
}
@@ -1138,11 +1145,11 @@ static int write_midx_internal(struct odb_source *source,
goto cleanup; /* nothing to do */
}
- if (preferred_pack_name) {
+ if (opts->preferred_pack_name) {
ctx.preferred_pack_idx = NO_PREFERRED_PACK;
for (size_t i = 0; i < ctx.nr; i++) {
- if (!cmp_idx_or_pack_name(preferred_pack_name,
+ if (!cmp_idx_or_pack_name(opts->preferred_pack_name,
ctx.info[i].pack_name)) {
ctx.preferred_pack_idx = i;
break;
@@ -1151,9 +1158,9 @@ static int write_midx_internal(struct odb_source *source,
if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
warning(_("unknown preferred pack: '%s'"),
- preferred_pack_name);
+ opts->preferred_pack_name);
} else if (ctx.nr &&
- (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
+ (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
struct packed_git *oldest = ctx.info[0].p;
ctx.preferred_pack_idx = 0;
@@ -1164,7 +1171,7 @@ static int write_midx_internal(struct odb_source *source,
*/
open_pack_index(oldest);
- if (packs_to_drop && packs_to_drop->nr)
+ if (opts->packs_to_drop && opts->packs_to_drop->nr)
BUG("cannot write a MIDX bitmap during expiration");
/*
@@ -1226,20 +1233,21 @@ static int write_midx_internal(struct odb_source *source,
QSORT(ctx.info, ctx.nr, pack_info_compare);
- if (packs_to_drop && packs_to_drop->nr) {
+ if (opts->packs_to_drop && opts->packs_to_drop->nr) {
size_t drop_index = 0;
int missing_drops = 0;
- for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+ for (size_t i = 0;
+ i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
int cmp = strcmp(ctx.info[i].pack_name,
- packs_to_drop->items[drop_index].string);
+ opts->packs_to_drop->items[drop_index].string);
if (!cmp) {
drop_index++;
ctx.info[i].expired = 1;
} else if (cmp > 0) {
error(_("did not see pack-file %s to drop"),
- packs_to_drop->items[drop_index].string);
+ opts->packs_to_drop->items[drop_index].string);
drop_index++;
missing_drops++;
i--;
@@ -1276,8 +1284,8 @@ static int write_midx_internal(struct odb_source *source,
}
/* Check that the preferred pack wasn't expired (if given). */
- if (preferred_pack_name) {
- struct pack_info *preferred = bsearch(preferred_pack_name,
+ if (opts->preferred_pack_name) {
+ struct pack_info *preferred = bsearch(opts->preferred_pack_name,
ctx.info, ctx.nr,
sizeof(*ctx.info),
idx_or_pack_name_cmp);
@@ -1285,7 +1293,7 @@ static int write_midx_internal(struct odb_source *source,
uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
if (perm == PACK_EXPIRED)
warning(_("preferred pack '%s' is expired"),
- preferred_pack_name);
+ opts->preferred_pack_name);
}
}
@@ -1299,15 +1307,15 @@ static int write_midx_internal(struct odb_source *source,
}
if (!ctx.entries_nr) {
- if (flags & MIDX_WRITE_BITMAP)
+ if (opts->flags & MIDX_WRITE_BITMAP)
warning(_("refusing to write multi-pack .bitmap without any objects"));
- flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
+ opts->flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
}
if (ctx.incremental) {
struct strbuf lock_name = STRBUF_INIT;
- get_midx_chain_filename(source, &lock_name);
+ get_midx_chain_filename(opts->source, &lock_name);
hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
strbuf_release(&lock_name);
@@ -1350,7 +1358,7 @@ static int write_midx_internal(struct odb_source *source,
MIDX_CHUNK_LARGE_OFFSET_WIDTH),
write_midx_large_offsets);
- if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+ if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
ctx.pack_order = midx_pack_order(&ctx);
add_chunk(cf, MIDX_CHUNKID_REVINDEX,
st_mult(ctx.entries_nr, sizeof(uint32_t)),
@@ -1368,11 +1376,11 @@ static int write_midx_internal(struct odb_source *source,
CSUM_FSYNC | CSUM_HASH_IN_STREAM);
free_chunkfile(cf);
- if (flags & MIDX_WRITE_REV_INDEX &&
+ if (opts->flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
write_midx_reverse_index(&ctx, midx_hash);
- if (flags & MIDX_WRITE_BITMAP) {
+ if (opts->flags & MIDX_WRITE_BITMAP) {
struct packing_data pdata;
struct commit **commits;
uint32_t commits_nr;
@@ -1382,7 +1390,7 @@ static int write_midx_internal(struct odb_source *source,
prepare_midx_packing_data(&pdata, &ctx);
- commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
+ commits = find_commits_for_midx_bitmap(&commits_nr, opts->refs_snapshot, &ctx);
/*
* The previous steps translated the information from
@@ -1395,7 +1403,7 @@ static int write_midx_internal(struct odb_source *source,
if (write_midx_bitmap(&ctx,
midx_hash, &pdata, commits, commits_nr,
- flags) < 0) {
+ opts->flags) < 0) {
error(_("could not write multi-pack bitmap"));
clear_packing_data(&pdata);
free(commits);
@@ -1428,7 +1436,7 @@ static int write_midx_internal(struct odb_source *source,
if (link_midx_to_chain(ctx.base_midx) < 0)
goto cleanup;
- get_split_midx_filename_ext(source, &final_midx_name,
+ get_split_midx_filename_ext(opts->source, &final_midx_name,
midx_hash, MIDX_EXT_MIDX);
if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
@@ -1461,7 +1469,7 @@ static int write_midx_internal(struct odb_source *source,
if (commit_lock_file(&lk) < 0)
die_errno(_("could not write multi-pack-index"));
- clear_midx_files(source, keep_hashes,
+ clear_midx_files(opts->source, keep_hashes,
ctx.num_multi_pack_indexes_before + 1,
ctx.incremental);
result = 0;
@@ -1495,9 +1503,14 @@ int write_midx_file(struct odb_source *source,
const char *preferred_pack_name,
const char *refs_snapshot, unsigned flags)
{
- return write_midx_internal(source, NULL, NULL,
- preferred_pack_name, refs_snapshot,
- flags);
+ struct write_midx_opts opts = {
+ .source = source,
+ .preferred_pack_name = preferred_pack_name,
+ .refs_snapshot = refs_snapshot,
+ .flags = flags,
+ };
+
+ return write_midx_internal(&opts);
}
int write_midx_file_only(struct odb_source *source,
@@ -1505,8 +1518,15 @@ int write_midx_file_only(struct odb_source *source,
const char *preferred_pack_name,
const char *refs_snapshot, unsigned flags)
{
- return write_midx_internal(source, packs_to_include, NULL,
- preferred_pack_name, refs_snapshot, flags);
+ struct write_midx_opts opts = {
+ .source = source,
+ .packs_to_include = packs_to_include,
+ .preferred_pack_name = preferred_pack_name,
+ .refs_snapshot = refs_snapshot,
+ .flags = flags,
+ };
+
+ return write_midx_internal(&opts);
}
int expire_midx_packs(struct odb_source *source, unsigned flags)
@@ -1566,8 +1586,11 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
free(count);
if (packs_to_drop.nr)
- result = write_midx_internal(source, NULL,
- &packs_to_drop, NULL, NULL, flags);
+ result = write_midx_internal(&(struct write_midx_opts) {
+ .source = source,
+ .packs_to_drop = &packs_to_drop,
+ .flags = flags & MIDX_PROGRESS,
+ });
string_list_clear(&packs_to_drop, 0);
@@ -1774,8 +1797,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
goto cleanup;
}
- result = write_midx_internal(source, NULL, NULL, NULL, NULL,
- flags);
+ result = write_midx_internal(&(struct write_midx_opts) {
+ .source = source,
+ .flags = flags,
+ });
cleanup:
free(include_pack);
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts`
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
@ 2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:04 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:22PM -0500, Taylor Blau wrote:
> diff --git a/midx-write.c b/midx-write.c
> index c30f6a70d37..b262631ae45 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1014,14 +1014,20 @@ static void clear_midx_files(struct odb_source *source,
> strbuf_release(&buf);
> }
>
> -static int write_midx_internal(struct odb_source *source,
> - struct string_list *packs_to_include,
> - struct string_list *packs_to_drop,
> - const char *preferred_pack_name,
> - const char *refs_snapshot,
> - unsigned flags)
> +struct write_midx_opts {
> + struct odb_source *source;
> +
> + struct string_list *packs_to_include;
> + struct string_list *packs_to_drop;
> +
> + const char *preferred_pack_name;
> + const char *refs_snapshot;
> + unsigned flags;
> +};
> +
> +static int write_midx_internal(struct write_midx_opts *opts)
> {
> - struct repository *r = source->odb->repo;
> + struct repository *r = opts->source->odb->repo;
> struct strbuf midx_name = STRBUF_INIT;
> unsigned char midx_hash[GIT_MAX_RAWSZ];
> uint32_t start_pack;
One might argue that parameters which _must_ be passed could be moved
out of the structure and into the function signature, and as far as I
understand, that would only be the `struct odb_source`. After all, we
are talking about options, and a mandatory field is not really an option
in my book. It also makes the interface at least a tiny bit more self
documenting.
Other than that this patch looks like a nice improvement to me.
> @@ -1566,8 +1586,11 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
> free(count);
>
> if (packs_to_drop.nr)
> - result = write_midx_internal(source, NULL,
> - &packs_to_drop, NULL, NULL, flags);
> + result = write_midx_internal(&(struct write_midx_opts) {
> + .source = source,
> + .packs_to_drop = &packs_to_drop,
> + .flags = flags & MIDX_PROGRESS,
> + });
>
> string_list_clear(&packs_to_drop, 0);
>
I think this syntax is not allowed in our codebase except for a test
balloon just yet. See aso 9b2527caa4 (CodingGuidelines: document test
balloons in flight, 2025-07-23):
since late 2024 with v2.48.0-rc0~20, we have test balloons for
compound literal syntax, e.g., (struct foo){ .member = value };
our hope is that no platforms we care about have trouble using
them, and officially adopt its wider use in mid 2026. Do not add
more use of the syntax until that happens.
> @@ -1774,8 +1797,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
> goto cleanup;
> }
>
> - result = write_midx_internal(source, NULL, NULL, NULL, NULL,
> - flags);
> + result = write_midx_internal(&(struct write_midx_opts) {
> + .source = source,
> + .flags = flags,
> + });
Same here.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts`
2025-12-08 18:26 ` Patrick Steinhardt
@ 2025-12-09 2:04 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 2:04 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:26:36PM +0100, Patrick Steinhardt wrote:
> One might argue that parameters which _must_ be passed could be moved
> out of the structure and into the function signature, and as far as I
> understand, that would only be the `struct odb_source`. After all, we
> are talking about options, and a mandatory field is not really an option
> in my book. It also makes the interface at least a tiny bit more self
> documenting.
>
> Other than that this patch looks like a nice improvement to me.
I think that's a reasonable consideration. My preference would be to
keep everything contained within the 'struct write_midx_opts' since it
makes it really easy to pass everything you might need around to other
sub-routines by just passing a single pointer.
So I'm inclined to keep the new API as-is presented here, but I'm happy
to discuss changing it around if you feel strongly about it.
As a reasonable middle-ground, I added a "/* non-optional */" next to
the "source" member within the new structure.
> > @@ -1566,8 +1586,11 @@ int expire_midx_packs(struct odb_source *source, unsigned flags)
> > free(count);
> >
> > if (packs_to_drop.nr)
> > - result = write_midx_internal(source, NULL,
> > - &packs_to_drop, NULL, NULL, flags);
> > + result = write_midx_internal(&(struct write_midx_opts) {
> > + .source = source,
> > + .packs_to_drop = &packs_to_drop,
> > + .flags = flags & MIDX_PROGRESS,
> > + });
> >
> > string_list_clear(&packs_to_drop, 0);
> >
>
> I think this syntax is not allowed in our codebase except for a test
> balloon just yet. See aso 9b2527caa4 (CodingGuidelines: document test
> balloons in flight, 2025-07-23):
>
> since late 2024 with v2.48.0-rc0~20, we have test balloons for
> compound literal syntax, e.g., (struct foo){ .member = value };
> our hope is that no platforms we care about have trouble using
> them, and officially adopt its wider use in mid 2026. Do not add
> more use of the syntax until that happens.
>
> > @@ -1774,8 +1797,10 @@ int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags)
> > goto cleanup;
> > }
> >
> > - result = write_midx_internal(source, NULL, NULL, NULL, NULL,
> > - flags);
> > + result = write_midx_internal(&(struct write_midx_opts) {
> > + .source = source,
> > + .flags = flags,
> > + });
>
> Same here.
Hah, I even remember checking to make sure there wasn't such a test
balloon and then thinking that I'd need to adjust before sending. That
must have been just before I got up from my desk, and I must have
forgotten about it until now. Fixed up locally, thanks for spotting!
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (7 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
` (7 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").
As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).
To see why, consider the following example:
| packs | preferred pack
--------+-------------+---------------
MIDX #0 | { X, Y, Z } | Y
MIDX #1 | { A, B, C } | B
MIDX #2 | { D, E, F } | D
, where MIDX #2's base MIDX is MIDX #1, and so on. Suppose that we want
to combine MIDX layers #0 and #1, to create a new layer #0' containing
the packs from both layers. With the original three MIDX layers, objects
are laid out in the bitmap in the order they appear in their source
pack, and the packs themselves are arranged according to the pseudo-pack
order. In this case, that ordering is Y, X, Z, B, A, C.
But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.
Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.
One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.
However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.
Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.
Note that this produces MIDXs which may be incompatible with earlier
versions of Git that have stricter requirements on the layout of packs
within a MIDX. This patch does *not* modify the version number of the
MIDX format, since existing versions of Git already know to gracefully
ignore a MIDX with packs that appear out-of-order.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 5 -----
midx.c | 28 ++++++++++++++++++++++------
midx.h | 1 +
t/t5319-multi-pack-index.sh | 5 -----
4 files changed, 23 insertions(+), 16 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index b262631ae45..55342fcb6dd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -410,11 +410,6 @@ static int write_midx_pack_names(struct hashfile *f, void *data)
if (ctx->info[i].expired)
continue;
- if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
- BUG("incorrect pack-file order: %s before %s",
- ctx->info[i - 1].pack_name,
- ctx->info[i].pack_name);
-
writelen = strlen(ctx->info[i].pack_name) + 1;
hashwrite(f, ctx->info[i].pack_name, writelen);
written += writelen;
diff --git a/midx.c b/midx.c
index f9b11de9ca9..4d5fe880649 100644
--- a/midx.c
+++ b/midx.c
@@ -209,11 +209,6 @@ static struct multi_pack_index *load_multi_pack_index_one(struct odb_source *sou
if (!end)
die(_("multi-pack-index pack-name chunk is too short"));
cur_pack_name = end + 1;
-
- if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
- die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
- m->pack_names[i - 1],
- m->pack_names[i]);
}
trace2_data_intmax("midx", r, "load/num_packs", m->num_packs);
@@ -411,6 +406,7 @@ void close_midx(struct multi_pack_index *m)
}
FREE_AND_NULL(m->packs);
FREE_AND_NULL(m->pack_names);
+ FREE_AND_NULL(m->pack_names_sorted);
free(m);
}
@@ -656,17 +652,37 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
return strcmp(idx_or_pack_name, idx_name);
}
+
+static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
+{
+ struct multi_pack_index *m = m_;
+ return strcmp(m->pack_names[*(const size_t *)a],
+ m->pack_names[*(const size_t *)b]);
+}
+
static int midx_contains_pack_1(struct multi_pack_index *m,
const char *idx_or_pack_name)
{
uint32_t first = 0, last = m->num_packs;
+ if (!m->pack_names_sorted) {
+ uint32_t i;
+
+ ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
+
+ for (i = 0; i < m->num_packs; i++)
+ m->pack_names_sorted[i] = i;
+
+ QSORT_S(m->pack_names_sorted, m->num_packs, midx_pack_names_cmp,
+ m);
+ }
+
while (first < last) {
uint32_t mid = first + (last - first) / 2;
const char *current;
int cmp;
- current = m->pack_names[mid];
+ current = m->pack_names[m->pack_names_sorted[mid]];
cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
if (!cmp)
return 1;
diff --git a/midx.h b/midx.h
index e188ffeb578..39bf04b18e5 100644
--- a/midx.h
+++ b/midx.h
@@ -71,6 +71,7 @@ struct multi_pack_index {
uint32_t num_packs_in_base;
const char **pack_names;
+ size_t *pack_names_sorted;
struct packed_git **packs;
};
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ca020091dda..03676d37b98 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -450,11 +450,6 @@ test_expect_success 'verify invalid chunk offset' '
"improper chunk offset(s)"
'
-test_expect_success 'verify packnames out of order' '
- corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
- "pack names out of order"
-'
-
test_expect_success 'verify missing pack' '
corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
"failed to load pack"
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:07 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:25PM -0500, Taylor Blau wrote:
> Note that this produces MIDXs which may be incompatible with earlier
> versions of Git that have stricter requirements on the layout of packs
> within a MIDX. This patch does *not* modify the version number of the
> MIDX format, since existing versions of Git already know to gracefully
> ignore a MIDX with packs that appear out-of-order.
Interesting. Did you verify how other implementations of Git behave if
we start to relax this requirement? It seems like a somewhat dangerous
assumption to me that this will just continue to work.
Also, is there a reason why you prefer this over bumping the version
number?
> @@ -656,17 +652,37 @@ int cmp_idx_or_pack_name(const char *idx_or_pack_name,
> return strcmp(idx_or_pack_name, idx_name);
> }
>
> +
> +static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
> +{
> + struct multi_pack_index *m = m_;
> + return strcmp(m->pack_names[*(const size_t *)a],
> + m->pack_names[*(const size_t *)b]);
> +}
> +
> static int midx_contains_pack_1(struct multi_pack_index *m,
> const char *idx_or_pack_name)
> {
> uint32_t first = 0, last = m->num_packs;
>
> + if (!m->pack_names_sorted) {
> + uint32_t i;
> +
> + ALLOC_ARRAY(m->pack_names_sorted, m->num_packs);
> +
> + for (i = 0; i < m->num_packs; i++)
> + m->pack_names_sorted[i] = i;
> +
> + QSORT_S(m->pack_names_sorted, m->num_packs, midx_pack_names_cmp,
> + m);
> + }
> +
> while (first < last) {
> uint32_t mid = first + (last - first) / 2;
> const char *current;
> int cmp;
>
> - current = m->pack_names[mid];
> + current = m->pack_names[m->pack_names_sorted[mid]];
> cmp = cmp_idx_or_pack_name(idx_or_pack_name, current);
> if (!cmp)
> return 1;
I assume that it cannot happen that we append to the array of MIDX'd
packs after we have sorted. It would mean that the MIDX somehow changed
its representation or was amended to, which isn't possible.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
2025-12-08 18:26 ` Patrick Steinhardt
@ 2025-12-09 2:07 ` Taylor Blau
2025-12-09 2:11 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 2:07 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:26:53PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:25PM -0500, Taylor Blau wrote:
> > Note that this produces MIDXs which may be incompatible with earlier
> > versions of Git that have stricter requirements on the layout of packs
> > within a MIDX. This patch does *not* modify the version number of the
> > MIDX format, since existing versions of Git already know to gracefully
> > ignore a MIDX with packs that appear out-of-order.
>
> Interesting. Did you verify how other implementations of Git behave if
> we start to relax this requirement? It seems like a somewhat dangerous
> assumption to me that this will just continue to work.
That's a great point. It looks like current libgit2 assumes[1] that the
list is sorted and complains loudly if it is not. Presumably other
implementations behave similarly.
I think that is a compelling enough argument to swing us towards
bumping the version number to avoid compatibility issues.
> Also, is there a reason why you prefer this over bumping the version
> number?
I was trying to avoid having all existing Git clients be unable to read
v2 MIDXs for such a seemingly minor change, but I think the above
compels us to.
> I assume that it cannot happen that we append to the array of MIDX'd
> packs after we have sorted. It would mean that the MIDX somehow changed
> its representation or was amended to, which isn't possible.
That's right.
Thanks,
Taylor
[1]: https://github.com/libgit2/libgit2/blob/v1.9.2/src/libgit2/midx.c#L75-L76
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order
2025-12-09 2:07 ` Taylor Blau
@ 2025-12-09 2:11 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 2:11 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 09:07:33PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:26:53PM +0100, Patrick Steinhardt wrote:
> > On Sat, Dec 06, 2025 at 03:31:25PM -0500, Taylor Blau wrote:
> > > Note that this produces MIDXs which may be incompatible with earlier
> > > versions of Git that have stricter requirements on the layout of packs
> > > within a MIDX. This patch does *not* modify the version number of the
> > > MIDX format, since existing versions of Git already know to gracefully
> > > ignore a MIDX with packs that appear out-of-order.
> >
> > Interesting. Did you verify how other implementations of Git behave if
> > we start to relax this requirement? It seems like a somewhat dangerous
> > assumption to me that this will just continue to work.
>
> That's a great point. It looks like current libgit2 assumes[1] that the
> list is sorted and complains loudly if it is not. Presumably other
> implementations behave similarly.
>
> I think that is a compelling enough argument to swing us towards
> bumping the version number to avoid compatibility issues.
I had another thought about how we might work around this without
forcing a compatibility issue, but it's a non-starter. I wanted to share
it on the list for posterity regardless.
I was going to add that we could instead consider adding a new chunk to
the MIDX format that lists the pack names in the order that they should
appear in the pseudo-pack order. Absent of that chunk, the pseudo-pack
order would be defined by the lexicographic order of pack names. If the
chunk exists, it would supersede that ordering.
But that just kicks the can down the road, since implementations like
libgit2 would think that they could read a *.midx file, but then they'd
produce all sorts of errors when trying to read its corresponding
*.bitmap file by permuting its bits out-of-order.
(I'm not sure off-hand whether or not libgit2 supports reading MIDX
bitmaps to begin with. Regardless, we should not introduce the
possibility for such a breakage in clients that *do* support reading
MIDX bitmaps, whether or not libgit2 is such a client.)
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 10/17] git-compat-util.h: introduce `u32_add()`
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (8 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
` (6 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
A future commit will want to add two 32-bit unsigned values together
while checking for overflow. Introduce a variant of the u64_add()
function for operating on 32-bit inputs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
git-compat-util.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/git-compat-util.h b/git-compat-util.h
index 398e0fac4fa..a7aa5f05fc9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -670,6 +670,14 @@ static inline int cast_size_t_to_int(size_t a)
return (int)a;
}
+static inline uint32_t u32_add(uint32_t a, uint32_t b)
+{
+ if (unsigned_add_overflows(a, b))
+ die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
+ (uintmax_t)a, (uintmax_t)b);
+ return a + b;
+}
+
static inline uint64_t u64_mult(uint64_t a, uint64_t b)
{
if (unsigned_mult_overflows(a, b))
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 10/17] git-compat-util.h: introduce `u32_add()`
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
@ 2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:13 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:28PM -0500, Taylor Blau wrote:
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 398e0fac4fa..a7aa5f05fc9 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -670,6 +670,14 @@ static inline int cast_size_t_to_int(size_t a)
> return (int)a;
> }
>
> +static inline uint32_t u32_add(uint32_t a, uint32_t b)
> +{
> + if (unsigned_add_overflows(a, b))
> + die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
> + (uintmax_t)a, (uintmax_t)b);
> + return a + b;
> +}
We already use PRIu32 in our codebase, so why is the cast necessary?
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH 10/17] git-compat-util.h: introduce `u32_add()`
2025-12-08 18:27 ` Patrick Steinhardt
@ 2025-12-09 2:13 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 2:13 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:27:01PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:28PM -0500, Taylor Blau wrote:
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index 398e0fac4fa..a7aa5f05fc9 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -670,6 +670,14 @@ static inline int cast_size_t_to_int(size_t a)
> > return (int)a;
> > }
> >
> > +static inline uint32_t u32_add(uint32_t a, uint32_t b)
> > +{
> > + if (unsigned_add_overflows(a, b))
> > + die("uint32_t overflow: %"PRIuMAX" + %"PRIuMAX,
> > + (uintmax_t)a, (uintmax_t)b);
> > + return a + b;
> > +}
>
> We already use PRIu32 in our codebase, so why is the cast necessary?
I don't think it is; we could easily write this as:
die("uint32_t overflow: %"PRIu32" + %"PRIu32, a, b);
instead, but this matches the convention of other similar functions in
the compat-util header.
(It's possible that there is some reasoning here that using PRIuMAX
really *is* necessary, but it isn't clear to me that's the case.)
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (9 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
` (5 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
The `ctx->pack_perm` array can be considered as a permutation between
the original `pack_int_id` of some given pack to its position in the
`ctx->info` array containing all packs.
Today we can always index into this array with any known `pack_int_id`,
since there is never a `pack_int_id` which is greater than or equal to
the value `ctx->nr`.
That is not necessarily the case with MIDX compaction. For example,
suppose we have a MIDX chain with three layers, each containing three
packs. The base of the MIDX chain will have packs with IDs 0, 1, and 2,
the next layer 3, 4, and 5, and so on. If we are compacting the topmost
two layers, we'll have input `pack_int_id` values between [3, 8], but
`ctx->nr` will only be 6.
In that example, if we want to know where the pack whose original
`pack_int_id` value was, say, 7, we would compute `ctx->pack_perm[7]`,
leading to an uninitialized read, since there are only 6 entries
allocated in that array.
To address this, there are a couple of options:
- We could allocate enough entries in `ctx->pack_perm` to accommodate
the largest `orig_pack_int_id` value.
- Or, we could internally shift the input values by the number of packs
in the base layer of the lower end of the MIDX compaction range.
This patch prepare us to take the latter approach, since it does not
allocate more memory than strictly necessary. (In our above example, the
base of the lower end of the compaction range is the first MIDX layer
(having three packs), so we would end up indexing `ctx->pack_perm[7-3]`,
which is a valid read.)
Note that this patch does not actually implement that approach yet, but
merely performs a behavior-preserving refactoring which will make the
change easier to carry out in the future.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 55342fcb6dd..4a1a16431a6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -114,6 +114,12 @@ struct write_midx_context {
struct odb_source *source;
};
+static uint32_t midx_pack_perm(struct write_midx_context *ctx,
+ uint32_t orig_pack_int_id)
+{
+ return ctx->pack_perm[orig_pack_int_id];
+}
+
static int should_include_pack(const struct write_midx_context *ctx,
const char *file_name)
{
@@ -509,12 +515,12 @@ static int write_midx_object_offsets(struct hashfile *f,
for (i = 0; i < ctx->entries_nr; i++) {
struct pack_midx_entry *obj = list++;
- if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
+ if (midx_pack_perm(ctx, obj->pack_int_id) == PACK_EXPIRED)
BUG("object %s is in an expired pack with int-id %d",
oid_to_hex(&obj->oid),
obj->pack_int_id);
- hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
+ hashwrite_be32(f, midx_pack_perm(ctx, obj->pack_int_id));
if (ctx->large_offsets_needed && obj->offset >> 31)
hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
@@ -615,7 +621,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
for (i = 0; i < ctx->entries_nr; i++) {
struct pack_midx_entry *e = &ctx->entries[i];
data[i].nr = i;
- data[i].pack = ctx->pack_perm[e->pack_int_id];
+ data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
if (!e->preferred)
data[i].pack |= (1U << 31);
data[i].offset = e->offset;
@@ -625,7 +631,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
for (i = 0; i < ctx->entries_nr; i++) {
struct pack_midx_entry *e = &ctx->entries[data[i].nr];
- struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]];
+ struct pack_info *pack = &ctx->info[midx_pack_perm(ctx, e->pack_int_id)];
if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
pack->bitmap_pos = i + base_objects;
pack->bitmap_nr++;
@@ -686,7 +692,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
struct object_entry *to = packlist_alloc(pdata, &from->oid);
oe_set_in_pack(pdata, to,
- ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+ ctx->info[midx_pack_perm(ctx, from->pack_int_id)].p);
}
trace2_region_leave("midx", "prepare_midx_packing_data", ctx->repo);
@@ -1285,7 +1291,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
sizeof(*ctx.info),
idx_or_pack_name_cmp);
if (preferred) {
- uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+ uint32_t perm = midx_pack_perm(&ctx, preferred->orig_pack_int_id);
if (perm == PACK_EXPIRED)
warning(_("preferred pack '%s' is expired"),
opts->preferred_pack_name);
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()`
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (10 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
` (4 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
When filling packs from an existing MIDX, `fill_packs_from_midx()`
handles preparing a MIDX'd pack, and reading out its pack name from the
existing MIDX.
MIDX compaction will want to perform an identical operation, though the
caller will look quite different than `fill_packs_from_midx()`. To
reduce any future code duplication, extract `fill_pack_from_midx()`
from `fill_packs_from_midx()` to prepare to call our new helper function
in a future change.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 4a1a16431a6..5927691f6a0 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -910,6 +910,21 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
return ret;
}
+static int fill_pack_from_midx(struct pack_info *info,
+ struct multi_pack_index *m,
+ uint32_t pack_int_id)
+{
+ if (prepare_midx_pack(m, pack_int_id))
+ return error(_("could not load pack %d"), pack_int_id);
+
+ fill_pack_info(info,
+ m->packs[pack_int_id - m->num_packs_in_base],
+ m->pack_names[pack_int_id - m->num_packs_in_base],
+ pack_int_id);
+
+ return 0;
+}
+
static int fill_packs_from_midx(struct write_midx_context *ctx)
{
struct multi_pack_index *m;
@@ -918,13 +933,13 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
uint32_t i;
for (i = 0; i < m->num_packs; i++) {
- if (prepare_midx_pack(m, m->num_packs_in_base + i))
- return error(_("could not load pack"));
-
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
- fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
- m->pack_names[i],
- m->num_packs_in_base + i);
+
+ if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
+ m->num_packs_in_base + i) < 0)
+ return -1;
+
+ ctx->nr++;
}
}
return 0;
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (11 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
` (3 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
the range [0, m->num_packs), and then shifts its index variable up by
`m->num_packs_in_base` to produce a valid `pack_int_id`.
Instead, directly enumerate the range:
[m->num_packs_in_base, m->num_packs_in_base + m->num_packs)
, which are the original pack_int_ids themselves as opposed to the
indexes of those packs relative to the MIDX layer they are contained
within.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 5927691f6a0..d3644276aad 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -932,11 +932,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
for (m = ctx->m; m; m = m->base_midx) {
uint32_t i;
- for (i = 0; i < m->num_packs; i++) {
+ for (i = m->num_packs_in_base;
+ i < m->num_packs_in_base + m->num_packs; i++) {
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
- if (fill_pack_from_midx(&ctx->info[ctx->nr], m,
- m->num_packs_in_base + i) < 0)
+ if (fill_pack_from_midx(&ctx->info[ctx->nr], m, i) < 0)
return -1;
ctx->nr++;
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
@ 2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:14 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:37PM -0500, Taylor Blau wrote:
> Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
> the range [0, m->num_packs), and then shifts its index variable up by
> `m->num_packs_in_base` to produce a valid `pack_int_id`.
>
> Instead, directly enumerate the range:
>
> [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)
>
> , which are the original pack_int_ids themselves as opposed to the
> indexes of those packs relative to the MIDX layer they are contained
> within.
Sensible. I was confused a bit by the previous change because I couldn't
quite spot the shift happening. I think this makes things a bit easier
to read.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly
2025-12-08 18:27 ` Patrick Steinhardt
@ 2025-12-09 2:14 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 2:14 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:27:08PM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:37PM -0500, Taylor Blau wrote:
> > Our `midx-write.c::fill_packs_from_midx()` function currently enumerates
> > the range [0, m->num_packs), and then shifts its index variable up by
> > `m->num_packs_in_base` to produce a valid `pack_int_id`.
> >
> > Instead, directly enumerate the range:
> >
> > [m->num_packs_in_base, m->num_packs_in_base + m->num_packs)
> >
> > , which are the original pack_int_ids themselves as opposed to the
> > indexes of those packs relative to the MIDX layer they are contained
> > within.
>
> Sensible. I was confused a bit by the previous change because I couldn't
> quite spot the shift happening. I think this makes things a bit easier
> to read.
I'm glad that the end result was more pleasing. I have gone back and
forth whether to enumerate [0, m->num_packs) and shift, or to enumerate
the pack_int_ids directly, so it's helpful to know what style others
prefer.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()`
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (12 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
` (2 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
When computing the set of objects to appear in a MIDX, we use
compute_sorted_entries(), which handles objects from various existing
sources one fanout layer at a time.
The process for computing this set is slightly different during MIDX
compaction, so factor out the existing functionality into its own
routine to prevent `compute_sorted_entries()` from becoming too
difficult to read.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index d3644276aad..7854561359d 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -323,6 +323,30 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
}
}
+static void midx_fanout_add(struct midx_fanout *fanout,
+ struct write_midx_context *ctx,
+ uint32_t start_pack,
+ uint32_t cur_fanout)
+{
+ uint32_t cur_pack;
+
+ if (ctx->m && !ctx->incremental)
+ midx_fanout_add_midx_fanout(fanout, ctx->m, cur_fanout,
+ ctx->preferred_pack_idx);
+
+ for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
+ int preferred = cur_pack == ctx->preferred_pack_idx;
+ midx_fanout_add_pack_fanout(fanout, ctx->info, cur_pack,
+ preferred, cur_fanout);
+ }
+
+ if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+ ctx->preferred_pack_idx < start_pack)
+ midx_fanout_add_pack_fanout(fanout, ctx->info,
+ ctx->preferred_pack_idx, 1,
+ cur_fanout);
+}
+
/*
* It is possible to artificially get into a state where there are many
* duplicate copies of objects. That can create high memory pressure if
@@ -359,23 +383,7 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
fanout.nr = 0;
- if (ctx->m && !ctx->incremental)
- midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
- ctx->preferred_pack_idx);
-
- for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
- int preferred = cur_pack == ctx->preferred_pack_idx;
- midx_fanout_add_pack_fanout(&fanout,
- ctx->info, cur_pack,
- preferred, cur_fanout);
- }
-
- if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
- ctx->preferred_pack_idx < start_pack)
- midx_fanout_add_pack_fanout(&fanout, ctx->info,
- ctx->preferred_pack_idx, 1,
- cur_fanout);
-
+ midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
midx_fanout_sort(&fanout);
/*
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (13 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Though our 'read-midx' test tool is capable of printing information
about a single MIDX layer identified by its checksum, no caller in our
test suite exercises this path.
Unfortunately, there is a memory leak lurking in this (currently) unused
path that would otherwise be exposed by the following commit.
This occurs when providing a MIDX layer checksum other than the tip. As
we walk over the MIDX chain trying to find the matching layer, we drop
our reference to the top-most MIDX layer. Thus, our call to
'close_midx()' later on leaks memory between the top-most MIDX layer and
the MIDX layer immediately following the specified one.
Plug this leak by holding a reference to the tip of the MIDX chain, and
ensure that we call `close_midx()` before terminating the test tool.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
t/helper/test-read-midx.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index dee603b3cd0..6e03aabca79 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -26,9 +26,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
int show_objects)
{
uint32_t i;
- struct multi_pack_index *m;
+ struct multi_pack_index *m, *tip;
+ int ret = 0;
- m = setup_midx(object_dir);
+ m = tip = setup_midx(object_dir);
if (!m)
return 1;
@@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
if (checksum) {
while (m && strcmp(get_midx_checksum(m), checksum))
m = m->base_midx;
- if (!m)
- return 1;
+ if (!m) {
+ ret = error(_("could not find MIDX with checksum %s"),
+ checksum);
+ goto out;
+ }
}
printf("header: %08x %d %d %d %d\n",
@@ -82,9 +86,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
}
}
- close_midx(m);
+out:
+ close_midx(tip);
- return 0;
+ return ret;
}
static int read_midx_checksum(const char *object_dir)
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
@ 2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:16 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-08 18:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:43PM -0500, Taylor Blau wrote:
> diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
> index dee603b3cd0..6e03aabca79 100644
> --- a/t/helper/test-read-midx.c
> +++ b/t/helper/test-read-midx.c
> @@ -26,9 +26,10 @@ static int read_midx_file(const char *object_dir, const char *checksum,
> int show_objects)
> {
> uint32_t i;
> - struct multi_pack_index *m;
> + struct multi_pack_index *m, *tip;
> + int ret = 0;
>
> - m = setup_midx(object_dir);
> + m = tip = setup_midx(object_dir);
>
> if (!m)
> return 1;
I was briefly wondering whether we should also convert this into a `goto
out`. It's of course not required, as setting up the MIDX has just
failed. But it would simplfy things a bit as we now have a single exit
path, only.
I don't mind this too much though.
> @@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
> if (checksum) {
> while (m && strcmp(get_midx_checksum(m), checksum))
> m = m->base_midx;
> - if (!m)
> - return 1;
> + if (!m) {
> + ret = error(_("could not find MIDX with checksum %s"),
> + checksum);
> + goto out;
> + }
> }
>
> printf("header: %08x %d %d %d %d\n",
We change the return code from 1 to -1, but that ultimately shouldn't
matter much.
I'll stop reviewing here and will have a look at the remaining two
patches with some fresh eyes. But so far this was a nice read, thanks!
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer
2025-12-08 18:27 ` Patrick Steinhardt
@ 2025-12-09 2:16 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-12-09 2:16 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Mon, Dec 08, 2025 at 07:27:16PM +0100, Patrick Steinhardt wrote:
> > @@ -36,8 +37,11 @@ static int read_midx_file(const char *object_dir, const char *checksum,
> > if (checksum) {
> > while (m && strcmp(get_midx_checksum(m), checksum))
> > m = m->base_midx;
> > - if (!m)
> > - return 1;
> > + if (!m) {
> > + ret = error(_("could not find MIDX with checksum %s"),
> > + checksum);
> > + goto out;
> > + }
> > }
> >
> > printf("header: %08x %d %d %d %d\n",
>
> We change the return code from 1 to -1, but that ultimately shouldn't
> matter much.
Yeah; I think that returning negative values here makes more sense, and
use of error() encourages that pattern, hence the change here.
> I'll stop reviewing here and will have a look at the remaining two
> patches with some fresh eyes. But so far this was a nice read, thanks!
Thanks for the review thus far! I look forward to your thoughts on the
remainder of the series. In related news, I owe you some review on your
'pks/skip-noop-rewrite' patches, which I hope to get to tomorrow.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 16/17] midx: implement MIDX compaction
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (14 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
When managing a MIDX chain with many layers, it is convenient to combine
a sequence of adjacent layers into a single layer to prevent the chain
from growing too long.
While it is conceptually possible to "compact" a sequence of MIDX layers
together by running "git multi-pack-index write --stdin-packs", there
are a few drawbacks that make this less than desirable:
- Preserving the MIDX chain is impossible, since there is no way to
write a MIDX layer that contains objects or packs found in an earlier
MIDX layer already part of the chain. So callers would have to write
an entirely new (non-incremental) MIDX containing only the compacted
layers, discarding all other objects/packs from the MIDX.
- There is (currently) no way to write a MIDX layer outside of the MIDX
chain to work around the above, such that the MIDX chain could be
reassembled substituting the compacted layers with the MIDX that was
written.
- The `--stdin-packs` command-line option does not allow us to specify
the order of packs as they appear in the MIDX. Therefore, even if
there were workarounds for the previous two challenges, any bitmaps
belonging to layers which come after the compacted layer(s) would no
longer be valid.
This commit introduces a way to compact a sequence of adjacent MIDX
layers into a single layer while preserving the MIDX chain, as well as
any bitmap(s) in layers which are newer than the compacted ones.
Implementing MIDX compaction does not require a significant number of
changes to how MIDX layers are written. The main changes are as follows:
- Instead of calling `fill_packs_from_midx()`, we call a new function
`fill_packs_from_midx_range()`, which walks backwards along the
portion of the MIDX chain which we are compacting, and adds packs one
layer a time.
In order to preserve the pseudo-pack order, the concatenated pack
order is preserved, with the exception of preferred packs which are
always added first.
- After adding entries from the set of packs in the compaction range,
`compute_sorted_entries()` must adjust the `pack_int_id`'s for all
objects added in each fanout layer to match their original
`pack_int_id`'s (as opposed to the index at which each pack appears
in `ctx.info`).
- When writing out the new 'multi-pack-index-chain' file, discard any
layers in the compaction range, replacing them with the newly written
layer, instead of keeping them and placing the new layer at the end
of the chain.
This ends up being sufficient to implement MIDX compaction in such a way
that preserves bitmaps corresponding to more recent layers in the MIDX
chain.
The tests for MIDX compaction are so far fairly spartan, since the main
interesting behavior here is ensuring that the right packs/objects are
selected from each layer, and that the pack order is preserved despite
whether or not they are sorted in lexicographic order in the original
MIDX chain.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 13 ++
builtin/multi-pack-index.c | 67 +++++++
midx-write.c | 242 ++++++++++++++++++++++--
midx.h | 5 +
t/meson.build | 1 +
t/t5335-compact-multi-pack-index.sh | 102 ++++++++++
6 files changed, 411 insertions(+), 19 deletions(-)
create mode 100755 t/t5335-compact-multi-pack-index.sh
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 164cf1f2291..a9664e77411 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -12,6 +12,8 @@ SYNOPSIS
'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
[--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
[--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] compact [--[no-]incremental]
+ <from> <to>
'git multi-pack-index' [<options>] verify
'git multi-pack-index' [<options>] expire
'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -83,6 +85,17 @@ marker).
necessary.
--
+compact::
+ Write a new MIDX layer containing only objects and packs present
+ in the range `<from>` to `<to>`, where both arguments are
+ checksums of existing layers in the MIDX chain.
++
+--
+ --incremental::
+ Write the result to a MIDX chain instead of writing a
+ stand-alone MIDX. Incompatible with `--bitmap`.
+--
+
verify::
Verify the contents of the MIDX file.
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index c0c6c1760c0..9b0c2082cb3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -17,6 +17,10 @@
" [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
" [--refs-snapshot=<path>]")
+#define BUILTIN_MIDX_COMPACT_USAGE \
+ N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
+ " <from> <to>")
+
#define BUILTIN_MIDX_VERIFY_USAGE \
N_("git multi-pack-index [<options>] verify")
@@ -30,6 +34,10 @@ static char const * const builtin_multi_pack_index_write_usage[] = {
BUILTIN_MIDX_WRITE_USAGE,
NULL
};
+static char const * const builtin_multi_pack_index_compact_usage[] = {
+ BUILTIN_MIDX_COMPACT_USAGE,
+ NULL
+};
static char const * const builtin_multi_pack_index_verify_usage[] = {
BUILTIN_MIDX_VERIFY_USAGE,
NULL
@@ -44,6 +52,7 @@ static char const * const builtin_multi_pack_index_repack_usage[] = {
};
static char const * const builtin_multi_pack_index_usage[] = {
BUILTIN_MIDX_WRITE_USAGE,
+ BUILTIN_MIDX_COMPACT_USAGE,
BUILTIN_MIDX_VERIFY_USAGE,
BUILTIN_MIDX_EXPIRE_USAGE,
BUILTIN_MIDX_REPACK_USAGE,
@@ -195,6 +204,63 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
return ret;
}
+static int cmd_multi_pack_index_compact(int argc, const char **argv,
+ const char *prefix,
+ struct repository *repo)
+{
+ struct multi_pack_index *m, *cur;
+ struct multi_pack_index *from_midx = NULL;
+ struct multi_pack_index *to_midx = NULL;
+ struct odb_source *source;
+ int ret;
+
+ struct option *options;
+ static struct option builtin_multi_pack_index_compact_options[] = {
+ OPT_BIT(0, "incremental", &opts.flags,
+ N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+ OPT_END(),
+ };
+
+ repo_config(repo, git_multi_pack_index_write_config, NULL);
+
+ options = add_common_options(builtin_multi_pack_index_compact_options);
+
+ trace2_cmd_mode(argv[0]);
+
+ if (isatty(2))
+ opts.flags |= MIDX_PROGRESS;
+ argc = parse_options(argc, argv, prefix,
+ options, builtin_multi_pack_index_compact_usage,
+ 0);
+
+ if (argc != 2)
+ usage_with_options(builtin_multi_pack_index_compact_usage,
+ options);
+ source = handle_object_dir_option(the_repository);
+
+ FREE_AND_NULL(options);
+
+ m = get_multi_pack_index(source);
+
+ for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
+ const char *midx_csum = get_midx_checksum(cur);
+
+ if (!from_midx && !strcmp(midx_csum, argv[0]))
+ from_midx = cur;
+ if (!to_midx && !strcmp(midx_csum, argv[1]))
+ to_midx = cur;
+ }
+
+ if (!from_midx)
+ die(_("could not find MIDX 'from': %s"), argv[0]);
+ if (!to_midx)
+ die(_("could not find MIDX 'to': %s"), argv[1]);
+
+ ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
+
+ return ret;
+}
+
static int cmd_multi_pack_index_verify(int argc, const char **argv,
const char *prefix,
struct repository *repo UNUSED)
@@ -295,6 +361,7 @@ int cmd_multi_pack_index(int argc,
struct option builtin_multi_pack_index_options[] = {
OPT_SUBCOMMAND("repack", &fn, cmd_multi_pack_index_repack),
OPT_SUBCOMMAND("write", &fn, cmd_multi_pack_index_write),
+ OPT_SUBCOMMAND("compact", &fn, cmd_multi_pack_index_compact),
OPT_SUBCOMMAND("verify", &fn, cmd_multi_pack_index_verify),
OPT_SUBCOMMAND("expire", &fn, cmd_multi_pack_index_expire),
OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index 7854561359d..fcbfedcd913 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -108,6 +108,10 @@ struct write_midx_context {
int incremental;
uint32_t num_multi_pack_indexes_before;
+ struct multi_pack_index *compact_from;
+ struct multi_pack_index *compact_to;
+ int compact;
+
struct string_list *to_include;
struct repository *repo;
@@ -117,6 +121,8 @@ struct write_midx_context {
static uint32_t midx_pack_perm(struct write_midx_context *ctx,
uint32_t orig_pack_int_id)
{
+ if (ctx->compact)
+ orig_pack_int_id -= ctx->compact_from->num_packs_in_base;
return ctx->pack_perm[orig_pack_int_id];
}
@@ -347,6 +353,21 @@ static void midx_fanout_add(struct midx_fanout *fanout,
cur_fanout);
}
+static void midx_fanout_add_compact(struct midx_fanout *fanout,
+ struct write_midx_context *ctx,
+ uint32_t cur_fanout)
+{
+ struct multi_pack_index *m = ctx->compact_to;
+
+ ASSERT(ctx->compact);
+
+ while (m && m != ctx->compact_from->base_midx) {
+ midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
+ NO_PREFERRED_PACK);
+ m = m->base_midx;
+ }
+}
+
/*
* It is possible to artificially get into a state where there are many
* duplicate copies of objects. That can create high memory pressure if
@@ -365,6 +386,9 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
+ if (ctx->compact)
+ ASSERT(!start_pack);
+
for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
total_objects = st_add(total_objects,
ctx->info[cur_pack].p->num_objects);
@@ -383,7 +407,10 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
fanout.nr = 0;
- midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
+ if (ctx->compact)
+ midx_fanout_add_compact(&fanout, ctx, cur_fanout);
+ else
+ midx_fanout_add(&fanout, ctx, start_pack, cur_fanout);
midx_fanout_sort(&fanout);
/*
@@ -953,6 +980,72 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
return 0;
}
+static uint32_t compactible_packs_between(const struct multi_pack_index *from,
+ const struct multi_pack_index *to)
+{
+ uint32_t nr;
+
+ ASSERT(from && to);
+
+ nr = u32_add(to->num_packs, to->num_packs_in_base);
+ if (nr < from->num_packs_in_base)
+ BUG("unexpected number of packs in base during compaction: "
+ "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
+
+ return nr - from->num_packs_in_base;
+}
+
+static int fill_packs_from_midx_range(struct write_midx_context *ctx,
+ int bitmap_order)
+{
+ struct multi_pack_index *m = ctx->compact_to;
+ uint32_t packs_nr;
+
+ ASSERT(ctx->compact && !ctx->nr);
+ ASSERT(ctx->compact_from);
+ ASSERT(ctx->compact_to);
+
+ packs_nr = compactible_packs_between(ctx->compact_from,
+ ctx->compact_to);
+
+ ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
+
+ while (m != ctx->compact_from->base_midx) {
+ uint32_t pack_int_id, preferred_pack_id;
+ uint32_t i;
+
+ if (bitmap_order) {
+ if (midx_preferred_pack(m, &preferred_pack_id) < 0)
+ die(_("could not determine preferred pack"));
+ } else {
+ preferred_pack_id = m->num_packs_in_base;
+ }
+
+ pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
+
+ if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+ preferred_pack_id) < 0)
+ return -1;
+
+ for (i = m->num_packs_in_base;
+ i < m->num_packs_in_base + m->num_packs; i++) {
+ if (preferred_pack_id == i)
+ continue;
+
+ if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
+ i) < 0)
+ return -1;
+ }
+
+ ctx->nr += m->num_packs;
+ m = m->base_midx;
+ }
+
+ ASSERT(ctx->nr == packs_nr);
+
+ return 0;
+}
+
static struct {
const char *non_split;
const char *split;
@@ -1038,12 +1131,22 @@ static void clear_midx_files(struct odb_source *source,
strbuf_release(&buf);
}
+static int midx_hashcmp(const struct multi_pack_index *a,
+ const struct multi_pack_index *b,
+ const struct git_hash_algo *algop)
+{
+ return hashcmp(get_midx_hash(a), get_midx_hash(b), algop);
+}
+
struct write_midx_opts {
struct odb_source *source;
struct string_list *packs_to_include;
struct string_list *packs_to_drop;
+ struct multi_pack_index *compact_from;
+ struct multi_pack_index *compact_to;
+
const char *preferred_pack_name;
const char *refs_snapshot;
unsigned flags;
@@ -1066,6 +1169,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
int dropped_packs = 0;
int result = -1;
const char **keep_hashes = NULL;
+ size_t keep_hashes_nr = 0;
struct chunkfile *cf;
trace2_region_enter("midx", "write_midx_internal", r);
@@ -1074,6 +1178,17 @@ static int write_midx_internal(struct write_midx_opts *opts)
ctx.source = opts->source;
ctx.incremental = !!(opts->flags & MIDX_WRITE_INCREMENTAL);
+ ctx.compact = !!(opts->flags & MIDX_WRITE_COMPACT);
+
+ if (ctx.compact) {
+ if (!opts->compact_from)
+ BUG("expected non-NULL 'from' MIDX during compaction");
+ if (!opts->compact_to)
+ BUG("expected non-NULL 'to' MIDX during compaction");
+
+ ctx.compact_from = opts->compact_from;
+ ctx.compact_to = opts->compact_to;
+ }
if (ctx.incremental)
strbuf_addf(&midx_name,
@@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
*/
if (ctx.incremental)
ctx.base_midx = m;
- else if (!opts->packs_to_include)
+ if (!opts->packs_to_include)
ctx.m = m;
}
}
+ /*
+ * If compacting MIDX layer(s) in the range [from, to], then the
+ * compacted MIDX will share the same base MIDX as 'from'.
+ */
+ if (ctx.compact)
+ ctx.base_midx = ctx.compact_from->base_midx;
+
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
ctx.info = NULL;
@@ -1122,7 +1244,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
- } else if (ctx.m && fill_packs_from_midx(&ctx)) {
+ } else if (ctx.m && !ctx.compact && fill_packs_from_midx(&ctx)) {
goto cleanup;
}
@@ -1135,13 +1257,23 @@ static int write_midx_internal(struct write_midx_opts *opts)
else
ctx.progress = NULL;
- ctx.to_include = opts->packs_to_include;
+ if (ctx.compact) {
+ int bitmap_order = 0;
+ if (opts->preferred_pack_name)
+ bitmap_order |= 1;
+ else if (opts->flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
+ bitmap_order |= 1;
- for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+ fill_packs_from_midx_range(&ctx, bitmap_order);
+ } else {
+ ctx.to_include = opts->packs_to_include;
+ for_each_file_in_pack_dir(opts->source->path, add_pack_to_midx, &ctx);
+ }
stop_progress(&ctx.progress);
if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) &&
!ctx.incremental &&
+ !ctx.compact &&
!(opts->packs_to_include || opts->packs_to_drop)) {
struct bitmap_index *bitmap_git;
int bitmap_exists;
@@ -1255,12 +1387,15 @@ static int write_midx_internal(struct write_midx_opts *opts)
ctx.large_offsets_needed = 1;
}
- QSORT(ctx.info, ctx.nr, pack_info_compare);
+ if (!ctx.compact)
+ QSORT(ctx.info, ctx.nr, pack_info_compare);
if (opts->packs_to_drop && opts->packs_to_drop->nr) {
size_t drop_index = 0;
int missing_drops = 0;
+ ASSERT(!ctx.compact);
+
for (size_t i = 0;
i < ctx.nr && drop_index < opts->packs_to_drop->nr; i++) {
int cmp = strcmp(ctx.info[i].pack_name,
@@ -1292,12 +1427,20 @@ static int write_midx_internal(struct write_midx_opts *opts)
*/
ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
for (size_t i = 0; i < ctx.nr; i++) {
+ uint32_t from = ctx.info[i].orig_pack_int_id;
+ uint32_t to;
+
if (ctx.info[i].expired) {
+ to = PACK_EXPIRED;
dropped_packs++;
- ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
} else {
- ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
+ to = i - dropped_packs;
}
+
+ if (ctx.compact)
+ from -= ctx.compact_from->num_packs_in_base;
+
+ ctx.pack_perm[from] = to;
}
for (size_t i = 0; i < ctx.nr; i++) {
@@ -1445,7 +1588,24 @@ static int write_midx_internal(struct write_midx_opts *opts)
if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
die(_("too many multi-pack-indexes"));
- CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
+ if (ctx.compact) {
+ struct multi_pack_index *m;
+
+ /*
+ * Keep all MIDX layers excluding those in the range [from, to].
+ */
+ for (m = ctx.base_midx; m; m = m->base_midx)
+ keep_hashes_nr++;
+ for (m = ctx.m;
+ m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+ m = m->base_midx)
+ keep_hashes_nr++;
+
+ keep_hashes_nr++; /* include the compacted layer */
+ } else {
+ keep_hashes_nr = ctx.num_multi_pack_indexes_before + 1;
+ }
+ CALLOC_ARRAY(keep_hashes, keep_hashes_nr);
if (ctx.incremental) {
FILE *chainf = fdopen_lock_file(&lk, "w");
@@ -1470,17 +1630,47 @@ static int write_midx_internal(struct write_midx_opts *opts)
strbuf_release(&final_midx_name);
- keep_hashes[ctx.num_multi_pack_indexes_before] =
- xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
+ if (ctx.compact) {
+ struct multi_pack_index *m;
+ uint32_t num_layers_before_from = 0;
+ uint32_t i;
- for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
- uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+ for (m = ctx.base_midx; m; m = m->base_midx)
+ num_layers_before_from++;
- keep_hashes[j] = xstrdup(get_midx_checksum(m));
- m = m->base_midx;
+ m = ctx.base_midx;
+ for (i = 0; i < num_layers_before_from; i++) {
+ uint32_t j = num_layers_before_from - i - 1;
+
+ keep_hashes[j] = xstrdup(get_midx_checksum(m));
+ m = m->base_midx;
+ }
+
+ keep_hashes[i] = xstrdup(hash_to_hex_algop(midx_hash,
+ r->hash_algo));
+
+ i = 0;
+ for (m = ctx.m;
+ m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
+ m = m->base_midx) {
+ keep_hashes[keep_hashes_nr - i - 1] =
+ xstrdup(get_midx_checksum(m));
+ i++;
+ }
+ } else {
+ keep_hashes[ctx.num_multi_pack_indexes_before] =
+ xstrdup(hash_to_hex_algop(midx_hash,
+ r->hash_algo));
+
+ for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+ uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
+
+ keep_hashes[j] = xstrdup(get_midx_checksum(m));
+ m = m->base_midx;
+ }
}
- for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+ for (uint32_t i = 0; i < keep_hashes_nr; i++)
fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
} else {
keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1493,8 +1683,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
if (commit_lock_file(&lk) < 0)
die_errno(_("could not write multi-pack-index"));
- clear_midx_files(opts->source, keep_hashes,
- ctx.num_multi_pack_indexes_before + 1,
+ clear_midx_files(opts->source, keep_hashes, keep_hashes_nr,
ctx.incremental);
result = 0;
@@ -1512,7 +1701,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
free(ctx.pack_perm);
free(ctx.pack_order);
if (keep_hashes) {
- for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
+ for (uint32_t i = 0; i < keep_hashes_nr; i++)
free((char *)keep_hashes[i]);
free(keep_hashes);
}
@@ -1553,6 +1742,21 @@ int write_midx_file_only(struct odb_source *source,
return write_midx_internal(&opts);
}
+int write_midx_file_compact(struct odb_source *source,
+ struct multi_pack_index *from,
+ struct multi_pack_index *to,
+ unsigned flags)
+{
+ struct write_midx_opts opts = {
+ .source = source,
+ .compact_from = from,
+ .compact_to = to,
+ .flags = flags | MIDX_WRITE_COMPACT,
+ };
+
+ return write_midx_internal(&opts);
+}
+
int expire_midx_packs(struct odb_source *source, unsigned flags)
{
uint32_t i, *count, result = 0;
diff --git a/midx.h b/midx.h
index 39bf04b18e5..61f9809b8c9 100644
--- a/midx.h
+++ b/midx.h
@@ -81,6 +81,7 @@ struct multi_pack_index {
#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
#define MIDX_WRITE_INCREMENTAL (1 << 5)
+#define MIDX_WRITE_COMPACT (1 << 6)
#define MIDX_EXT_REV "rev"
#define MIDX_EXT_BITMAP "bitmap"
@@ -130,6 +131,10 @@ int write_midx_file_only(struct odb_source *source,
struct string_list *packs_to_include,
const char *preferred_pack_name,
const char *refs_snapshot, unsigned flags);
+int write_midx_file_compact(struct odb_source *source,
+ struct multi_pack_index *from,
+ struct multi_pack_index *to,
+ unsigned flags);
void clear_midx_file(struct repository *r);
int verify_midx_file(struct odb_source *source, unsigned flags);
int expire_midx_packs(struct odb_source *source, unsigned flags);
diff --git a/t/meson.build b/t/meson.build
index 7c994d4643e..2d1926faaf2 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -613,6 +613,7 @@ integration_tests = [
't5332-multi-pack-reuse.sh',
't5333-pseudo-merge-bitmaps.sh',
't5334-incremental-multi-pack-index.sh',
+ 't5335-compact-multi-pack-index.sh',
't5351-unpack-large-objects.sh',
't5400-send-pack.sh',
't5401-update-hooks.sh',
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
new file mode 100755
index 00000000000..f889af7fb1d
--- /dev/null
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -0,0 +1,102 @@
+#!/bin/sh
+
+test_description='multi-pack-index compaction'
+
+. ./test-lib.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+nth_line() {
+ local n="$1"
+ shift
+ awk "NR==$n" "$@"
+}
+
+write_packs () {
+ for c in "$@"
+ do
+ test_commit "$c" &&
+
+ git pack-objects --all --unpacked $packdir/pack-$c &&
+ git prune-packed &&
+
+ git multi-pack-index write --incremental --bitmap || return 1
+ done
+}
+
+test_midx_layer_packs () {
+ local checksum="$1" &&
+ shift &&
+
+ test-tool read-midx $objdir "$checksum" >out &&
+
+ printf "%s\n" "$@" >expect &&
+ # NOTE: do *not* pipe through sort here, we want to ensure the
+ # order of packs is preserved during compaction.
+ grep "^pack-" out | cut -d"-" -f2 >actual &&
+
+ test_cmp expect actual
+}
+
+test_midx_layer_object_uniqueness () {
+ : >objs.all
+ while read layer
+ do
+ test-tool read-midx --show-objects $objdir "$layer" >out &&
+ grep "\.pack$" out | cut -d" " -f1 | sort >objs.layer &&
+ test_stdout_line_count = 0 comm -12 objs.all objs.layer &&
+ cat objs.all objs.layer | sort >objs.tmp &&
+ mv objs.tmp objs.all || return 1
+ done <$midx_chain
+}
+
+test_expect_success 'MIDX compaction with lex-ordered pack names' '
+ git init midx-compact-lex-order &&
+ (
+ cd midx-compact-lex-order &&
+
+ write_packs A B C D E &&
+ test_line_count = 5 $midx_chain &&
+
+ git multi-pack-index compact --incremental \
+ "$(nth_line 2 "$midx_chain")" \
+ "$(nth_line 4 "$midx_chain")" &&
+ test_line_count = 3 $midx_chain &&
+
+ test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
+ test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
+ test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+ test_midx_layer_object_uniqueness
+ )
+'
+
+test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
+ git init midx-compact-non-lex-order &&
+ (
+ cd midx-compact-non-lex-order &&
+
+ write_packs D C A B E &&
+ test_line_count = 5 $midx_chain &&
+
+ git multi-pack-index compact --incremental \
+ "$(nth_line 2 "$midx_chain")" \
+ "$(nth_line 4 "$midx_chain")" &&
+ test_line_count = 3 $midx_chain &&
+
+ test_midx_layer_packs "$(nth_line 1 "$midx_chain")" D &&
+ test_midx_layer_packs "$(nth_line 2 "$midx_chain")" C A B &&
+ test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
+
+ test_midx_layer_object_uniqueness
+ )
+'
+
+test_done
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 16/17] midx: implement MIDX compaction
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
@ 2025-12-09 7:21 ` Patrick Steinhardt
0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-09 7:21 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:47PM -0500, Taylor Blau wrote:
> diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
> index 164cf1f2291..a9664e77411 100644
> --- a/Documentation/git-multi-pack-index.adoc
> +++ b/Documentation/git-multi-pack-index.adoc
> @@ -12,6 +12,8 @@ SYNOPSIS
> 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
> [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
> [--refs-snapshot=<path>]
> +'git multi-pack-index' [<options>] compact [--[no-]incremental]
> + <from> <to>
> 'git multi-pack-index' [<options>] verify
> 'git multi-pack-index' [<options>] expire
> 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
> @@ -83,6 +85,17 @@ marker).
> necessary.
> --
>
> +compact::
> + Write a new MIDX layer containing only objects and packs present
> + in the range `<from>` to `<to>`, where both arguments are
> + checksums of existing layers in the MIDX chain.
> ++
> +--
> + --incremental::
> + Write the result to a MIDX chain instead of writing a
> + stand-alone MIDX. Incompatible with `--bitmap`.
Interesting. What would happen if you compact a subrange of the MIDX
chain without incremental? Would the MIDX be completely replaced with a
MIDX that only covers these packs?
Also, the "--bitmap" flag does not exist yet, so the second sentence
probably needs to be introduced in the next commit.
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index c0c6c1760c0..9b0c2082cb3 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -195,6 +204,63 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
> return ret;
> }
>
> +static int cmd_multi_pack_index_compact(int argc, const char **argv,
> + const char *prefix,
> + struct repository *repo)
> +{
> + struct multi_pack_index *m, *cur;
> + struct multi_pack_index *from_midx = NULL;
> + struct multi_pack_index *to_midx = NULL;
> + struct odb_source *source;
> + int ret;
> +
> + struct option *options;
> + static struct option builtin_multi_pack_index_compact_options[] = {
> + OPT_BIT(0, "incremental", &opts.flags,
> + N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
> + OPT_END(),
> + };
> +
> + repo_config(repo, git_multi_pack_index_write_config, NULL);
> +
> + options = add_common_options(builtin_multi_pack_index_compact_options);
> +
> + trace2_cmd_mode(argv[0]);
> +
> + if (isatty(2))
> + opts.flags |= MIDX_PROGRESS;
> + argc = parse_options(argc, argv, prefix,
> + options, builtin_multi_pack_index_compact_usage,
> + 0);
> +
> + if (argc != 2)
> + usage_with_options(builtin_multi_pack_index_compact_usage,
> + options);
> + source = handle_object_dir_option(the_repository);
> +
> + FREE_AND_NULL(options);
> +
> + m = get_multi_pack_index(source);
> +
> + for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
> + const char *midx_csum = get_midx_checksum(cur);
> +
> + if (!from_midx && !strcmp(midx_csum, argv[0]))
> + from_midx = cur;
> + if (!to_midx && !strcmp(midx_csum, argv[1]))
> + to_midx = cur;
> + }
> +
> + if (!from_midx)
> + die(_("could not find MIDX 'from': %s"), argv[0]);
> + if (!to_midx)
> + die(_("could not find MIDX 'to': %s"), argv[1]);
> +
> + ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
> +
> + return ret;
> +}
Is it valid if `from_midx == to_midx`?
> diff --git a/midx-write.c b/midx-write.c
> index 7854561359d..fcbfedcd913 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -953,6 +980,72 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
> return 0;
> }
>
> +static uint32_t compactible_packs_between(const struct multi_pack_index *from,
> + const struct multi_pack_index *to)
> +{
> + uint32_t nr;
> +
> + ASSERT(from && to);
> +
> + nr = u32_add(to->num_packs, to->num_packs_in_base);
> + if (nr < from->num_packs_in_base)
> + BUG("unexpected number of packs in base during compaction: "
> + "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
> +
> + return nr - from->num_packs_in_base;
> +}
> +
> +static int fill_packs_from_midx_range(struct write_midx_context *ctx,
> + int bitmap_order)
> +{
> + struct multi_pack_index *m = ctx->compact_to;
> + uint32_t packs_nr;
> +
> + ASSERT(ctx->compact && !ctx->nr);
> + ASSERT(ctx->compact_from);
> + ASSERT(ctx->compact_to);
> +
> + packs_nr = compactible_packs_between(ctx->compact_from,
> + ctx->compact_to);
> +
> + ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
> +
> + while (m != ctx->compact_from->base_midx) {
> + uint32_t pack_int_id, preferred_pack_id;
> + uint32_t i;
> +
> + if (bitmap_order) {
> + if (midx_preferred_pack(m, &preferred_pack_id) < 0)
> + die(_("could not determine preferred pack"));
`midx_preferred_pack()` only returns a valid pack ID in case we've got a
reverse index, and as far as I understand we seem to only generate those
when computing bitmaps. I assume that this means that we can only
compact MIDX layers in bitmap order if they already were in bitmap order
before?
That would at least also make sense. We of course cannot randomly change
the order in the middle of our layers, as that would break later layers
that build on top.
> + } else {
> + preferred_pack_id = m->num_packs_in_base;
> + }
> +
> + pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
> +
> + if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> + preferred_pack_id) < 0)
> + return -1;
> +
> + for (i = m->num_packs_in_base;
> + i < m->num_packs_in_base + m->num_packs; i++) {
> + if (preferred_pack_id == i)
> + continue;
> +
> + if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> + i) < 0)
> + return -1;
> + }
> +
So the condition that should hold after this loop is `pack_int_id ==
m->num_packs`. Which is somewhat obvious: we skip one pack, but that
pack is the preferred pack that we have populated first.
> @@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
> */
> if (ctx.incremental)
> ctx.base_midx = m;
> - else if (!opts->packs_to_include)
> + if (!opts->packs_to_include)
> ctx.m = m;
I'm a bit surprised by this change here. I would've expected that we
never pass `packs_to_include` when compacting, so why is this change
necessary?
> }
> }
>
> + /*
> + * If compacting MIDX layer(s) in the range [from, to], then the
> + * compacted MIDX will share the same base MIDX as 'from'.
> + */
> + if (ctx.compact)
> + ctx.base_midx = ctx.compact_from->base_midx;
Okay, here we overwrite `ctx.base_midx` that we might've already set in
the condition above. It's a bit confusing, doubly so because we may be
warning about the invalid MIDX and claim to ignore it, but ultimately we
don't.
> diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
> new file mode 100755
> index 00000000000..f889af7fb1d
> --- /dev/null
> +++ b/t/t5335-compact-multi-pack-index.sh
> @@ -0,0 +1,102 @@
> +#!/bin/sh
> +
> +test_description='multi-pack-index compaction'
> +
> +. ./test-lib.sh
> +
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> +
> +objdir=.git/objects
> +packdir=$objdir/pack
> +midxdir=$packdir/multi-pack-index.d
> +midx_chain=$midxdir/multi-pack-index-chain
> +
> +nth_line() {
> + local n="$1"
> + shift
> + awk "NR==$n" "$@"
> +}
> +
> +write_packs () {
> + for c in "$@"
> + do
> + test_commit "$c" &&
Nit: it might be sensible to disable housekeeping here. You strongly
depend on the on-disk shape of the objects, so if you by chance wrote
two objects starting with "17" we'd end up repacking and racing.
I've also got an upcoming patch series in mindthat I've got cooking to
make geometric compaction the default for auto-maintenance. We've got
many test suites that implicitly rely on the current algorithm used by
git-gc(1), so I'd love to avoid adding more.
[snip]
> +test_expect_success 'MIDX compaction with lex-ordered pack names' '
> + git init midx-compact-lex-order &&
> + (
> + cd midx-compact-lex-order &&
> +
> + write_packs A B C D E &&
> + test_line_count = 5 $midx_chain &&
> +
> + git multi-pack-index compact --incremental \
> + "$(nth_line 2 "$midx_chain")" \
> + "$(nth_line 4 "$midx_chain")" &&
> + test_line_count = 3 $midx_chain &&
> +
> + test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
> + test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
> + test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
> +
> + test_midx_layer_object_uniqueness
> + )
> +'
It would be nice to also test for requests that don't make sense: "from"
larger than "to", "from == to", missing "from" or "foo" and so on.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 17/17] midx: enable reachability bitmaps during MIDX compaction
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
` (15 preceding siblings ...)
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
@ 2025-12-06 20:31 ` Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
16 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-12-06 20:31 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, Patrick Steinhardt
Enable callers to generate reachability bitmaps when performing MIDX
layer compaction by combining all existing bitmaps from the compacted
layers.
Note that the because of the object/pack ordering described by the
previous commit, the pseudo-pack order for the compacted MIDX is the
same as concatenating the individual pseudo-pack orderings for each
layer in the compaction range.
As a result, the only non-test or documentation change necessary is to
treat all objects as non-preferred during compaction so as not to
disturb the object ordering.
In the future, we may want to adjust which commit(s) receive
reachability bitmaps when compacting multiple .bitmap files into one, or
even generate new bitmaps (e.g., if the references have moved
significantly since the .bitmap was generated). This commit only
implements combining all existing bitmaps in range together in order to
demonstrate and lay the groundwork for more exotic strategies.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 2 +-
builtin/multi-pack-index.c | 4 +-
midx-write.c | 2 +-
t/t5335-compact-multi-pack-index.sh | 120 +++++++++++++++++++++++-
4 files changed, 123 insertions(+), 5 deletions(-)
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index a9664e77411..458bb873633 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -13,7 +13,7 @@ SYNOPSIS
[--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
[--refs-snapshot=<path>]
'git multi-pack-index' [<options>] compact [--[no-]incremental]
- <from> <to>
+ [--[no-]bitmap] <from> <to>
'git multi-pack-index' [<options>] verify
'git multi-pack-index' [<options>] expire
'git multi-pack-index' [<options>] repack [--batch-size=<size>]
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 9b0c2082cb3..40afa8f1ed8 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -19,7 +19,7 @@
#define BUILTIN_MIDX_COMPACT_USAGE \
N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
- " <from> <to>")
+ " [--[no-]bitmap] <from> <to>")
#define BUILTIN_MIDX_VERIFY_USAGE \
N_("git multi-pack-index [<options>] verify")
@@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
struct option *options;
static struct option builtin_multi_pack_index_compact_options[] = {
+ OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+ MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
OPT_BIT(0, "incremental", &opts.flags,
N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
OPT_END(),
diff --git a/midx-write.c b/midx-write.c
index fcbfedcd913..f2dbacef4cd 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -657,7 +657,7 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx)
struct pack_midx_entry *e = &ctx->entries[i];
data[i].nr = i;
data[i].pack = midx_pack_perm(ctx, e->pack_int_id);
- if (!e->preferred)
+ if (!e->preferred || ctx->compact)
data[i].pack |= (1U << 31);
data[i].offset = e->offset;
}
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
index f889af7fb1d..a306f504305 100755
--- a/t/t5335-compact-multi-pack-index.sh
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -65,7 +65,7 @@ test_expect_success 'MIDX compaction with lex-ordered pack names' '
write_packs A B C D E &&
test_line_count = 5 $midx_chain &&
- git multi-pack-index compact --incremental \
+ git multi-pack-index compact --incremental --bitmap \
"$(nth_line 2 "$midx_chain")" \
"$(nth_line 4 "$midx_chain")" &&
test_line_count = 3 $midx_chain &&
@@ -86,7 +86,7 @@ test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
write_packs D C A B E &&
test_line_count = 5 $midx_chain &&
- git multi-pack-index compact --incremental \
+ git multi-pack-index compact --incremental --bitmap \
"$(nth_line 2 "$midx_chain")" \
"$(nth_line 4 "$midx_chain")" &&
test_line_count = 3 $midx_chain &&
@@ -99,4 +99,120 @@ test_expect_success 'MIDX compaction with non-lex-ordered pack names' '
)
'
+midx_objs_by_pack () {
+ awk '/\.pack$/ { split($3, a, "-"); print a[2], $1 }' | sort
+}
+
+tag_objs_from_pack () {
+ objs="$(git rev-list --objects --no-object-names "$2")" &&
+ printf "$1 %s\n" $objs | sort
+}
+
+test_expect_success 'MIDX compaction preserves pack object selection' '
+ git init midx-compact-preserve-selection &&
+ (
+ cd midx-compact-preserve-selection &&
+
+ test_commit A &&
+ test_commit B &&
+
+ # Create two packs, one containing just the objects from
+ # A, and another containing all objects from the
+ # repository.
+ p1="$(echo A | git pack-objects --revs --delta-base-offset \
+ $packdir/pack-1)" &&
+ p0="$(echo B | git pack-objects --revs --delta-base-offset \
+ $packdir/pack-0)" &&
+
+ echo "pack-1-$p1.idx" | git multi-pack-index write \
+ --incremental --bitmap --stdin-packs &&
+ echo "pack-0-$p0.idx" | git multi-pack-index write \
+ --incremental --bitmap --stdin-packs &&
+
+ write_packs C &&
+
+ git multi-pack-index compact --incremental --bitmap \
+ "$(nth_line 1 "$midx_chain")" \
+ "$(nth_line 2 "$midx_chain")" &&
+
+
+ test-tool read-midx --show-objects $objdir \
+ "$(nth_line 1 "$midx_chain")" >AB.info &&
+ test-tool read-midx --show-objects $objdir \
+ "$(nth_line 2 "$midx_chain")" >C.info &&
+
+ midx_objs_by_pack <AB.info >AB.actual &&
+ midx_objs_by_pack <C.info >C.actual &&
+
+ {
+ tag_objs_from_pack 1 A &&
+ tag_objs_from_pack 0 A..B
+ } | sort >AB.expect &&
+ tag_objs_from_pack C B..C >C.expect &&
+
+ test_cmp AB.expect AB.actual &&
+ test_cmp C.expect C.actual
+ )
+'
+
+test_expect_success 'MIDX compaction with bitmaps' '
+ git init midx-compact-with-bitmaps &&
+ (
+ cd midx-compact-with-bitmaps &&
+
+ write_packs foo bar baz quux woot &&
+
+ test-tool read-midx --bitmap $objdir >bitmap.expect &&
+ git multi-pack-index compact --incremental --bitmap \
+ "$(nth_line 2 "$midx_chain")" \
+ "$(nth_line 4 "$midx_chain")" &&
+ test-tool read-midx --bitmap $objdir >bitmap.actual &&
+
+ test_cmp bitmap.expect bitmap.actual &&
+
+ true
+ )
+'
+
+test_expect_success 'MIDX compaction with bitmaps (non-trivial)' '
+ git init midx-compact-with-bitmaps-non-trivial &&
+ (
+ cd midx-compact-with-bitmaps-non-trivial &&
+
+ git branch -m main &&
+
+ # D(4)
+ # /
+ # A(1) --- B(2) --- C(3) --- G(7)
+ # \
+ # E(5) --- F(6)
+ write_packs A B C &&
+ git checkout -b side &&
+ write_packs D &&
+ git checkout -b other B &&
+ write_packs E F &&
+ git checkout main &&
+ write_packs G &&
+
+ cat $midx_chain &&
+
+ # Compact layers 2-4, leaving us with:
+ #
+ # [A, [B, C, D], E, F, G]
+ git multi-pack-index compact --incremental --bitmap \
+ "$(nth_line 2 "$midx_chain")" \
+ "$(nth_line 4 "$midx_chain")" &&
+
+ # Then compact the top two layers, condensing the above
+ # such that the new 4th layer contains F and G.
+ #
+ # [A, [B, C, D], E, [F, G]]
+ git multi-pack-index compact --incremental --bitmap \
+ "$(nth_line 4 "$midx_chain")" \
+ "$(nth_line 5 "$midx_chain")" &&
+
+ cat $midx_chain
+ )
+'
+
test_done
--
2.52.0.171.gd6a4e6b6955
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH 17/17] midx: enable reachability bitmaps during MIDX compaction
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
@ 2025-12-09 7:21 ` Patrick Steinhardt
0 siblings, 0 replies; 39+ messages in thread
From: Patrick Steinhardt @ 2025-12-09 7:21 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, Junio C Hamano
On Sat, Dec 06, 2025 at 03:31:50PM -0500, Taylor Blau wrote:
> Enable callers to generate reachability bitmaps when performing MIDX
> layer compaction by combining all existing bitmaps from the compacted
> layers.
>
> Note that the because of the object/pack ordering described by the
s/that the because/that because/
> previous commit, the pseudo-pack order for the compacted MIDX is the
> same as concatenating the individual pseudo-pack orderings for each
> layer in the compaction range.
>
> As a result, the only non-test or documentation change necessary is to
> treat all objects as non-preferred during compaction so as not to
> disturb the object ordering.
>
> In the future, we may want to adjust which commit(s) receive
> reachability bitmaps when compacting multiple .bitmap files into one, or
> even generate new bitmaps (e.g., if the references have moved
> significantly since the .bitmap was generated). This commit only
> implements combining all existing bitmaps in range together in order to
> demonstrate and lay the groundwork for more exotic strategies.
Will there also be a follow-up patch series that introduces geometric
repacking for multi-pack indices?
> @@ -216,6 +216,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
>
> struct option *options;
> static struct option builtin_multi_pack_index_compact_options[] = {
> + OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
> + MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
> OPT_BIT(0, "incremental", &opts.flags,
> N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
> OPT_END(),
Is this new flag actually incompatible with the incremental flag like
you claimed in the preceding commit? I had the impression that it should
be possible to write incremental bitmaps now.
If that's not the case, we should probably have a call to
`die_for_imcopatible_opt2()` somewhere.
Patrick
^ permalink raw reply [flat|nested] 39+ messages in thread