From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 16/17] midx: implement MIDX compaction
Date: Tue, 9 Dec 2025 08:21:32 +0100 [thread overview]
Message-ID: <aTfN_PycU9ag8c0u@pks.im> (raw)
In-Reply-To: <c136b2e179d02321de7e7b3f1b6c748cb434d68d.1765053054.git.me@ttaylorr.com>
On Sat, Dec 06, 2025 at 03:31:47PM -0500, Taylor Blau wrote:
> diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
> index 164cf1f2291..a9664e77411 100644
> --- a/Documentation/git-multi-pack-index.adoc
> +++ b/Documentation/git-multi-pack-index.adoc
> @@ -12,6 +12,8 @@ SYNOPSIS
> 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
> [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
> [--refs-snapshot=<path>]
> +'git multi-pack-index' [<options>] compact [--[no-]incremental]
> + <from> <to>
> 'git multi-pack-index' [<options>] verify
> 'git multi-pack-index' [<options>] expire
> 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
> @@ -83,6 +85,17 @@ marker).
> necessary.
> --
>
> +compact::
> + Write a new MIDX layer containing only objects and packs present
> + in the range `<from>` to `<to>`, where both arguments are
> + checksums of existing layers in the MIDX chain.
> ++
> +--
> + --incremental::
> + Write the result to a MIDX chain instead of writing a
> + stand-alone MIDX. Incompatible with `--bitmap`.
Interesting. What would happen if you compact a subrange of the MIDX
chain without incremental? Would the MIDX be completely replaced with a
MIDX that only covers these packs?
Also, the "--bitmap" flag does not exist yet, so the second sentence
probably needs to be introduced in the next commit.
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index c0c6c1760c0..9b0c2082cb3 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -195,6 +204,63 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
> return ret;
> }
>
> +static int cmd_multi_pack_index_compact(int argc, const char **argv,
> + const char *prefix,
> + struct repository *repo)
> +{
> + struct multi_pack_index *m, *cur;
> + struct multi_pack_index *from_midx = NULL;
> + struct multi_pack_index *to_midx = NULL;
> + struct odb_source *source;
> + int ret;
> +
> + struct option *options;
> + static struct option builtin_multi_pack_index_compact_options[] = {
> + OPT_BIT(0, "incremental", &opts.flags,
> + N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
> + OPT_END(),
> + };
> +
> + repo_config(repo, git_multi_pack_index_write_config, NULL);
> +
> + options = add_common_options(builtin_multi_pack_index_compact_options);
> +
> + trace2_cmd_mode(argv[0]);
> +
> + if (isatty(2))
> + opts.flags |= MIDX_PROGRESS;
> + argc = parse_options(argc, argv, prefix,
> + options, builtin_multi_pack_index_compact_usage,
> + 0);
> +
> + if (argc != 2)
> + usage_with_options(builtin_multi_pack_index_compact_usage,
> + options);
> + source = handle_object_dir_option(the_repository);
> +
> + FREE_AND_NULL(options);
> +
> + m = get_multi_pack_index(source);
> +
> + for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
> + const char *midx_csum = get_midx_checksum(cur);
> +
> + if (!from_midx && !strcmp(midx_csum, argv[0]))
> + from_midx = cur;
> + if (!to_midx && !strcmp(midx_csum, argv[1]))
> + to_midx = cur;
> + }
> +
> + if (!from_midx)
> + die(_("could not find MIDX 'from': %s"), argv[0]);
> + if (!to_midx)
> + die(_("could not find MIDX 'to': %s"), argv[1]);
> +
> + ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
> +
> + return ret;
> +}
Is it valid if `from_midx == to_midx`?
> diff --git a/midx-write.c b/midx-write.c
> index 7854561359d..fcbfedcd913 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -953,6 +980,72 @@ static int fill_packs_from_midx(struct write_midx_context *ctx)
> return 0;
> }
>
> +static uint32_t compactible_packs_between(const struct multi_pack_index *from,
> + const struct multi_pack_index *to)
> +{
> + uint32_t nr;
> +
> + ASSERT(from && to);
> +
> + nr = u32_add(to->num_packs, to->num_packs_in_base);
> + if (nr < from->num_packs_in_base)
> + BUG("unexpected number of packs in base during compaction: "
> + "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
> +
> + return nr - from->num_packs_in_base;
> +}
> +
> +static int fill_packs_from_midx_range(struct write_midx_context *ctx,
> + int bitmap_order)
> +{
> + struct multi_pack_index *m = ctx->compact_to;
> + uint32_t packs_nr;
> +
> + ASSERT(ctx->compact && !ctx->nr);
> + ASSERT(ctx->compact_from);
> + ASSERT(ctx->compact_to);
> +
> + packs_nr = compactible_packs_between(ctx->compact_from,
> + ctx->compact_to);
> +
> + ALLOC_GROW(ctx->info, packs_nr, ctx->alloc);
> +
> + while (m != ctx->compact_from->base_midx) {
> + uint32_t pack_int_id, preferred_pack_id;
> + uint32_t i;
> +
> + if (bitmap_order) {
> + if (midx_preferred_pack(m, &preferred_pack_id) < 0)
> + die(_("could not determine preferred pack"));
`midx_preferred_pack()` only returns a valid pack ID in case we've got a
reverse index, and as far as I understand we seem to only generate those
when computing bitmaps. I assume that this means that we can only
compact MIDX layers in bitmap order if they already were in bitmap order
before?
That would at least also make sense. We of course cannot randomly change
the order in the middle of our layers, as that would break later layers
that build on top.
> + } else {
> + preferred_pack_id = m->num_packs_in_base;
> + }
> +
> + pack_int_id = m->num_packs_in_base - ctx->compact_from->num_packs_in_base;
> +
> + if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> + preferred_pack_id) < 0)
> + return -1;
> +
> + for (i = m->num_packs_in_base;
> + i < m->num_packs_in_base + m->num_packs; i++) {
> + if (preferred_pack_id == i)
> + continue;
> +
> + if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> + i) < 0)
> + return -1;
> + }
> +
So the condition that should hold after this loop is `pack_int_id ==
m->num_packs`. Which is somewhat obvious: we skip one pack, but that
pack is the preferred pack that we have populated first.
> @@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
> */
> if (ctx.incremental)
> ctx.base_midx = m;
> - else if (!opts->packs_to_include)
> + if (!opts->packs_to_include)
> ctx.m = m;
I'm a bit surprised by this change here. I would've expected that we
never pass `packs_to_include` when compacting, so why is this change
necessary?
> }
> }
>
> + /*
> + * If compacting MIDX layer(s) in the range [from, to], then the
> + * compacted MIDX will share the same base MIDX as 'from'.
> + */
> + if (ctx.compact)
> + ctx.base_midx = ctx.compact_from->base_midx;
Okay, here we overwrite `ctx.base_midx` that we might've already set in
the condition above. It's a bit confusing, doubly so because we may be
warning about the invalid MIDX and claim to ignore it, but ultimately we
don't.
> diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
> new file mode 100755
> index 00000000000..f889af7fb1d
> --- /dev/null
> +++ b/t/t5335-compact-multi-pack-index.sh
> @@ -0,0 +1,102 @@
> +#!/bin/sh
> +
> +test_description='multi-pack-index compaction'
> +
> +. ./test-lib.sh
> +
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> +
> +objdir=.git/objects
> +packdir=$objdir/pack
> +midxdir=$packdir/multi-pack-index.d
> +midx_chain=$midxdir/multi-pack-index-chain
> +
> +nth_line() {
> + local n="$1"
> + shift
> + awk "NR==$n" "$@"
> +}
> +
> +write_packs () {
> + for c in "$@"
> + do
> + test_commit "$c" &&
Nit: it might be sensible to disable housekeeping here. You strongly
depend on the on-disk shape of the objects, so if you by chance wrote
two objects starting with "17" we'd end up repacking and racing.
I've also got an upcoming patch series in mindthat I've got cooking to
make geometric compaction the default for auto-maintenance. We've got
many test suites that implicitly rely on the current algorithm used by
git-gc(1), so I'd love to avoid adding more.
[snip]
> +test_expect_success 'MIDX compaction with lex-ordered pack names' '
> + git init midx-compact-lex-order &&
> + (
> + cd midx-compact-lex-order &&
> +
> + write_packs A B C D E &&
> + test_line_count = 5 $midx_chain &&
> +
> + git multi-pack-index compact --incremental \
> + "$(nth_line 2 "$midx_chain")" \
> + "$(nth_line 4 "$midx_chain")" &&
> + test_line_count = 3 $midx_chain &&
> +
> + test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
> + test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
> + test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
> +
> + test_midx_layer_object_uniqueness
> + )
> +'
It would be nice to also test for requests that don't make sense: "from"
larger than "to", "from == to", missing "from" or "foo" and so on.
Patrick
next prev parent reply other threads:[~2025-12-09 7:21 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:41 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
2025-12-08 18:25 ` Patrick Steinhardt
2025-12-09 1:42 ` Taylor Blau
2025-12-09 1:50 ` Taylor Blau
2025-12-09 6:27 ` Patrick Steinhardt
2026-01-13 22:46 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:59 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:04 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:07 ` Taylor Blau
2025-12-09 2:11 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:13 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:14 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:16 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt [this message]
2026-01-13 23:32 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
2026-01-13 23:47 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2026-01-14 19:54 ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-01-14 19:54 ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-01-14 19:54 ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-01-14 19:54 ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-01-14 19:54 ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-01-14 19:54 ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-01-14 21:13 ` Junio C Hamano
2026-01-14 21:40 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-01-14 21:28 ` Junio C Hamano
2026-01-14 21:44 ` Taylor Blau
2026-01-27 7:34 ` Patrick Steinhardt
2026-02-24 18:47 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
2026-01-14 21:49 ` Junio C Hamano
2026-01-14 22:03 ` Taylor Blau
2026-01-15 0:11 ` Taylor Blau
2026-01-21 8:51 ` Patrick Steinhardt
2026-01-21 23:55 ` Taylor Blau
2026-01-22 2:26 ` rsbecker
2026-01-22 17:07 ` Junio C Hamano
2026-02-23 13:49 ` Jeff King
2026-02-24 18:53 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-01-14 19:54 ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-01-14 19:55 ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-01-14 19:55 ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-01-14 19:55 ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
2026-01-27 7:35 ` Patrick Steinhardt
2026-01-27 22:13 ` Taylor Blau
2026-01-14 19:55 ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
2026-02-20 22:24 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
2026-02-23 14:08 ` Jeff King
2026-02-24 5:25 ` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
2026-02-24 18:59 ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-02-24 18:59 ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-02-24 18:59 ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-02-24 18:59 ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-02-24 19:00 ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-02-24 19:00 ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-02-24 19:00 ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-02-24 19:00 ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-02-24 19:00 ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-02-24 19:00 ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
2026-02-24 19:00 ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTfN_PycU9ag8c0u@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.