From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 16/17] midx: implement MIDX compaction
Date: Tue, 13 Jan 2026 18:32:27 -0500 [thread overview]
Message-ID: <aWbWC4nTnP52lxSg@nand.local> (raw)
In-Reply-To: <aTfN_PycU9ag8c0u@pks.im>
On Tue, Dec 09, 2025 at 08:21:32AM +0100, Patrick Steinhardt wrote:
> On Sat, Dec 06, 2025 at 03:31:47PM -0500, Taylor Blau wrote:
> > diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
> > index 164cf1f2291..a9664e77411 100644
> > --- a/Documentation/git-multi-pack-index.adoc
> > +++ b/Documentation/git-multi-pack-index.adoc
> > @@ -12,6 +12,8 @@ SYNOPSIS
> > 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
> > [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
> > [--refs-snapshot=<path>]
> > +'git multi-pack-index' [<options>] compact [--[no-]incremental]
> > + <from> <to>
> > 'git multi-pack-index' [<options>] verify
> > 'git multi-pack-index' [<options>] expire
> > 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
> > @@ -83,6 +85,17 @@ marker).
> > necessary.
> > --
> >
> > +compact::
> > + Write a new MIDX layer containing only objects and packs present
> > + in the range `<from>` to `<to>`, where both arguments are
> > + checksums of existing layers in the MIDX chain.
> > ++
> > +--
> > + --incremental::
> > + Write the result to a MIDX chain instead of writing a
> > + stand-alone MIDX. Incompatible with `--bitmap`.
>
> Interesting. What would happen if you compact a subrange of the MIDX
> chain without incremental? Would the MIDX be completely replaced with a
> MIDX that only covers these packs?
That's right.
> Also, the "--bitmap" flag does not exist yet, so the second sentence
> probably needs to be introduced in the next commit.
Ah, great catch -- I removed that line here. I don't think it needs to
be readded in the following commit, though, since that patch introduces
"--bitmap" and makes it compatible with MIDX compaction.
> > + if (!from_midx)
> > + die(_("could not find MIDX 'from': %s"), argv[0]);
> > + if (!to_midx)
> > + die(_("could not find MIDX 'to': %s"), argv[1]);
> > +
> > + ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
> > +
> > + return ret;
> > +}
>
> Is it valid if `from_midx == to_midx`?
Yes, that would result in a noop write.
> > + while (m != ctx->compact_from->base_midx) {
> > + uint32_t pack_int_id, preferred_pack_id;
> > + uint32_t i;
> > +
> > + if (bitmap_order) {
> > + if (midx_preferred_pack(m, &preferred_pack_id) < 0)
> > + die(_("could not determine preferred pack"));
>
> `midx_preferred_pack()` only returns a valid pack ID in case we've got a
> reverse index, and as far as I understand we seem to only generate those
> when computing bitmaps. I assume that this means that we can only
> compact MIDX layers in bitmap order if they already were in bitmap order
> before?
>
> That would at least also make sense. We of course cannot randomly change
> the order in the middle of our layers, as that would break later layers
> that build on top.
Indeed, we only generate a reverse index for a MIDX if we are writing it
with bitmaps, since there is no other purpose for having a revindex
outside of reachability bitmaps.
So if we have a bitmap and are compacting, then we need to retain the
order of the packs as they appear in the pre-compaction pseudo-pack
order to avoid permuting the bits corresponding to those objects. In
other words, you're correct in saying that we cannot start writing
bitmaps during compaction if we did not have bitmaps to begin with
pre-compaction.
> > + for (i = m->num_packs_in_base;
> > + i < m->num_packs_in_base + m->num_packs; i++) {
> > + if (preferred_pack_id == i)
> > + continue;
> > +
> > + if (fill_pack_from_midx(&ctx->info[pack_int_id++], m,
> > + i) < 0)
> > + return -1;
> > + }
> > +
>
> So the condition that should hold after this loop is `pack_int_id ==
> m->num_packs`. Which is somewhat obvious: we skip one pack, but that
> pack is the preferred pack that we have populated first.
Exactly!
> > @@ -1101,11 +1216,18 @@ static int write_midx_internal(struct write_midx_opts *opts)
> > */
> > if (ctx.incremental)
> > ctx.base_midx = m;
> > - else if (!opts->packs_to_include)
> > + if (!opts->packs_to_include)
> > ctx.m = m;
>
> I'm a bit surprised by this change here. I would've expected that we
> never pass `packs_to_include` when compacting, so why is this change
> necessary?
Right, we do not pass packs_to_include here during compaction. But if we
are doing an incremental compaction, then we do want to assign ctx.m in
addition to ctx.base_midx.
> > diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
> > new file mode 100755
> > index 00000000000..f889af7fb1d
> > --- /dev/null
> > +++ b/t/t5335-compact-multi-pack-index.sh
> > @@ -0,0 +1,102 @@
> > +#!/bin/sh
> > +
> > +test_description='multi-pack-index compaction'
> > +
> > +. ./test-lib.sh
> > +
> > +GIT_TEST_MULTI_PACK_INDEX=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> > +
> > +objdir=.git/objects
> > +packdir=$objdir/pack
> > +midxdir=$packdir/multi-pack-index.d
> > +midx_chain=$midxdir/multi-pack-index-chain
> > +
> > +nth_line() {
> > + local n="$1"
> > + shift
> > + awk "NR==$n" "$@"
> > +}
> > +
> > +write_packs () {
> > + for c in "$@"
> > + do
> > + test_commit "$c" &&
>
> Nit: it might be sensible to disable housekeeping here. You strongly
> depend on the on-disk shape of the objects, so if you by chance wrote
> two objects starting with "17" we'd end up repacking and racing.
>
> I've also got an upcoming patch series in mindthat I've got cooking to
> make geometric compaction the default for auto-maintenance. We've got
> many test suites that implicitly rely on the current algorithm used by
> git-gc(1), so I'd love to avoid adding more.
I'm not sure I follow what you mean by "housekeeping" here. Are you
referring to maintenance.auto? If so, we shouldn't be writing so many
packs as to trigger that during these tests, but I can disable it as a
sanity check just in case.
> [snip]
> > +test_expect_success 'MIDX compaction with lex-ordered pack names' '
> > + git init midx-compact-lex-order &&
> > + (
> > + cd midx-compact-lex-order &&
> > +
> > + write_packs A B C D E &&
> > + test_line_count = 5 $midx_chain &&
> > +
> > + git multi-pack-index compact --incremental \
> > + "$(nth_line 2 "$midx_chain")" \
> > + "$(nth_line 4 "$midx_chain")" &&
> > + test_line_count = 3 $midx_chain &&
> > +
> > + test_midx_layer_packs "$(nth_line 1 "$midx_chain")" A &&
> > + test_midx_layer_packs "$(nth_line 2 "$midx_chain")" B C D &&
> > + test_midx_layer_packs "$(nth_line 3 "$midx_chain")" E &&
> > +
> > + test_midx_layer_object_uniqueness
> > + )
> > +'
>
> It would be nice to also test for requests that don't make sense: "from"
> larger than "to", "from == to", missing "from" or "foo" and so on.
All good suggestions, thanks!
Thanks,
Taylor
next prev parent reply other threads:[~2026-01-13 23:32 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:41 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
2025-12-08 18:25 ` Patrick Steinhardt
2025-12-09 1:42 ` Taylor Blau
2025-12-09 1:50 ` Taylor Blau
2025-12-09 6:27 ` Patrick Steinhardt
2026-01-13 22:46 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 1:59 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:04 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2025-12-08 18:26 ` Patrick Steinhardt
2025-12-09 2:07 ` Taylor Blau
2025-12-09 2:11 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:13 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:14 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2025-12-08 18:27 ` Patrick Steinhardt
2025-12-09 2:16 ` Taylor Blau
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
2026-01-13 23:32 ` Taylor Blau [this message]
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
2025-12-09 7:21 ` Patrick Steinhardt
2026-01-13 23:47 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2026-01-14 19:54 ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-01-14 19:54 ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-01-14 19:54 ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-01-14 19:54 ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-01-14 19:54 ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-01-14 19:54 ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-01-14 21:13 ` Junio C Hamano
2026-01-14 21:40 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-01-14 21:28 ` Junio C Hamano
2026-01-14 21:44 ` Taylor Blau
2026-01-27 7:34 ` Patrick Steinhardt
2026-02-24 18:47 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
2026-01-14 21:49 ` Junio C Hamano
2026-01-14 22:03 ` Taylor Blau
2026-01-15 0:11 ` Taylor Blau
2026-01-21 8:51 ` Patrick Steinhardt
2026-01-21 23:55 ` Taylor Blau
2026-01-22 2:26 ` rsbecker
2026-01-22 17:07 ` Junio C Hamano
2026-02-23 13:49 ` Jeff King
2026-02-24 18:53 ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-01-14 19:54 ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-01-14 19:55 ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-01-14 19:55 ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-01-14 19:55 ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
2026-01-27 7:35 ` Patrick Steinhardt
2026-01-27 22:13 ` Taylor Blau
2026-01-14 19:55 ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
2026-02-20 22:24 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
2026-02-23 14:08 ` Jeff King
2026-02-24 5:25 ` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
2026-02-24 18:59 ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-02-24 18:59 ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-02-24 18:59 ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-02-24 18:59 ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-02-24 18:59 ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-02-24 19:00 ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-02-24 19:00 ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-02-24 19:00 ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-02-24 19:00 ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-02-24 19:00 ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-02-24 19:00 ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-02-24 19:00 ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
2026-02-24 19:00 ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWbWC4nTnP52lxSg@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox