public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
	Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction
Date: Tue, 24 Feb 2026 00:25:55 -0500	[thread overview]
Message-ID: <aZ02Y170p/yEjohh@nand.local> (raw)
In-Reply-To: <20260223140847.GB271392@coredump.intra.peff.net>

On Mon, Feb 23, 2026 at 09:08:47AM -0500, Jeff King wrote:
> On Fri, Feb 20, 2026 at 02:24:25PM -0800, Junio C Hamano wrote:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > [Note to the maintainer: this is based on 'master' with my
> > > 'tb/midx-write-corrupt-checksum-fix' merged in and should produce zero
> > > conflicts when applied on top.]
> > >
> > > This is a reroll of my series to implement MIDX layer compaction
> > > adjusted in response to reviewer feedback.
> >
> > Haven't seen a lot happen since the end of last month or so.
> > Anything left to do here?
>
> Sorry, I promised Taylor I would review it and I have been dragging my
> feet.
>
> I just read through the whole series, and didn't find anything that
> hadn't already been brought up. The scariest / most complicated bits
> are:
>
>   1. Dropping the sort-order requirement for midx pack lists in patch
>      10. But I think the approach with the version bump is the maximally
>      conservative one, and I feel good about that. IIRC we discussed
>      previously the possibility of using new chunks to give the
>      alternate ordering, but I think this keeps the complexity in the
>      code to a minimum.

Thanks, I think the "midx.version" approach is the one that I'm most
comfortable with too.

FWIW, I think that it's impossible to supplement the existing version
with an additional chunk to indicate the new ordering while maintaining
compatibility with out-of-tree MIDX readers. If others have thoughts on
how to do this, I'd be very curious to discuss it more, but I think what
we have in this series is likely the best we can do without breaking,
e.g., libgit2.

>   2. The actual compaction bits in the final two patches. I didn't see
>      anything wrong here, but this is exactly the kind of spot where I
>      think review fails, because you don't realize the corner case that
>      you missed (speaking from experience with midx and bitmap code).

Indeed.

>      But the nice thing here is that it should be quite unlikely to
>      cause a regression if you're not using compaction. As this is
>      mostly a building block for "part 3" that starts doing compaction
>      as part of a broader repacking strategy, I think it is OK to
>      consider it somewhat-experimental, build the next layer, and then
>      eventually let it see more exercise in a production environment.
>
>      That's how we've traditionally found those corner cases, and I
>      think trying to spend more time staring at it in review is not
>      likely to produce more insights.

As you note, compaction on its own is not all that interesting, since
users are unlikely to be managing their MIDX layers manually. Compaction
really is the final building block for the new repacking strategy, at
which point this code becomes a lot more accessible and therefore easy
to test and experiment with.

I'm going to deploy this slowly to GitHub's infrastructure in the coming
weeks[^1] and opting personal repositories into the new code to flush
out any bug(s).

As I find and fix those, I'll of course share the results here.

> There were a couple minor issues brought up in review, like out-dated
> comments and the u32_add interface. So I think we might need a v3 with a
> few touch-ups, but that's it.

Thanks. I have the following range-diff prepared locally, but after a
day of traveling I am too tired to confidently declare it free of any
typos/thinkos ;-).

--- 8< ---
 1:  2e549ea6443 =  1:  275960bf36f midx: mark `get_midx_checksum()` arguments as const
 2:  7255adafe70 =  2:  6b769517656 midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()`
 3:  25b628fda97 =  3:  64a56ee1506 midx: introduce `midx_get_checksum_hex()`
 4:  2aedd72db8c =  4:  f6d3ecbb283 builtin/multi-pack-index.c: make '--progress' a common option
 5:  a00598a36a3 =  5:  7d9f5766cda git-multi-pack-index(1): remove non-existent incompatibility
 6:  92e6d868a45 =  6:  8ba8529fe32 git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h'
 7:  ff599c11f68 =  7:  cd2b9f6f94f t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39
 8:  315a0ea2985 =  8:  983cff2f1b7 midx-write.c: don't use `pack_perm` when assigning `bitmap_pos`
 9:  af174e22e1e =  9:  5f03fda72ef midx-write.c: introduce `struct write_midx_opts`
10:  72bcd4ed6c7 ! 10:  9b02544b9ae midx: do not require packs to be sorted in lexicographic order
    @@ Documentation/gitformat-pack.adoc: HEADER:

      	1-byte version number:
     -	    Git only writes or recognizes version 1.
    -+	    Git only writes version 2, but recognizes versions 1 and 2.
    ++	    Git writes the version specified by the "midx.version"
    ++	    configuration option, which defaults to 2. It recognizes
    ++	    both versions 1 and 2.

      	1-byte Object Id Version
      	    We infer the length of object IDs (OIDs) from this value:
11:  c0c1769464b <  -:  ----------- git-compat-util.h: introduce `u32_add()`
12:  c11214a51f0 = 11:  06536b2081a midx-write.c: introduce `midx_pack_perm()` helper
13:  b9244a04297 = 12:  ee92de1795e midx-write.c: extract `fill_pack_from_midx()`
14:  c6f8d323477 = 13:  217cd5799a5 midx-write.c: enumerate `pack_int_id` values directly
15:  e71aa575463 = 14:  1193e9b3c49 midx-write.c: factor fanout layering from `compute_sorted_entries()`
16:  dbbcb494563 = 15:  73f2151609b t/helper/test-read-midx.c: plug memory leak when selecting layer
17:  13336e864f4 ! 16:  7344ce8e533 midx: implement MIDX compaction
    @@ Commit message
            `pack_int_id`'s (as opposed to the index at which each pack appears
            in `ctx.info`).

    +       Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
    +       here, as it unconditionally recurs through the `->base_midx`. Factor
    +       out a `_1()` variant that operates on a single layer, reimplement
    +       the existing function in terms of it, and use the new variant from
    +       `midx_fanout_add_compact()`.
    +
    +       Since we are sorting the list of objects ourselves, the order we add
    +       them in does not matter.
    +
          - When writing out the new 'multi-pack-index-chain' file, discard any
            layers in the compaction range, replacing them with the newly written
            layer, instead of keeping them and placing the new layer at the end
    @@ midx-write.c: struct write_midx_context {
      	return ctx->pack_perm[orig_pack_int_id];
      }

    +@@ midx-write.c: static void midx_fanout_sort(struct midx_fanout *fanout)
    + 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
    + }
    +
    +-static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    +-					struct multi_pack_index *m,
    +-					uint32_t cur_fanout,
    +-					uint32_t preferred_pack)
    ++static void midx_fanout_add_midx_fanout_1(struct midx_fanout *fanout,
    ++					  struct multi_pack_index *m,
    ++					  uint32_t cur_fanout,
    ++					  uint32_t preferred_pack)
    + {
    + 	uint32_t start = m->num_objects_in_base, end;
    + 	uint32_t cur_object;
    +
    +-	if (m->base_midx)
    +-		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    +-					    preferred_pack);
    +-
    + 	if (cur_fanout)
    + 		start += ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
    + 	end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
    +@@ midx-write.c: static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    + 	}
    + }
    +
    ++static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
    ++					struct multi_pack_index *m,
    ++					uint32_t cur_fanout,
    ++					uint32_t preferred_pack)
    ++{
    ++	if (m->base_midx)
    ++		midx_fanout_add_midx_fanout(fanout, m->base_midx, cur_fanout,
    ++					    preferred_pack);
    ++	midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout, preferred_pack);
    ++}
    ++
    + static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
    + 					struct pack_info *info,
    + 					uint32_t cur_pack,
     @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
      					    cur_fanout);
      }
    @@ midx-write.c: static void midx_fanout_add(struct midx_fanout *fanout,
     +	ASSERT(ctx->compact);
     +
     +	while (m && m != ctx->compact_from->base_midx) {
    -+		midx_fanout_add_midx_fanout(fanout, m, cur_fanout,
    -+					    NO_PREFERRED_PACK);
    ++		midx_fanout_add_midx_fanout_1(fanout, m, cur_fanout,
    ++					      NO_PREFERRED_PACK);
     +		m = m->base_midx;
     +	}
     +}
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx)
     +
     +	ASSERT(from && to);
     +
    -+	nr = u32_add(to->num_packs, to->num_packs_in_base);
    ++	if (unsigned_add_overflows(to->num_packs, to->num_packs_in_base))
    ++		die(_("too many packs, unable to compact"));
    ++
    ++	nr = to->num_packs + to->num_packs_in_base;
     +	if (nr < from->num_packs_in_base)
     +		BUG("unexpected number of packs in base during compaction: "
     +		    "%"PRIu32" < %"PRIu32, nr, from->num_packs_in_base);
18:  b599f1ad4b0 = 17:  b9c1b3992c5 midx: enable reachability bitmaps during MIDX compaction
--- >8 ---

Thanks,
Taylor

[^1]: Worth noting that I am at an offsite for work this week, and then
  taking a planned vacation for the following two, so I may be a little
  less available than usual.

  reply	other threads:[~2026-02-24  5:25 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-06 20:30 [PATCH 00/17] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2025-12-06 20:31 ` [PATCH 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  1:41     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 02/17] midx: split `get_midx_checksum()` by adding `get_midx_hash()` Taylor Blau
2025-12-08 18:25   ` Patrick Steinhardt
2025-12-09  1:42     ` Taylor Blau
2025-12-09  1:50       ` Taylor Blau
2025-12-09  6:27         ` Patrick Steinhardt
2026-01-13 22:46           ` Taylor Blau
2025-12-06 20:31 ` [PATCH 03/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2025-12-06 20:31 ` [PATCH 04/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2025-12-06 20:31 ` [PATCH 05/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2025-12-06 20:31 ` [PATCH 06/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2025-12-06 20:31 ` [PATCH 07/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  1:59     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 08/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  2:04     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 09/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2025-12-08 18:26   ` Patrick Steinhardt
2025-12-09  2:07     ` Taylor Blau
2025-12-09  2:11       ` Taylor Blau
2025-12-06 20:31 ` [PATCH 10/17] git-compat-util.h: introduce `u32_add()` Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:13     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2025-12-06 20:31 ` [PATCH 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2025-12-06 20:31 ` [PATCH 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:14     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2025-12-06 20:31 ` [PATCH 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2025-12-08 18:27   ` Patrick Steinhardt
2025-12-09  2:16     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 16/17] midx: implement MIDX compaction Taylor Blau
2025-12-09  7:21   ` Patrick Steinhardt
2026-01-13 23:32     ` Taylor Blau
2025-12-06 20:31 ` [PATCH 17/17] midx: enable reachability bitmaps during " Taylor Blau
2025-12-09  7:21   ` Patrick Steinhardt
2026-01-13 23:47     ` Taylor Blau
2026-01-14 19:54 ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Taylor Blau
2026-01-14 19:54   ` [PATCH v2 01/18] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-01-14 19:54   ` [PATCH v2 02/18] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 03/18] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 04/18] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-01-14 19:54   ` [PATCH v2 05/18] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-01-14 19:54   ` [PATCH v2 06/18] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-01-14 19:54   ` [PATCH v2 07/18] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-01-14 19:54   ` [PATCH v2 08/18] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-01-14 21:13     ` Junio C Hamano
2026-01-14 21:40       ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 09/18] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 10/18] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-01-14 21:28     ` Junio C Hamano
2026-01-14 21:44       ` Taylor Blau
2026-01-27  7:34     ` Patrick Steinhardt
2026-02-24 18:47       ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 11/18] git-compat-util.h: introduce `u32_add()` Taylor Blau
2026-01-14 21:49     ` Junio C Hamano
2026-01-14 22:03       ` Taylor Blau
2026-01-15  0:11         ` Taylor Blau
2026-01-21  8:51           ` Patrick Steinhardt
2026-01-21 23:55             ` Taylor Blau
2026-01-22  2:26               ` rsbecker
2026-01-22 17:07                 ` Junio C Hamano
2026-02-23 13:49               ` Jeff King
2026-02-24 18:53                 ` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 12/18] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-01-14 19:54   ` [PATCH v2 13/18] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-01-14 19:54   ` [PATCH v2 14/18] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-01-14 19:55   ` [PATCH v2 15/18] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-01-14 19:55   ` [PATCH v2 16/18] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-01-14 19:55   ` [PATCH v2 17/18] midx: implement MIDX compaction Taylor Blau
2026-01-27  7:35     ` Patrick Steinhardt
2026-01-27 22:13       ` Taylor Blau
2026-01-14 19:55   ` [PATCH v2 18/18] midx: enable reachability bitmaps during " Taylor Blau
2026-02-20 22:24   ` [PATCH v2 00/18] midx: incremental MIDX/bitmap layer compaction Junio C Hamano
2026-02-23 14:08     ` Jeff King
2026-02-24  5:25       ` Taylor Blau [this message]
2026-02-24 18:59 ` [PATCH v3 00/17] " Taylor Blau
2026-02-24 18:59   ` [PATCH v3 01/17] midx: mark `get_midx_checksum()` arguments as const Taylor Blau
2026-02-24 18:59   ` [PATCH v3 02/17] midx: rename `get_midx_checksum()` to `midx_get_checksum_hash()` Taylor Blau
2026-02-24 18:59   ` [PATCH v3 03/17] midx: introduce `midx_get_checksum_hex()` Taylor Blau
2026-02-24 18:59   ` [PATCH v3 04/17] builtin/multi-pack-index.c: make '--progress' a common option Taylor Blau
2026-02-24 18:59   ` [PATCH v3 05/17] git-multi-pack-index(1): remove non-existent incompatibility Taylor Blau
2026-02-24 18:59   ` [PATCH v3 06/17] git-multi-pack-index(1): align SYNOPSIS with 'git multi-pack-index -h' Taylor Blau
2026-02-24 19:00   ` [PATCH v3 07/17] t/t5319-multi-pack-index.sh: fix copy-and-paste error in t5319.39 Taylor Blau
2026-02-24 19:00   ` [PATCH v3 08/17] midx-write.c: don't use `pack_perm` when assigning `bitmap_pos` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 09/17] midx-write.c: introduce `struct write_midx_opts` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 10/17] midx: do not require packs to be sorted in lexicographic order Taylor Blau
2026-02-24 19:00   ` [PATCH v3 11/17] midx-write.c: introduce `midx_pack_perm()` helper Taylor Blau
2026-02-24 19:00   ` [PATCH v3 12/17] midx-write.c: extract `fill_pack_from_midx()` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 13/17] midx-write.c: enumerate `pack_int_id` values directly Taylor Blau
2026-02-24 19:00   ` [PATCH v3 14/17] midx-write.c: factor fanout layering from `compute_sorted_entries()` Taylor Blau
2026-02-24 19:00   ` [PATCH v3 15/17] t/helper/test-read-midx.c: plug memory leak when selecting layer Taylor Blau
2026-02-24 19:00   ` [PATCH v3 16/17] midx: implement MIDX compaction Taylor Blau
2026-02-24 19:00   ` [PATCH v3 17/17] midx: enable reachability bitmaps during " Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZ02Y170p/yEjohh@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox