From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
Date: Tue, 9 Dec 2025 19:22:11 -0500 [thread overview]
Message-ID: <aTi9M/F0sZzK/usA@nand.local> (raw)
In-Reply-To: <20251208-pks-skip-noop-rewrite-v1-1-430d52dba9f0@pks.im>
On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
> The function `midx_preferred_pack()` returns the preferred pack for a
> given multi-pack index. To compute the preferred pack we:
>
> 1. Look up the position of the first object indexed by the multi-pack
> index.
>
> 2. Convert this position from pseudo-pack order into MIDX order.
>
> 3. We then look up pack that corresponds to this MIDX index.
I think the implementation of midx_preferred_pack() works a little bit
differently than is described here. I often get confused when working in
this area juggling between the various object/pack orderings in my head.
midx_preferred_pack() cares about converting from the first position in
pseudo-pack order back into MIDX object order. To do that, we convert
the pseudo-pack position into a MIDX one, and then lookup the pack that
represents that object.
Effectively we're doing something like:
uint32_t pseudo_pack_pos = m->num_objects_in_base;
uint32_t midx_pos = pack_pos_to_midx(m, pseudo_pack_pos);
uint32_t pack_int_id = nth_midxed_pack_int_id(m, midx_pos);
> This reliably returns the preferred pack given that all of its contained
> objects will be up front in pseudo-pack order.
>
> The second step that turns the pseudo-pack order into MIDX order
> requires the reverse index though, which may not exist for example when
> the MIDX does not have a bitmap. And in that case one may easily hit a
> bug:
>
> BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded
That makes sense, we can't convert a pseudo-pack position into a
MIDX-relative position without a reverse index to tell us where that bit
goes.
> In theory, `midx_preferred_pack()` already knows to handle the case
> where no reverse index exists, as it calls `load_midx_revindex()` before
> calling into `midx_preferred_pack()`. [...]
Right, it handles this case by returning -1 to indicate that it didn't
have enough information to determine the preferred pack.
> [...] But we only check for negative
> return values there, even though the function returns a positive error
> code in case the reverse index does not exist.
Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
file header, 2023-04-17), but it looks like the caller here did not
learn about that change. It may be worth mentioning that commit in your
patch message.
While reviewing, I wanted to make sure that there weren't any other
callers of load_midx_revindex() that were also missing this check. The
return value of that function is propagated through the two expected
functions:
$ git grep -p load_revindex_from_disk
pack-revindex.c=struct revindex_header {
pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
pack-revindex.c: ret = load_revindex_from_disk(p->repo->hash_algo,
pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
pack-revindex.c: ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
, and checking through the callers of those two functions, all are
prepared to handle a >0 return value.
> diff --git a/midx.c b/midx.c
> index 24e1e72175..b681b18fc1 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> {
> if (m->preferred_pack_idx == -1) {
> uint32_t midx_pos;
> - if (load_midx_revindex(m) < 0) {
> + if (load_midx_revindex(m)) {
> m->preferred_pack_idx = -2;
> return -1;
> }
> diff --git a/pack-revindex.h b/pack-revindex.h
> index 422c2487ae..0042892091 100644
> --- a/pack-revindex.h
> +++ b/pack-revindex.h
> @@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
> * multi-pack index by mmap-ing it and assigning pointers in the
> * multi_pack_index to point at it.
> *
> - * A negative number is returned on error.
> + * A negative number is returned on error. A positive number is returned in
> + * case the multi-pack-index does not have a reverse index.
Makes sense.
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 93f319a4b2..9492a9737b 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
> # the new MIDX
> git multi-pack-index write --preferred-pack=pack-$pack.pack
> )
> +'
>
> +test_expect_success 'preferred pack cannot be determined without bitmap' '
> + test_when_finished "rm -fr preferred-can-be-queried" &&
> + git init preferred-can-be-queried &&
> + (
> + cd preferred-can-be-queried &&
> + test_commit initial &&
> + git repack -Adl --write-midx --no-write-bitmap-index &&
> + test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
> + test_grep "could not determine MIDX preferred pack" err &&
Looks good. I think that it's fine to end the test here, since we have
extensive coverage that we can determine the preferred pack when there
is a MIDX and matching bitmap, but I definitely don't feel strongly
about it.
Thanks,
Taylor
next prev parent reply other threads:[~2025-12-10 0:22 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 0:22 ` Taylor Blau [this message]
2025-12-10 9:40 ` Patrick Steinhardt
2025-12-18 21:13 ` Taylor Blau
2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
2025-12-10 2:48 ` Taylor Blau
2025-12-10 9:40 ` Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
2025-12-11 8:46 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
2025-12-12 7:33 ` Patrick Steinhardt
2025-12-18 21:18 ` Taylor Blau
2025-12-19 6:25 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTi9M/F0sZzK/usA@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).