git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
Date: Tue, 9 Dec 2025 19:22:11 -0500	[thread overview]
Message-ID: <aTi9M/F0sZzK/usA@nand.local> (raw)
In-Reply-To: <20251208-pks-skip-noop-rewrite-v1-1-430d52dba9f0@pks.im>

On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
> The function `midx_preferred_pack()` returns the preferred pack for a
> given multi-pack index. To compute the preferred pack we:
>
>   1. Look up the position of the first object indexed by the multi-pack
>      index.
>
>   2. Convert this position from pseudo-pack order into MIDX order.
>
>   3. We then look up pack that corresponds to this MIDX index.

I think the implementation of midx_preferred_pack() works a little bit
differently than is described here. I often get confused when working in
this area juggling between the various object/pack orderings in my head.

midx_preferred_pack() cares about converting from the first position in
pseudo-pack order back into MIDX object order. To do that, we convert
the pseudo-pack position into a MIDX one, and then lookup the pack that
represents that object.

Effectively we're doing something like:

    uint32_t pseudo_pack_pos = m->num_objects_in_base;
    uint32_t midx_pos = pack_pos_to_midx(m, pseudo_pack_pos);
    uint32_t pack_int_id = nth_midxed_pack_int_id(m, midx_pos);

> This reliably returns the preferred pack given that all of its contained
> objects will be up front in pseudo-pack order.
>
> The second step that turns the pseudo-pack order into MIDX order
> requires the reverse index though, which may not exist for example when
> the MIDX does not have a bitmap. And in that case one may easily hit a
> bug:
>
>     BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded

That makes sense, we can't convert a pseudo-pack position into a
MIDX-relative position without a reverse index to tell us where that bit
goes.

> In theory, `midx_preferred_pack()` already knows to handle the case
> where no reverse index exists, as it calls `load_midx_revindex()` before
> calling into `midx_preferred_pack()`. [...]

Right, it handles this case by returning -1 to indicate that it didn't
have enough information to determine the preferred pack.

> [...] But we only check for negative
> return values there, even though the function returns a positive error
> code in case the reverse index does not exist.

Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
file header, 2023-04-17), but it looks like the caller here did not
learn about that change. It may be worth mentioning that commit in your
patch message.

While reviewing, I wanted to make sure that there weren't any other
callers of load_midx_revindex() that were also missing this check. The
return value of that function is propagated through the two expected
functions:

    $ git grep -p load_revindex_from_disk
    pack-revindex.c=struct revindex_header {
    pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
    pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
    pack-revindex.c:        ret = load_revindex_from_disk(p->repo->hash_algo,
    pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
    pack-revindex.c:        ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,

, and checking through the callers of those two functions, all are
prepared to handle a >0 return value.

> diff --git a/midx.c b/midx.c
> index 24e1e72175..b681b18fc1 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
>  {
>  	if (m->preferred_pack_idx == -1) {
>  		uint32_t midx_pos;
> -		if (load_midx_revindex(m) < 0) {
> +		if (load_midx_revindex(m)) {
>  			m->preferred_pack_idx = -2;
>  			return -1;
>  		}
> diff --git a/pack-revindex.h b/pack-revindex.h
> index 422c2487ae..0042892091 100644
> --- a/pack-revindex.h
> +++ b/pack-revindex.h
> @@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
>   * multi-pack index by mmap-ing it and assigning pointers in the
>   * multi_pack_index to point at it.
>   *
> - * A negative number is returned on error.
> + * A negative number is returned on error. A positive number is returned in
> + * case the multi-pack-index does not have a reverse index.

Makes sense.

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 93f319a4b2..9492a9737b 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
>  		# the new MIDX
>  		git multi-pack-index write --preferred-pack=pack-$pack.pack
>  	)
> +'
>
> +test_expect_success 'preferred pack cannot be determined without bitmap' '
> +	test_when_finished "rm -fr preferred-can-be-queried" &&
> +	git init preferred-can-be-queried &&
> +	(
> +		cd preferred-can-be-queried &&
> +		test_commit initial &&
> +		git repack -Adl --write-midx --no-write-bitmap-index &&
> +		test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
> +		test_grep "could not determine MIDX preferred pack" err &&

Looks good. I think that it's fine to end the test here, since we have
extensive coverage that we can determine the preferred pack when there
is a MIDX and matching bitmap, but I definitely don't feel strongly
about it.

Thanks,
Taylor

  reply	other threads:[~2025-12-10  0:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10  0:22   ` Taylor Blau [this message]
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-18 21:13       ` Taylor Blau
2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
2025-12-10  2:48   ` Taylor Blau
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
2025-12-12  7:33     ` Patrick Steinhardt
2025-12-18 21:18       ` Taylor Blau
2025-12-19  6:25         ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aTi9M/F0sZzK/usA@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).