All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
Date: Tue, 9 Dec 2025 19:22:11 -0500	[thread overview]
Message-ID: <aTi9M/F0sZzK/usA@nand.local> (raw)
In-Reply-To: <20251208-pks-skip-noop-rewrite-v1-1-430d52dba9f0@pks.im>

On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
> The function `midx_preferred_pack()` returns the preferred pack for a
> given multi-pack index. To compute the preferred pack we:
>
>   1. Look up the position of the first object indexed by the multi-pack
>      index.
>
>   2. Convert this position from pseudo-pack order into MIDX order.
>
>   3. We then look up pack that corresponds to this MIDX index.

I think the implementation of midx_preferred_pack() works a little bit
differently than is described here. I often get confused when working in
this area juggling between the various object/pack orderings in my head.

midx_preferred_pack() cares about converting from the first position in
pseudo-pack order back into MIDX object order. To do that, we convert
the pseudo-pack position into a MIDX one, and then lookup the pack that
represents that object.

Effectively we're doing something like:

    uint32_t pseudo_pack_pos = m->num_objects_in_base;
    uint32_t midx_pos = pack_pos_to_midx(m, pseudo_pack_pos);
    uint32_t pack_int_id = nth_midxed_pack_int_id(m, midx_pos);

> This reliably returns the preferred pack given that all of its contained
> objects will be up front in pseudo-pack order.
>
> The second step that turns the pseudo-pack order into MIDX order
> requires the reverse index though, which may not exist for example when
> the MIDX does not have a bitmap. And in that case one may easily hit a
> bug:
>
>     BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded

That makes sense, we can't convert a pseudo-pack position into a
MIDX-relative position without a reverse index to tell us where that bit
goes.

> In theory, `midx_preferred_pack()` already knows to handle the case
> where no reverse index exists, as it calls `load_midx_revindex()` before
> calling into `midx_preferred_pack()`. [...]

Right, it handles this case by returning -1 to indicate that it didn't
have enough information to determine the preferred pack.

> [...] But we only check for negative
> return values there, even though the function returns a positive error
> code in case the reverse index does not exist.

Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
file header, 2023-04-17), but it looks like the caller here did not
learn about that change. It may be worth mentioning that commit in your
patch message.

While reviewing, I wanted to make sure that there weren't any other
callers of load_midx_revindex() that were also missing this check. The
return value of that function is propagated through the two expected
functions:

    $ git grep -p load_revindex_from_disk
    pack-revindex.c=struct revindex_header {
    pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
    pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
    pack-revindex.c:        ret = load_revindex_from_disk(p->repo->hash_algo,
    pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
    pack-revindex.c:        ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,

, and checking through the callers of those two functions, all are
prepared to handle a >0 return value.

> diff --git a/midx.c b/midx.c
> index 24e1e72175..b681b18fc1 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
>  {
>  	if (m->preferred_pack_idx == -1) {
>  		uint32_t midx_pos;
> -		if (load_midx_revindex(m) < 0) {
> +		if (load_midx_revindex(m)) {
>  			m->preferred_pack_idx = -2;
>  			return -1;
>  		}
> diff --git a/pack-revindex.h b/pack-revindex.h
> index 422c2487ae..0042892091 100644
> --- a/pack-revindex.h
> +++ b/pack-revindex.h
> @@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
>   * multi-pack index by mmap-ing it and assigning pointers in the
>   * multi_pack_index to point at it.
>   *
> - * A negative number is returned on error.
> + * A negative number is returned on error. A positive number is returned in
> + * case the multi-pack-index does not have a reverse index.

Makes sense.

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 93f319a4b2..9492a9737b 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
>  		# the new MIDX
>  		git multi-pack-index write --preferred-pack=pack-$pack.pack
>  	)
> +'
>
> +test_expect_success 'preferred pack cannot be determined without bitmap' '
> +	test_when_finished "rm -fr preferred-can-be-queried" &&
> +	git init preferred-can-be-queried &&
> +	(
> +		cd preferred-can-be-queried &&
> +		test_commit initial &&
> +		git repack -Adl --write-midx --no-write-bitmap-index &&
> +		test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
> +		test_grep "could not determine MIDX preferred pack" err &&

Looks good. I think that it's fine to end the test here, since we have
extensive coverage that we can determine the preferred pack when there
is a MIDX and matching bitmap, but I definitely don't feel strongly
about it.

Thanks,
Taylor

  reply	other threads:[~2025-12-10  0:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10  0:22   ` Taylor Blau [this message]
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-18 21:13       ` Taylor Blau
2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
2025-12-10  2:48   ` Taylor Blau
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
2025-12-12  7:33     ` Patrick Steinhardt
2025-12-18 21:18       ` Taylor Blau
2025-12-19  6:25         ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aTi9M/F0sZzK/usA@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.