From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, correctmost <cmlists@sent.com>
Subject: Re: [PATCH 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps
Date: Wed, 12 Nov 2025 21:55:03 -0500
Message-ID: <aRVIh9R8Pnuk+yS0@nand.local>
In-Reply-To: <20251112080151.GB979063@coredump.intra.peff.net>

On Wed, Nov 12, 2025 at 03:01:51AM -0500, Jeff King wrote:
> As always with the midx and bitmap code, I am left unsure of which
> ordering it is correct to use (pseudo-pack order, or lexical oid order,
> or how each splits across incremental files). I _think_ this is right
> because it's matching the ordering that is already used for a single
> midx. But clearly this area is under-tested, since even when we did not
> go off the end of the array we were probably passing back junk
> name-hashes (either from the .bitmap file's trailing checksum, or
> zero-padding at the end of the mapped page).

Yeah, this is the right order. "index_pos" is a good hint that this is
in lexical order. bitmap_writer_finish() has some oid_pos() lookups that
use index directly without sorting, so bitmap_writer_finish() expects
this array in lexical order.
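
To make the ordering point concrete, here is a minimal sketch (not
git's actual code) of why a per-object side array must share the
lexical order of the object table: the position found by a sorted
lookup is used directly as the index into the side array. The names
lexical_pos() and ID_LEN are hypothetical, and the IDs are shortened
to four bytes for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ID_LEN 4 /* assumption: shortened IDs, for illustration only */

/*
 * Binary-search a lexically sorted table of fixed-width IDs.
 * Returns the index of "id", or -1 if it is not present.
 */
static int lexical_pos(const unsigned char (*table)[ID_LEN], size_t nr,
		       const unsigned char *id)
{
	size_t lo = 0, hi = nr;

	assert(table && id);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(id, table[mid], ID_LEN);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;	/* id sorts before table[mid] */
		else
			lo = mid + 1;	/* id sorts after table[mid] */
	}
	return -1;
}
```

A caller that keeps a parallel array (say, one name-hash per object)
can only use the returned position as an index if that array was
written in the same lexical order as the table.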

Commit c528e17966 (pack-bitmap: write multi-pack bitmaps, 2021-08-31)
has a comment in (what is now) midx-write.c explaining this assumption
in bitmap_writer_finish(), but it should probably be documented
explicitly in pack-bitmap.h.

> So it might be worth adding more tests here, but I know this incremental
> bitmap code is a big work in progress. So I contented myself with the
> reproduction above, and anything else can go onto the incremental todo
> pile. :)

Yeah, I agree. The only hash-cache test that I could think of is from
t5326, which tests that we can propagate existing name-hash values from
a pack bitmap into a MIDX one. We probably need an equivalent for when
writing an incremental MIDX/bitmap too. #leftoverbits

>  pack-bitmap.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 291e1a9cf4..710b86a451 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -213,6 +213,26 @@ static uint32_t bitmap_num_objects(struct bitmap_index *index)
>  	return index->pack->num_objects;
>  }
>
> +static uint32_t bitmap_name_hash(struct bitmap_index *index, uint32_t pos)
> +{
> +	if (bitmap_is_midx(index)) {
> +		while (index && pos < index->midx->num_objects_in_base)
> +			index = index->base;

Looks good. It's too bad that we have to reimplement something very
similar to midx_for_object(), but I agree with what you wrote in the
patch message and this faithfully captures that. It might be worth doing
something like:

    while (index && pos < index->midx->num_objects_in_base) {
        ASSERT(bitmap_is_midx(index));
        index = index->base;
    }

, which should never trigger, but is a good sanity check. Definitely not
worth re-rolling IMHO.

> +
> +		if (!index)
> +			BUG("NULL base bitmap for object position: %"PRIu32, pos);
> +
> +		pos -= index->midx->num_objects_in_base;
> +		if (pos >= index->midx->num_objects)
> +			BUG("out-of-bounds midx bitmap object at %"PRIu32, pos);

midx_for_object() spells this portion slightly differently, but what you
have here is still good.

> +	}
> +
> +	if (!index->hashes)
> +		return 0;
> +
> +	return get_be32(index->hashes + pos);

We *could* double check that that offset is within bounds of
index->map_size, and I think that is ultimately worth doing at some
point. But I think that stopping where you did makes sense, since it
does the minimal thing to fix this bug.
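
As a rough idea of what such a check could look like, here is a
standalone sketch. It is not a proposed patch: checked_name_hash(),
read_be32(), and the "hashes_off" byte offset are hypothetical, and
the real code keeps a pointer into the map rather than an offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value; stand-in for git's get_be32(). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Fetch the hash at "pos" from a hash cache that starts "hashes_off"
 * bytes into a mapping of "map_size" bytes. Returns 0 and fills *out
 * on success, or -1 if the entry would extend past the mapping.
 */
static int checked_name_hash(const unsigned char *map, size_t map_size,
			     size_t hashes_off, uint32_t pos,
			     uint32_t *out)
{
	const size_t entry = sizeof(uint32_t);

	assert(map && out);
	if (hashes_off > map_size)
		return -1;
	/* Number of complete entries between hashes_off and the end. */
	if (pos >= (map_size - hashes_off) / entry)
		return -1;

	*out = read_be32(map + hashes_off + (size_t)pos * entry);
	return 0;
}
```

Comparing against the count of complete entries, rather than
computing "offset + 4" and comparing that to map_size, avoids any
overflow in the offset arithmetic.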

Thanks,
Taylor
