* Re: [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-09 15:27 [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
@ 2024-06-10 5:55 ` Patrick Steinhardt
2024-06-10 14:57 ` Taylor Blau
2024-06-10 20:10 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2024-06-11 17:28 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2 siblings, 1 reply; 15+ messages in thread
From: Patrick Steinhardt @ 2024-06-10 5:55 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Kyle Lippincott
[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]
On Sun, Jun 09, 2024 at 11:27:35AM -0400, Taylor Blau wrote:
> In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
> 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
> function and stopped assigning the pack_int_id field when reusing only
> the MIDX's preferred pack. This results in an uninitialized read down in
> try_partial_reuse() like so:
I feel like I'm blind, but I cannot see how the patch changed what we do
with `pack_int_id`. It's not mentioned a single time in the diff, so how
did it have the effect of not setting it anymore?
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index fe8e8a51d3..8b9a2c698f 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2073,6 +2073,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> QSORT(packs, packs_nr, bitmapped_pack_cmp);
> } else {
> struct packed_git *pack;
> + uint32_t pack_int_id;
>
> if (bitmap_is_midx(bitmap_git)) {
> uint32_t preferred_pack_pos;
> @@ -2083,12 +2084,21 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> }
>
> pack = bitmap_git->midx->packs[preferred_pack_pos];
> + pack_int_id = preferred_pack_pos;
> } else {
> pack = bitmap_git->pack;
> + /*
> + * Any value for 'pack_int_id' will do here. When we
> + * process the pack via try_partial_reuse(), we won't
> + * use the `pack_int_id` field since we have a non-MIDX
> + * bitmap.
> + */
> + pack_int_id = 0;
> }
>
> ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> packs[packs_nr].p = pack;
> + packs[packs_nr].pack_int_id = pack_int_id;
> packs[packs_nr].bitmap_nr = pack->num_objects;
> packs[packs_nr].bitmap_pos = 0;
Okay, the patch looks trivial enough. I was wondering whether we might
want to `memset(&packs[packs_nr], 0, sizeof(packs[packs_nr]))` as it
feels rather easy to miss initialization of one of the members. But on
the other hand, this might also cause us to paper over bugs if we did
that.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-10 5:55 ` Patrick Steinhardt
@ 2024-06-10 14:57 ` Taylor Blau
2024-06-11 8:12 ` Patrick Steinhardt
0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2024-06-10 14:57 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano, Kyle Lippincott
On Mon, Jun 10, 2024 at 07:55:46AM +0200, Patrick Steinhardt wrote:
> On Sun, Jun 09, 2024 at 11:27:35AM -0400, Taylor Blau wrote:
> > In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
> > 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
> > function and stopped assigning the pack_int_id field when reusing only
> > the MIDX's preferred pack. This results in an uninitialized read down in
> > try_partial_reuse() like so:
>
> I feel like I'm blind, but I cannot see how the patch changed what we do
> with `pack_int_id`. It's not mentioned a single time in the diff, so how
> did it have the effect of not setting it anymore?
It's because prior to 795006fff4, we handled reusing a single pack from
a MIDX differently than in the post-image of that commit. Prior to
795006fff4, the loop looked like:
if (bitmap_is_midx(bitmap_git)) {
for (i = 0; i < bitmap_git->midx->num_packs; i++) {
struct bitmapped_pack pack;
if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
/* ... */
return;
}
if (!pack.bitmap_nr)
continue;
if (!multi_pack_reuse && pack.bitmap_pos)
continue;
ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
memcpy(&packs[packs_nr++], &pack, sizeof(pack));
}
}
Since nth_bitmapped_pack() fills out the pack_int_id field, we got it
automatically since we just memcpy()'d the result of
nth_bitmapped_pack() into our array.
In the single pack bitmap case, we don't need to initialize the
pack_int_id field because we never read it, hence the lack of MSan
failures in that mode.
But since 795006fff4 combined these two single pack cases (that is,
single-pack bitmaps, and reusing only a single pack out of a MIDX
bitmap) into one, 795006fff4 neglected to initialize the pack_int_id
field, causing this issue.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-10 14:57 ` Taylor Blau
@ 2024-06-11 8:12 ` Patrick Steinhardt
0 siblings, 0 replies; 15+ messages in thread
From: Patrick Steinhardt @ 2024-06-11 8:12 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano, Kyle Lippincott
[-- Attachment #1: Type: text/plain, Size: 2291 bytes --]
On Mon, Jun 10, 2024 at 10:57:54AM -0400, Taylor Blau wrote:
> On Mon, Jun 10, 2024 at 07:55:46AM +0200, Patrick Steinhardt wrote:
> > On Sun, Jun 09, 2024 at 11:27:35AM -0400, Taylor Blau wrote:
> > > In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
> > > 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
> > > function and stopped assigning the pack_int_id field when reusing only
> > > the MIDX's preferred pack. This results in an uninitialized read down in
> > > try_partial_reuse() like so:
> >
> > I feel like I'm blind, but I cannot see how the patch changed what we do
> > with `pack_int_id`. It's not mentioned a single time in the diff, so how
> > did it have the effect of not setting it anymore?
>
> It's because prior to 795006fff4, we handled reusing a single pack from
> a MIDX differently than in the post-image of that commit. Prior to
> 795006fff4, the loop looked like:
>
> if (bitmap_is_midx(bitmap_git)) {
> for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> struct bitmapped_pack pack;
> if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> /* ... */
> return;
> }
> if (!pack.bitmap_nr)
> continue;
> if (!multi_pack_reuse && pack.bitmap_pos)
> continue;
>
> ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> memcpy(&packs[packs_nr++], &pack, sizeof(pack));
> }
> }
>
> Since nth_bitmapped_pack() fills out the pack_int_id field, we got it
> automatically since we just memcpy()'d the result of
> nth_bitmapped_pack() into our array.
>
> In the single pack bitmap case, we don't need to initialize the
> pack_int_id field because we never read it, hence the lack of MSan
> failures in that mode.
>
> But since 795006fff4 combined these two single pack cases (that is,
> single-pack bitmaps, and reusing only a single pack out of a MIDX
> bitmap) into one, 795006fff4 neglected to initialize the pack_int_id
> field, causing this issue.
Makes sense, thanks for the explanation!
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 0/3] midx: various brown paper bag fixes
2024-06-09 15:27 [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
2024-06-10 5:55 ` Patrick Steinhardt
@ 2024-06-10 20:10 ` Taylor Blau
2024-06-10 20:10 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
` (2 more replies)
2024-06-11 17:28 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2 siblings, 3 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-10 20:10 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
This series fixes a couple of brown paper bag bugs in the MIDX/bitmap
code.
This started as a single patch to address the MSan issue pointing to
uninitialized memory being used as a pack ID when trying to reject
cross-pack deltas during verbatim pack reuse.
The latter two patches of this series address that issue. The middle
patch squashes the bug and hardens against further regressions in that
area with a test. The last patch further hardens us by ensuring that the
"pack" input to midx_key_to_pack_pos() is bounded by the number of packs
in the MIDX.
The first patch in this series fixes another bug I noticed while writing
the test in the second patch which bisects to my d6a8c58675
(midx-write.c: support reading an existing MIDX with `packs_to_include`,
2024-05-29).
I would like to address the more fundamental issues that are described
in the first patch, but I think the reality is that doing so is much
more involved than a short series during week 6 of the release cycle
warrants. So instead, the first patch is a revert of d6a8c58675 (plus a
test to demonstrate the issue with it) which buys us some time to teach
the MIDX code how to handle not every pack being carried forward from an
existing MIDX.
Thanks in advance for your review, and sorry for these embarrassing
bugs.
Taylor Blau (3):
midx-write.c: do not read existing MIDX with `packs_to_include`
pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
pack-revindex.c: guard against out-of-bounds pack lookups
midx-write.c | 42 ++++++++++++++++++++++++++---------
pack-bitmap.c | 10 +++++++++
pack-revindex.c | 3 +++
t/t5326-multi-pack-bitmaps.sh | 30 +++++++++++++++++++++++++
t/t5332-multi-pack-reuse.sh | 26 ++++++++++++++++++++++
5 files changed, 100 insertions(+), 11 deletions(-)
Range-diff against v1:
1: 4aceb9233e ! 1: 0bdf925366 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
-
- When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
- is responsible for generating an array of bitmapped_pack structs from
- which to perform reuse.
-
- In the multi-pack case, we loop over the MIDXs packs and copy the result
- of calling `nth_bitmapped_pack()` to construct the list of reusable
- paths.
-
- But we may also want to do pack-reuse over a single pack, either because
- we only had one pack to perform reuse over (in the case of single-pack
- bitmaps), or because we explicitly asked to do single pack reuse even
- with a MIDX[^1].
-
- When this is the case, the array we generate of reusable packs contains
- only a single element, which is either (a) the pack attached to the
- single-pack bitmap, or (b) the MIDX's preferred pack.
-
- In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
- 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
- function and stopped assigning the pack_int_id field when reusing only
- the MIDX's preferred pack. This results in an uninitialized read down in
- try_partial_reuse() like so:
-
- ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
- #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
- #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
- #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
- #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
- #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
- #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
- #6 0x55c5ccc8fac8 in run_builtin git.c:474:11
-
- which happens when try_partial_reuse() tries to call
- midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.
-
- Avoid the uninitialized read by ensuring that the pack_int_id field is
- set in the single-pack reuse case by setting it to either the MIDX
- preferred pack's pack_int_id, or '0', in the case of single-pack
- bitmaps. In the latter case, we never read the pack_int_id field, so
- the choice of '0' is arbitrary.
-
- [^1]: This can happen for a couple of reasons, either because the
- repository is configured with 'pack.allowPackReuse=(true|single)', or
- because the MIDX was generated prior to the introduction of the BTMP
- chunk, which contains information necessary to perform multi-pack
- reuse.
-
- Reported-by: Kyle Lippincott <spectral@google.com>
+ midx-write.c: do not read existing MIDX with `packs_to_include`
+
+ Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
+ `packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
+ support reading from an existing MIDX when writing a new one.
+
+ Unfortunately, the rest of the MIDX generation machinery is not prepared
+ to deal with such a change. For instance, the function responsible for
+ adding to the object ID fanout table from a MIDX source
+ (midx_fanout_add_midx_fanout()) will gladly add objects from an existing
+ MIDX for some fanout level regardless of whether or not those objects
+ came from packs that are to be included in the subsequent MIDX write.
+
+ This results in broken pseudo-pack object order (leading to incorrect
+ object traversal results) and segmentation faults, like so (generated by
+ running the added test prior to the changes in midx-write.c):
+
+ #0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
+ #1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
+ packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
+ refs_snapshot=0x0, flags=15) at midx-write.c:1171
+ #2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
+ packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
+ at midx-write.c:1274
+ [...]
+
+ In stack frame #0, the code on midx-write.c:590 is using the new pack ID
+ corresponding to some object which was added from the existing MIDX.
+ Importantly, the pack from which that object was selected in the
+ existing MIDX does not appear in the new MIDX as it was excluded via
+ `--stdin-packs`.
+
+ In this instance, the pack in question had pack ID "1" in the existing
+ MIDX, but since it was excluded from the new MIDX, we never filled in
+ that entry in the pack_perm table, resulting in:
+
+ (gdb) p *ctx->pack_perm@2
+ $1 = {0, 1515870810}
+
+ Which is what causes the segfault above when we try and read:
+
+ struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+ if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+ pack->bitmap_pos = 0;
+
+ Fundamentally, we should be able to read information from an existing
+ MIDX when generating a new one. But in practice the midx-write.c code
+ assumes that we won't run into issues like the above with incongruent
+ pack IDs, and often makes those assumptions in extremely subtle and
+ fragile ways.
+
+ Instead, let's avoid reading from an existing MIDX altogether, and stick
+ with the pre-d6a8c58675 implementation. Harden against any regressions
+ in this area by adding a test which demonstrates these issues.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
- ## pack-bitmap.c ##
-@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
- QSORT(packs, packs_nr, bitmapped_pack_cmp);
- } else {
- struct packed_git *pack;
-+ uint32_t pack_int_id;
-
- if (bitmap_is_midx(bitmap_git)) {
- uint32_t preferred_pack_pos;
-@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
- }
-
- pack = bitmap_git->midx->packs[preferred_pack_pos];
-+ pack_int_id = preferred_pack_pos;
- } else {
- pack = bitmap_git->pack;
-+ /*
-+ * Any value for 'pack_int_id' will do here. When we
-+ * process the pack via try_partial_reuse(), we won't
-+ * use the `pack_int_id` field since we have a non-MIDX
-+ * bitmap.
-+ */
-+ pack_int_id = 0;
- }
-
- ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
- packs[packs_nr].p = pack;
-+ packs[packs_nr].pack_int_id = pack_int_id;
- packs[packs_nr].bitmap_nr = pack->num_objects;
- packs[packs_nr].bitmap_pos = 0;
+ ## midx-write.c ##
+@@ midx-write.c: struct write_midx_context {
+ };
+ static int should_include_pack(const struct write_midx_context *ctx,
+- const char *file_name,
+- int exclude_from_midx)
++ const char *file_name)
+ {
+- if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
++ /*
++ * Note that at most one of ctx->m and ctx->to_include are set,
++ * so we are testing midx_contains_pack() and
++ * string_list_has_string() independently (guarded by the
++ * appropriate NULL checks).
++ *
++ * We could support passing to_include while reusing an existing
++ * MIDX, but don't currently since the reuse process drags
++ * forward all packs from an existing MIDX (without checking
++ * whether or not they appear in the to_include list).
++ *
++ * If we added support for that, these next two conditional
++ * should be performed independently (likely checking
++ * to_include before the existing MIDX).
++ */
++ if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ return 0;
+- if (ctx->to_include && !string_list_has_string(ctx->to_include,
+- file_name))
++ else if (ctx->to_include &&
++ !string_list_has_string(ctx->to_include, file_name))
+ return 0;
+ return 1;
+ }
+@@ midx-write.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+ if (ends_with(file_name, ".idx")) {
+ display_progress(ctx->progress, ++ctx->pack_paths_checked);
+
+- if (!should_include_pack(ctx, file_name, 1))
++ if (!should_include_pack(ctx, file_name))
+ return;
+
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
+ uint32_t i;
+
+ for (i = 0; i < ctx->m->num_packs; i++) {
+- if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
+- continue;
+-
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+ if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ die_errno(_("unable to create leading directories of %s"),
+ midx_name.buf);
+
+- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
++ if (!packs_to_include) {
++ /*
++ * Only reference an existing MIDX when not filtering which
++ * packs to include, since all packs and objects are copied
++ * blindly from an existing MIDX if one is present.
++ */
++ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
++ }
++
+ if (ctx.m && !midx_checksum_valid(ctx.m)) {
+ warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+ ctx.m = NULL;
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ ctx.nr = 0;
+ ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+ ctx.info = NULL;
+- ctx.to_include = packs_to_include;
+ ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+ if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ else
+ ctx.progress = NULL;
+
++ ctx.to_include = packs_to_include;
++
+ for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+ stop_progress(&ctx.progress);
+
+
+ ## t/t5326-multi-pack-bitmaps.sh ##
+@@ t/t5326-multi-pack-bitmaps.sh: do
+ '
+ done
+
++test_expect_success 'remove one packfile between MIDX bitmap writes' '
++ git init remove-pack-between-writes &&
++ (
++ cd remove-pack-between-writes &&
++
++ test_commit A &&
++ test_commit B &&
++ test_commit C &&
++
++ # Create packs with the prefix "pack-A", "pack-B",
++ # "pack-C" to impose a lexicographic order on these
++ # packs so the pack being removed is always from the
++ # middle.
++ packdir=.git/objects/pack &&
++ A="$(echo A | git pack-objects $packdir/pack-A --revs)" &&
++ B="$(echo B | git pack-objects $packdir/pack-B --revs)" &&
++ C="$(echo C | git pack-objects $packdir/pack-C --revs)" &&
++
++ git multi-pack-index write --bitmap &&
++
++ cat >in <<-EOF &&
++ pack-A-$A.idx
++ pack-C-$C.idx
++ EOF
++ git multi-pack-index write --bitmap --stdin-packs <in &&
++
++ git rev-list --test-bitmap HEAD
++ )
++'
++
+ test_done
-: ---------- > 2: a3c28f1202 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
-: ---------- > 3: dadcf96c06 pack-revindex.c: guard against out-of-bounds pack lookups
base-commit: 1b76f065085811104b5f4adff001956d7e5c5d1c
--
2.45.2.447.g6b4ffca7ec.dirty
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include`
2024-06-10 20:10 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
@ 2024-06-10 20:10 ` Taylor Blau
2024-06-10 20:10 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
2024-06-10 20:10 ` [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups Taylor Blau
2 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-10 20:10 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
`packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
support reading from an existing MIDX when writing a new one.
Unfortunately, the rest of the MIDX generation machinery is not prepared
to deal with such a change. For instance, the function responsible for
adding to the object ID fanout table from a MIDX source
(midx_fanout_add_midx_fanout()) will gladly add objects from an existing
MIDX for some fanout level regardless of whether or not those objects
came from packs that are to be included in the subsequent MIDX write.
This results in broken pseudo-pack object order (leading to incorrect
object traversal results) and segmentation faults, like so (generated by
running the added test prior to the changes in midx-write.c):
#0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
#1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
refs_snapshot=0x0, flags=15) at midx-write.c:1171
#2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
at midx-write.c:1274
[...]
In stack frame #0, the code on midx-write.c:590 is using the new pack ID
corresponding to some object which was added from the existing MIDX.
Importantly, the pack from which that object was selected in the
existing MIDX does not appear in the new MIDX as it was excluded via
`--stdin-packs`.
In this instance, the pack in question had pack ID "1" in the existing
MIDX, but since it was excluded from the new MIDX, we never filled in
that entry in the pack_perm table, resulting in:
(gdb) p *ctx->pack_perm@2
$1 = {0, 1515870810}
Which is what causes the segfault above when we try and read:
struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
pack->bitmap_pos = 0;
Fundamentally, we should be able to read information from an existing
MIDX when generating a new one. But in practice the midx-write.c code
assumes that we won't run into issues like the above with incongruent
pack IDs, and often makes those assumptions in extremely subtle and
fragile ways.
Instead, let's avoid reading from an existing MIDX altogether, and stick
with the pre-d6a8c58675 implementation. Harden against any regressions
in this area by adding a test which demonstrates these issues.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 42 ++++++++++++++++++++++++++---------
t/t5326-multi-pack-bitmaps.sh | 30 +++++++++++++++++++++++++
2 files changed, 61 insertions(+), 11 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 55a6b63bac..0125aa4dc9 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -101,13 +101,27 @@ struct write_midx_context {
};
static int should_include_pack(const struct write_midx_context *ctx,
- const char *file_name,
- int exclude_from_midx)
+ const char *file_name)
{
- if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
+ /*
+ * Note that at most one of ctx->m and ctx->to_include are set,
+ * so we are testing midx_contains_pack() and
+ * string_list_has_string() independently (guarded by the
+ * appropriate NULL checks).
+ *
+ * We could support passing to_include while reusing an existing
+ * MIDX, but don't currently since the reuse process drags
+ * forward all packs from an existing MIDX (without checking
+ * whether or not they appear in the to_include list).
+ *
+ * If we added support for that, these next two conditional
+ * should be performed independently (likely checking
+ * to_include before the existing MIDX).
+ */
+ if (ctx->m && midx_contains_pack(ctx->m, file_name))
return 0;
- if (ctx->to_include && !string_list_has_string(ctx->to_include,
- file_name))
+ else if (ctx->to_include &&
+ !string_list_has_string(ctx->to_include, file_name))
return 0;
return 1;
}
@@ -121,7 +135,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
if (ends_with(file_name, ".idx")) {
display_progress(ctx->progress, ++ctx->pack_paths_checked);
- if (!should_include_pack(ctx, file_name, 1))
+ if (!should_include_pack(ctx, file_name))
return;
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
@@ -880,9 +894,6 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < ctx->m->num_packs; i++) {
- if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
- continue;
-
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
@@ -937,7 +948,15 @@ static int write_midx_internal(const char *object_dir,
die_errno(_("unable to create leading directories of %s"),
midx_name.buf);
- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
+ if (!packs_to_include) {
+ /*
+ * Only reference an existing MIDX when not filtering which
+ * packs to include, since all packs and objects are copied
+ * blindly from an existing MIDX if one is present.
+ */
+ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
+ }
+
if (ctx.m && !midx_checksum_valid(ctx.m)) {
warning(_("ignoring existing multi-pack-index; checksum mismatch"));
ctx.m = NULL;
@@ -946,7 +965,6 @@ static int write_midx_internal(const char *object_dir,
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
ctx.info = NULL;
- ctx.to_include = packs_to_include;
ALLOC_ARRAY(ctx.info, ctx.alloc);
if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
@@ -963,6 +981,8 @@ static int write_midx_internal(const char *object_dir,
else
ctx.progress = NULL;
+ ctx.to_include = packs_to_include;
+
for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
stop_progress(&ctx.progress);
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index cc7220b6c0..916da389b6 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -551,4 +551,34 @@ do
'
done
+test_expect_success 'remove one packfile between MIDX bitmap writes' '
+ git init remove-pack-between-writes &&
+ (
+ cd remove-pack-between-writes &&
+
+ test_commit A &&
+ test_commit B &&
+ test_commit C &&
+
+ # Create packs with the prefix "pack-A", "pack-B",
+ # "pack-C" to impose a lexicographic order on these
+ # packs so the pack being removed is always from the
+ # middle.
+ packdir=.git/objects/pack &&
+ A="$(echo A | git pack-objects $packdir/pack-A --revs)" &&
+ B="$(echo B | git pack-objects $packdir/pack-B --revs)" &&
+ C="$(echo C | git pack-objects $packdir/pack-C --revs)" &&
+
+ git multi-pack-index write --bitmap &&
+
+ cat >in <<-EOF &&
+ pack-A-$A.idx
+ pack-C-$C.idx
+ EOF
+ git multi-pack-index write --bitmap --stdin-packs <in &&
+
+ git rev-list --test-bitmap HEAD
+ )
+'
+
test_done
--
2.45.2.447.g6b4ffca7ec.dirty
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-10 20:10 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2024-06-10 20:10 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
@ 2024-06-10 20:10 ` Taylor Blau
2024-06-11 9:11 ` Jeff King
2024-06-10 20:10 ` [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups Taylor Blau
2 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2024-06-10 20:10 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse.
In the multi-pack case, we loop over the MIDXs packs and copy the result
of calling `nth_bitmapped_pack()` to construct the list of reusable
paths.
But we may also want to do pack-reuse over a single pack, either because
we only had one pack to perform reuse over (in the case of single-pack
bitmaps), or because we explicitly asked to do single pack reuse even
with a MIDX[^1].
When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.
In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down in
try_partial_reuse() like so:
==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
#1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
#2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
#3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
#4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
#5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
#6 0x55c5ccc8fac8 in run_builtin git.c:474:11
which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.
Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '0', in the case of single-pack
bitmaps. In the latter case, we never read the pack_int_id field, so
the choice of '0' is arbitrary.
Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.
[^1]: This can happen for a couple of reasons, either because the
repository is configured with 'pack.allowPackReuse=(true|single)', or
because the MIDX was generated prior to the introduction of the BTMP
chunk, which contains information necessary to perform multi-pack
reuse.
Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 10 ++++++++++
t/t5332-multi-pack-reuse.sh | 26 ++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index fe8e8a51d3..8b9a2c698f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2073,6 +2073,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
QSORT(packs, packs_nr, bitmapped_pack_cmp);
} else {
struct packed_git *pack;
+ uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
uint32_t preferred_pack_pos;
@@ -2083,12 +2084,21 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
}
pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
+ /*
+ * Any value for 'pack_int_id' will do here. When we
+ * process the pack via try_partial_reuse(), we won't
+ * use the `pack_int_id` field since we have a non-MIDX
+ * bitmap.
+ */
+ pack_int_id = 0;
}
ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
packs[packs_nr].p = pack;
+ packs[packs_nr].pack_int_id = pack_int_id;
packs[packs_nr].bitmap_nr = pack->num_objects;
packs[packs_nr].bitmap_pos = 0;
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 3c20738bce..ed823f37bc 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -204,4 +204,30 @@ test_expect_success 'omit delta from uninteresting base (cross pack)' '
test_pack_objects_reused_all $(($objects_nr - 1)) $packs_nr
'
+test_expect_success 'non-omitted delta in MIDX preferred pack' '
+ test_config pack.allowPackReuse single &&
+
+ cat >p1.objects <<-EOF &&
+ $(git rev-parse $base)
+ ^$(git rev-parse $delta^)
+ EOF
+ cat >p2.objects <<-EOF &&
+ $(git rev-parse F)
+ EOF
+
+ p1="$(git pack-objects --revs $packdir/pack <p1.objects)" &&
+ p2="$(git pack-objects --revs $packdir/pack <p2.objects)" &&
+
+ cat >in <<-EOF &&
+ pack-$p1.idx
+ pack-$p2.idx
+ EOF
+ git multi-pack-index write --bitmap --stdin-packs \
+ --preferred-pack=pack-$p1.pack <in &&
+
+ git show-index <$packdir/pack-$p1.idx >expect &&
+
+ test_pack_objects_reused_all $(wc -l <expect) 1
+'
+
test_done
--
2.45.2.447.g6b4ffca7ec.dirty
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-10 20:10 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
@ 2024-06-11 9:11 ` Jeff King
2024-06-11 17:03 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2024-06-11 9:11 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
On Mon, Jun 10, 2024 at 04:10:53PM -0400, Taylor Blau wrote:
> Avoid the uninitialized read by ensuring that the pack_int_id field is
> set in the single-pack reuse case by setting it to either the MIDX
> preferred pack's pack_int_id, or '0', in the case of single-pack
> bitmaps. In the latter case, we never read the pack_int_id field, so
> the choice of '0' is arbitrary.
Could we set it to some sentinel value for the single-pack case? If we
set it to "-1", then the BUG() added in patch 3 would trigger if we did
accidentally try to feed it to the midx code. Assuming you do not have
2^32-1 packs, of course. ;)
I am OK either way, though. And the rest of the patch looked good.
-Peff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-11 9:11 ` Jeff King
@ 2024-06-11 17:03 ` Junio C Hamano
0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2024-06-11 17:03 UTC (permalink / raw)
To: Jeff King; +Cc: Taylor Blau, git, Kyle Lippincott, Patrick Steinhardt
Jeff King <peff@peff.net> writes:
> On Mon, Jun 10, 2024 at 04:10:53PM -0400, Taylor Blau wrote:
>
>> Avoid the uninitialized read by ensuring that the pack_int_id field is
>> set in the single-pack reuse case by setting it to either the MIDX
>> preferred pack's pack_int_id, or '0', in the case of single-pack
>> bitmaps. In the latter case, we never read the pack_int_id field, so
>> the choice of '0' is arbitrary.
>
> Could we set it to some sentinel value for the single-pack case? If we
> set it to "-1", then the BUG() added in patch 3 would trigger if we did
> accidentally try to feed it to the midx code. Assuming you do not have
> 2^32-1 packs, of course. ;)
Yeah, I had exactly the same reaction.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups
2024-06-10 20:10 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2024-06-10 20:10 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
2024-06-10 20:10 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
@ 2024-06-10 20:10 ` Taylor Blau
2 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-10 20:10 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
The function midx_key_to_pack_pos() is a helper function used by
midx_to_pack_pos() and midx_pair_to_pack_pos() to translate a (pack,
offset) tuple into a position into the MIDX pseudo-pack order.
Ensure that the pack ID given to midx_pair_to_pack_pos() is bounded by
the number of packs within the MIDX to prevent, for instance,
uninitialized memory from being used as a pack ID.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-revindex.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/pack-revindex.c b/pack-revindex.c
index fc63aa76a2..93ffca7731 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -527,6 +527,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
+ if (key->pack >= m->num_packs)
+ BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
+ key->pack, m->num_packs);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
--
2.45.2.447.g6b4ffca7ec.dirty
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 0/3] midx: various brown paper bag fixes
2024-06-09 15:27 [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
2024-06-10 5:55 ` Patrick Steinhardt
2024-06-10 20:10 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
@ 2024-06-11 17:28 ` Taylor Blau
2024-06-11 17:28 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
` (3 more replies)
2 siblings, 4 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-11 17:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
This series fixes a couple of brown paper bag bugs in the MIDX/bitmap
code.
This version is basically identical to the previous round, except that
we use the sentinel value "-1" as the 'pack_int_id' when reusing from a
single (non-MIDX'd) pack. This is a "garbage in, garbage out" measure to
ensure that we notice any regressions here loudly.
Thanks in advance for your review!
Taylor Blau (3):
midx-write.c: do not read existing MIDX with `packs_to_include`
pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
pack-revindex.c: guard against out-of-bounds pack lookups
midx-write.c | 42 ++++++++++++++++++++++++++---------
pack-bitmap.c | 13 +++++++++++
pack-revindex.c | 3 +++
t/t5326-multi-pack-bitmaps.sh | 30 +++++++++++++++++++++++++
t/t5332-multi-pack-reuse.sh | 26 ++++++++++++++++++++++
5 files changed, 103 insertions(+), 11 deletions(-)
Range-diff against v1:
1: 4aceb9233e ! 1: 0bdf925366 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
-
- When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
- is responsible for generating an array of bitmapped_pack structs from
- which to perform reuse.
-
- In the multi-pack case, we loop over the MIDXs packs and copy the result
- of calling `nth_bitmapped_pack()` to construct the list of reusable
- paths.
-
- But we may also want to do pack-reuse over a single pack, either because
- we only had one pack to perform reuse over (in the case of single-pack
- bitmaps), or because we explicitly asked to do single pack reuse even
- with a MIDX[^1].
-
- When this is the case, the array we generate of reusable packs contains
- only a single element, which is either (a) the pack attached to the
- single-pack bitmap, or (b) the MIDX's preferred pack.
-
- In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
- 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
- function and stopped assigning the pack_int_id field when reusing only
- the MIDX's preferred pack. This results in an uninitialized read down in
- try_partial_reuse() like so:
-
- ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
- #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
- #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
- #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
- #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
- #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
- #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
- #6 0x55c5ccc8fac8 in run_builtin git.c:474:11
-
- which happens when try_partial_reuse() tries to call
- midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.
-
- Avoid the uninitialized read by ensuring that the pack_int_id field is
- set in the single-pack reuse case by setting it to either the MIDX
- preferred pack's pack_int_id, or '0', in the case of single-pack
- bitmaps. In the latter case, we never read the pack_int_id field, so
- the choice of '0' is arbitrary.
-
- [^1]: This can happen for a couple of reasons, either because the
- repository is configured with 'pack.allowPackReuse=(true|single)', or
- because the MIDX was generated prior to the introduction of the BTMP
- chunk, which contains information necessary to perform multi-pack
- reuse.
-
- Reported-by: Kyle Lippincott <spectral@google.com>
+ midx-write.c: do not read existing MIDX with `packs_to_include`
+
+ Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
+ `packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
+ support reading from an existing MIDX when writing a new one.
+
+ Unfortunately, the rest of the MIDX generation machinery is not prepared
+ to deal with such a change. For instance, the function responsible for
+ adding to the object ID fanout table from a MIDX source
+ (midx_fanout_add_midx_fanout()) will gladly add objects from an existing
+ MIDX for some fanout level regardless of whether or not those objects
+ came from packs that are to be included in the subsequent MIDX write.
+
+ This results in broken pseudo-pack object order (leading to incorrect
+ object traversal results) and segmentation faults, like so (generated by
+ running the added test prior to the changes in midx-write.c):
+
+ #0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
+ #1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
+ packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
+ refs_snapshot=0x0, flags=15) at midx-write.c:1171
+ #2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
+ packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
+ at midx-write.c:1274
+ [...]
+
+ In stack frame #0, the code on midx-write.c:590 is using the new pack ID
+ corresponding to some object which was added from the existing MIDX.
+ Importantly, the pack from which that object was selected in the
+ existing MIDX does not appear in the new MIDX as it was excluded via
+ `--stdin-packs`.
+
+ In this instance, the pack in question had pack ID "1" in the existing
+ MIDX, but since it was excluded from the new MIDX, we never filled in
+ that entry in the pack_perm table, resulting in:
+
+ (gdb) p *ctx->pack_perm@2
+ $1 = {0, 1515870810}
+
+ Which is what causes the segfault above when we try and read:
+
+ struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+ if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+ pack->bitmap_pos = 0;
+
+ Fundamentally, we should be able to read information from an existing
+ MIDX when generating a new one. But in practice the midx-write.c code
+ assumes that we won't run into issues like the above with incongruent
+ pack IDs, and often makes those assumptions in extremely subtle and
+ fragile ways.
+
+ Instead, let's avoid reading from an existing MIDX altogether, and stick
+ with the pre-d6a8c58675 implementation. Harden against any regressions
+ in this area by adding a test which demonstrates these issues.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
- ## pack-bitmap.c ##
-@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
- QSORT(packs, packs_nr, bitmapped_pack_cmp);
- } else {
- struct packed_git *pack;
-+ uint32_t pack_int_id;
-
- if (bitmap_is_midx(bitmap_git)) {
- uint32_t preferred_pack_pos;
-@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
- }
-
- pack = bitmap_git->midx->packs[preferred_pack_pos];
-+ pack_int_id = preferred_pack_pos;
- } else {
- pack = bitmap_git->pack;
-+ /*
-+ * Any value for 'pack_int_id' will do here. When we
-+ * process the pack via try_partial_reuse(), we won't
-+ * use the `pack_int_id` field since we have a non-MIDX
-+ * bitmap.
-+ */
-+ pack_int_id = 0;
- }
-
- ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
- packs[packs_nr].p = pack;
-+ packs[packs_nr].pack_int_id = pack_int_id;
- packs[packs_nr].bitmap_nr = pack->num_objects;
- packs[packs_nr].bitmap_pos = 0;
+ ## midx-write.c ##
+@@ midx-write.c: struct write_midx_context {
+ };
+ static int should_include_pack(const struct write_midx_context *ctx,
+- const char *file_name,
+- int exclude_from_midx)
++ const char *file_name)
+ {
+- if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
++ /*
++ * Note that at most one of ctx->m and ctx->to_include are set,
++ * so we are testing midx_contains_pack() and
++ * string_list_has_string() independently (guarded by the
++ * appropriate NULL checks).
++ *
++ * We could support passing to_include while reusing an existing
++ * MIDX, but don't currently since the reuse process drags
++ * forward all packs from an existing MIDX (without checking
++ * whether or not they appear in the to_include list).
++ *
++ * If we added support for that, these next two conditional
++ * should be performed independently (likely checking
++ * to_include before the existing MIDX).
++ */
++ if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ return 0;
+- if (ctx->to_include && !string_list_has_string(ctx->to_include,
+- file_name))
++ else if (ctx->to_include &&
++ !string_list_has_string(ctx->to_include, file_name))
+ return 0;
+ return 1;
+ }
+@@ midx-write.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+ if (ends_with(file_name, ".idx")) {
+ display_progress(ctx->progress, ++ctx->pack_paths_checked);
+
+- if (!should_include_pack(ctx, file_name, 1))
++ if (!should_include_pack(ctx, file_name))
+ return;
+
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
+ uint32_t i;
+
+ for (i = 0; i < ctx->m->num_packs; i++) {
+- if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
+- continue;
+-
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+ if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ die_errno(_("unable to create leading directories of %s"),
+ midx_name.buf);
+
+- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
++ if (!packs_to_include) {
++ /*
++ * Only reference an existing MIDX when not filtering which
++ * packs to include, since all packs and objects are copied
++ * blindly from an existing MIDX if one is present.
++ */
++ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
++ }
++
+ if (ctx.m && !midx_checksum_valid(ctx.m)) {
+ warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+ ctx.m = NULL;
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ ctx.nr = 0;
+ ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+ ctx.info = NULL;
+- ctx.to_include = packs_to_include;
+ ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+ if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ else
+ ctx.progress = NULL;
+
++ ctx.to_include = packs_to_include;
++
+ for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+ stop_progress(&ctx.progress);
+
+
+ ## t/t5326-multi-pack-bitmaps.sh ##
+@@ t/t5326-multi-pack-bitmaps.sh: do
+ '
+ done
+
++test_expect_success 'remove one packfile between MIDX bitmap writes' '
++ git init remove-pack-between-writes &&
++ (
++ cd remove-pack-between-writes &&
++
++ test_commit A &&
++ test_commit B &&
++ test_commit C &&
++
++ # Create packs with the prefix "pack-A", "pack-B",
++ # "pack-C" to impose a lexicographic order on these
++ # packs so the pack being removed is always from the
++ # middle.
++ packdir=.git/objects/pack &&
++ A="$(echo A | git pack-objects $packdir/pack-A --revs)" &&
++ B="$(echo B | git pack-objects $packdir/pack-B --revs)" &&
++ C="$(echo C | git pack-objects $packdir/pack-C --revs)" &&
++
++ git multi-pack-index write --bitmap &&
++
++ cat >in <<-EOF &&
++ pack-A-$A.idx
++ pack-C-$C.idx
++ EOF
++ git multi-pack-index write --bitmap --stdin-packs <in &&
++
++ git rev-list --test-bitmap HEAD
++ )
++'
++
+ test_done
-: ---------- > 2: 4b006f44a5 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
-: ---------- > 3: 06de4005f1 pack-revindex.c: guard against out-of-bounds pack lookups
base-commit: 1b76f065085811104b5f4adff001956d7e5c5d1c
--
2.45.2.448.g06de4005f1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include`
2024-06-11 17:28 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
@ 2024-06-11 17:28 ` Taylor Blau
2024-06-11 17:28 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-11 17:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
`packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
support reading from an existing MIDX when writing a new one.
Unfortunately, the rest of the MIDX generation machinery is not prepared
to deal with such a change. For instance, the function responsible for
adding to the object ID fanout table from a MIDX source
(midx_fanout_add_midx_fanout()) will gladly add objects from an existing
MIDX for some fanout level regardless of whether or not those objects
came from packs that are to be included in the subsequent MIDX write.
This results in broken pseudo-pack object order (leading to incorrect
object traversal results) and segmentation faults, like so (generated by
running the added test prior to the changes in midx-write.c):
#0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
#1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
refs_snapshot=0x0, flags=15) at midx-write.c:1171
#2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
at midx-write.c:1274
[...]
In stack frame #0, the code on midx-write.c:590 is using the new pack ID
corresponding to some object which was added from the existing MIDX.
Importantly, the pack from which that object was selected in the
existing MIDX does not appear in the new MIDX as it was excluded via
`--stdin-packs`.
In this instance, the pack in question had pack ID "1" in the existing
MIDX, but since it was excluded from the new MIDX, we never filled in
that entry in the pack_perm table, resulting in:
(gdb) p *ctx->pack_perm@2
$1 = {0, 1515870810}
Which is what causes the segfault above when we try and read:
struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
pack->bitmap_pos = 0;
Fundamentally, we should be able to read information from an existing
MIDX when generating a new one. But in practice the midx-write.c code
assumes that we won't run into issues like the above with incongruent
pack IDs, and often makes those assumptions in extremely subtle and
fragile ways.
Instead, let's avoid reading from an existing MIDX altogether, and stick
with the pre-d6a8c58675 implementation. Harden against any regressions
in this area by adding a test which demonstrates these issues.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 42 ++++++++++++++++++++++++++---------
t/t5326-multi-pack-bitmaps.sh | 30 +++++++++++++++++++++++++
2 files changed, 61 insertions(+), 11 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 55a6b63bac..0125aa4dc9 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -101,13 +101,27 @@ struct write_midx_context {
};
static int should_include_pack(const struct write_midx_context *ctx,
- const char *file_name,
- int exclude_from_midx)
+ const char *file_name)
{
- if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
+ /*
+ * Note that at most one of ctx->m and ctx->to_include are set,
+ * so we are testing midx_contains_pack() and
+ * string_list_has_string() independently (guarded by the
+ * appropriate NULL checks).
+ *
+ * We could support passing to_include while reusing an existing
+ * MIDX, but don't currently since the reuse process drags
+ * forward all packs from an existing MIDX (without checking
+ * whether or not they appear in the to_include list).
+ *
+ * If we added support for that, these next two conditional
+ * should be performed independently (likely checking
+ * to_include before the existing MIDX).
+ */
+ if (ctx->m && midx_contains_pack(ctx->m, file_name))
return 0;
- if (ctx->to_include && !string_list_has_string(ctx->to_include,
- file_name))
+ else if (ctx->to_include &&
+ !string_list_has_string(ctx->to_include, file_name))
return 0;
return 1;
}
@@ -121,7 +135,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
if (ends_with(file_name, ".idx")) {
display_progress(ctx->progress, ++ctx->pack_paths_checked);
- if (!should_include_pack(ctx, file_name, 1))
+ if (!should_include_pack(ctx, file_name))
return;
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
@@ -880,9 +894,6 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < ctx->m->num_packs; i++) {
- if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
- continue;
-
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
@@ -937,7 +948,15 @@ static int write_midx_internal(const char *object_dir,
die_errno(_("unable to create leading directories of %s"),
midx_name.buf);
- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
+ if (!packs_to_include) {
+ /*
+ * Only reference an existing MIDX when not filtering which
+ * packs to include, since all packs and objects are copied
+ * blindly from an existing MIDX if one is present.
+ */
+ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
+ }
+
if (ctx.m && !midx_checksum_valid(ctx.m)) {
warning(_("ignoring existing multi-pack-index; checksum mismatch"));
ctx.m = NULL;
@@ -946,7 +965,6 @@ static int write_midx_internal(const char *object_dir,
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
ctx.info = NULL;
- ctx.to_include = packs_to_include;
ALLOC_ARRAY(ctx.info, ctx.alloc);
if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
@@ -963,6 +981,8 @@ static int write_midx_internal(const char *object_dir,
else
ctx.progress = NULL;
+ ctx.to_include = packs_to_include;
+
for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
stop_progress(&ctx.progress);
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index cc7220b6c0..916da389b6 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -551,4 +551,34 @@ do
'
done
+test_expect_success 'remove one packfile between MIDX bitmap writes' '
+ git init remove-pack-between-writes &&
+ (
+ cd remove-pack-between-writes &&
+
+ test_commit A &&
+ test_commit B &&
+ test_commit C &&
+
+ # Create packs with the prefix "pack-A", "pack-B",
+ # "pack-C" to impose a lexicographic order on these
+ # packs so the pack being removed is always from the
+ # middle.
+ packdir=.git/objects/pack &&
+ A="$(echo A | git pack-objects $packdir/pack-A --revs)" &&
+ B="$(echo B | git pack-objects $packdir/pack-B --revs)" &&
+ C="$(echo C | git pack-objects $packdir/pack-C --revs)" &&
+
+ git multi-pack-index write --bitmap &&
+
+ cat >in <<-EOF &&
+ pack-A-$A.idx
+ pack-C-$C.idx
+ EOF
+ git multi-pack-index write --bitmap --stdin-packs <in &&
+
+ git rev-list --test-bitmap HEAD
+ )
+'
+
test_done
--
2.45.2.448.g06de4005f1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
2024-06-11 17:28 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2024-06-11 17:28 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
@ 2024-06-11 17:28 ` Taylor Blau
2024-06-11 17:28 ` [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups Taylor Blau
2024-06-11 17:31 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
3 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-11 17:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse.
In the multi-pack case, we loop over the MIDXs packs and copy the result
of calling `nth_bitmapped_pack()` to construct the list of reusable
paths.
But we may also want to do pack-reuse over a single pack, either because
we only had one pack to perform reuse over (in the case of single-pack
bitmaps), or because we explicitly asked to do single pack reuse even
with a MIDX[^1].
When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.
In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down in
try_partial_reuse() like so:
==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
#1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
#2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
#3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
#4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
#5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
#6 0x55c5ccc8fac8 in run_builtin git.c:474:11
which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.
Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps. In the latter case, we never read the pack_int_id field, so
the choice of '-1' is intentional as a "garbage in, garbage out"
measure.
Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.
[^1]: This can happen for a couple of reasons, either because the
repository is configured with 'pack.allowPackReuse=(true|single)', or
because the MIDX was generated prior to the introduction of the BTMP
chunk, which contains information necessary to perform multi-pack
reuse.
Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 13 +++++++++++++
t/t5332-multi-pack-reuse.sh | 26 ++++++++++++++++++++++++++
2 files changed, 39 insertions(+)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index fe8e8a51d3..1d6b7f2826 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2073,6 +2073,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
QSORT(packs, packs_nr, bitmapped_pack_cmp);
} else {
struct packed_git *pack;
+ uint32_t pack_int_id;
if (bitmap_is_midx(bitmap_git)) {
uint32_t preferred_pack_pos;
@@ -2083,12 +2084,24 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
}
pack = bitmap_git->midx->packs[preferred_pack_pos];
+ pack_int_id = preferred_pack_pos;
} else {
pack = bitmap_git->pack;
+ /*
+ * Any value for 'pack_int_id' will do here. When we
+ * process the pack via try_partial_reuse(), we won't
+ * use the `pack_int_id` field since we have a non-MIDX
+ * bitmap.
+ *
+ * Use '-1' as a sentinel value to make it clear
+ * that we do not expect to read this field.
+ */
+ pack_int_id = -1;
}
ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
packs[packs_nr].p = pack;
+ packs[packs_nr].pack_int_id = pack_int_id;
packs[packs_nr].bitmap_nr = pack->num_objects;
packs[packs_nr].bitmap_pos = 0;
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 3c20738bce..ed823f37bc 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -204,4 +204,30 @@ test_expect_success 'omit delta from uninteresting base (cross pack)' '
test_pack_objects_reused_all $(($objects_nr - 1)) $packs_nr
'
+test_expect_success 'non-omitted delta in MIDX preferred pack' '
+ test_config pack.allowPackReuse single &&
+
+ cat >p1.objects <<-EOF &&
+ $(git rev-parse $base)
+ ^$(git rev-parse $delta^)
+ EOF
+ cat >p2.objects <<-EOF &&
+ $(git rev-parse F)
+ EOF
+
+ p1="$(git pack-objects --revs $packdir/pack <p1.objects)" &&
+ p2="$(git pack-objects --revs $packdir/pack <p2.objects)" &&
+
+ cat >in <<-EOF &&
+ pack-$p1.idx
+ pack-$p2.idx
+ EOF
+ git multi-pack-index write --bitmap --stdin-packs \
+ --preferred-pack=pack-$p1.pack <in &&
+
+ git show-index <$packdir/pack-$p1.idx >expect &&
+
+ test_pack_objects_reused_all $(wc -l <expect) 1
+'
+
test_done
--
2.45.2.448.g06de4005f1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups
2024-06-11 17:28 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2024-06-11 17:28 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
2024-06-11 17:28 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
@ 2024-06-11 17:28 ` Taylor Blau
2024-06-11 17:31 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
3 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-11 17:28 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
The function midx_key_to_pack_pos() is a helper function used by
midx_to_pack_pos() and midx_pair_to_pack_pos() to translate a (pack,
offset) tuple into a position into the MIDX pseudo-pack order.
Ensure that the pack ID given to midx_pair_to_pack_pos() is bounded by
the number of packs within the MIDX to prevent, for instance,
uninitialized memory from being used as a pack ID.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-revindex.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/pack-revindex.c b/pack-revindex.c
index fc63aa76a2..93ffca7731 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -527,6 +527,9 @@ static int midx_key_to_pack_pos(struct multi_pack_index *m,
{
uint32_t *found;
+ if (key->pack >= m->num_packs)
+ BUG("MIDX pack lookup out of bounds (%"PRIu32" >= %"PRIu32")",
+ key->pack, m->num_packs);
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
--
2.45.2.448.g06de4005f1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/3] midx: various brown paper bag fixes
2024-06-11 17:28 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
` (2 preceding siblings ...)
2024-06-11 17:28 ` [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups Taylor Blau
@ 2024-06-11 17:31 ` Taylor Blau
3 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2024-06-11 17:31 UTC (permalink / raw)
To: git; +Cc: Jeff King, Junio C Hamano, Kyle Lippincott, Patrick Steinhardt
On Tue, Jun 11, 2024 at 01:28:07PM -0400, Taylor Blau wrote:
> This series fixes a couple of brown paper bag bugs in the MIDX/bitmap
> code.
>
> This version is basically identical to the previous round, except that
> we use the sentinel value "-1" as the 'pack_int_id' when reusing from a
> single (non-MIDX'd) pack. This is a "garbage in, garbage out" measure to
> ensure that we notice any regressions here loudly.
>
> Thanks in advance for your review!
I must have forgotten to tag my original "v2" as such, since my scripts
decided that this is actually v2. Apologies for any confusion, this
latest version is certainly v3, the range-diff looks like so:
--- 8< ---
1: c62daacd1d = 1: 0bdf925366 midx-write.c: do not read existing MIDX with `packs_to_include`
2: 9b78fa2923 ! 2: 4b006f44a5 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
@@ Commit message
Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
- preferred pack's pack_int_id, or '0', in the case of single-pack
+ preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps. In the latter case, we never read the pack_int_id field, so
- the choice of '0' is arbitrary.
+ the choice of '-1' is intentional as a "garbage in, garbage out"
+ measure.
Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitm
+ * process the pack via try_partial_reuse(), we won't
+ * use the `pack_int_id` field since we have a non-MIDX
+ * bitmap.
++ *
++ * Use '-1' as a sentinel value to make it clear
++ * that we do not expect to read this field.
+ */
-+ pack_int_id = 0;
++ pack_int_id = -1;
}
ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
3: e08f679dec = 3: 06de4005f1 pack-revindex.c: guard against out-of-bounds pack lookups
--- >8 ---
Thanks,
Taylor
^ permalink raw reply [flat|nested] 15+ messages in thread