From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
Kyle Lippincott <spectral@google.com>,
Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 0/3] midx: various brown paper bag fixes
Date: Tue, 11 Jun 2024 13:28:07 -0400 [thread overview]
Message-ID: <cover.1718126886.git.me@ttaylorr.com> (raw)
In-Reply-To: <4aceb9233ed24fb1e1a324a77b665eea2cf22b39.1717946847.git.me@ttaylorr.com>
This series fixes a couple of brown paper bag bugs in the MIDX/bitmap
code.
This version is basically identical to the previous round, except that
we use the sentinel value "-1" as the 'pack_int_id' when reusing from a
single (non-MIDX'd) pack. This is a "garbage in, garbage out" measure to
ensure that we notice any regressions here loudly.
Thanks in advance for your review!
Taylor Blau (3):
midx-write.c: do not read existing MIDX with `packs_to_include`
pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
pack-revindex.c: guard against out-of-bounds pack lookups
midx-write.c | 42 ++++++++++++++++++++++++++---------
pack-bitmap.c | 13 +++++++++++
pack-revindex.c | 3 +++
t/t5326-multi-pack-bitmaps.sh | 30 +++++++++++++++++++++++++
t/t5332-multi-pack-reuse.sh | 26 ++++++++++++++++++++++
5 files changed, 103 insertions(+), 11 deletions(-)
Range-diff against v1:
1: 4aceb9233e ! 1: 0bdf925366 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
-
- When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
- is responsible for generating an array of bitmapped_pack structs from
- which to perform reuse.
-
- In the multi-pack case, we loop over the MIDXs packs and copy the result
- of calling `nth_bitmapped_pack()` to construct the list of reusable
- paths.
-
- But we may also want to do pack-reuse over a single pack, either because
- we only had one pack to perform reuse over (in the case of single-pack
- bitmaps), or because we explicitly asked to do single pack reuse even
- with a MIDX[^1].
-
- When this is the case, the array we generate of reusable packs contains
- only a single element, which is either (a) the pack attached to the
- single-pack bitmap, or (b) the MIDX's preferred pack.
-
- In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
- 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
- function and stopped assigning the pack_int_id field when reusing only
- the MIDX's preferred pack. This results in an uninitialized read down in
- try_partial_reuse() like so:
-
- ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
- #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
- #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
- #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
- #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
- #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
- #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
- #6 0x55c5ccc8fac8 in run_builtin git.c:474:11
-
- which happens when try_partial_reuse() tries to call
- midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.
-
- Avoid the uninitialized read by ensuring that the pack_int_id field is
- set in the single-pack reuse case by setting it to either the MIDX
- preferred pack's pack_int_id, or '0', in the case of single-pack
- bitmaps. In the latter case, we never read the pack_int_id field, so
- the choice of '0' is arbitrary.
-
- [^1]: This can happen for a couple of reasons, either because the
- repository is configured with 'pack.allowPackReuse=(true|single)', or
- because the MIDX was generated prior to the introduction of the BTMP
- chunk, which contains information necessary to perform multi-pack
- reuse.
-
- Reported-by: Kyle Lippincott <spectral@google.com>
+ midx-write.c: do not read existing MIDX with `packs_to_include`
+
+ Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
+ `packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
+ support reading from an existing MIDX when writing a new one.
+
+ Unfortunately, the rest of the MIDX generation machinery is not prepared
+ to deal with such a change. For instance, the function responsible for
+ adding to the object ID fanout table from a MIDX source
+ (midx_fanout_add_midx_fanout()) will gladly add objects from an existing
+ MIDX for some fanout level regardless of whether or not those objects
+ came from packs that are to be included in the subsequent MIDX write.
+
+ This results in broken pseudo-pack object order (leading to incorrect
+ object traversal results) and segmentation faults, like so (generated by
+ running the added test prior to the changes in midx-write.c):
+
+ #0 0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
+ #1 0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
+ packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
+ refs_snapshot=0x0, flags=15) at midx-write.c:1171
+ #2 0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
+ packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
+ at midx-write.c:1274
+ [...]
+
+ In stack frame #0, the code on midx-write.c:590 is using the new pack ID
+ corresponding to some object which was added from the existing MIDX.
+ Importantly, the pack from which that object was selected in the
+ existing MIDX does not appear in the new MIDX as it was excluded via
+ `--stdin-packs`.
+
+ In this instance, the pack in question had pack ID "1" in the existing
+ MIDX, but since it was excluded from the new MIDX, we never filled in
+ that entry in the pack_perm table, resulting in:
+
+ (gdb) p *ctx->pack_perm@2
+ $1 = {0, 1515870810}
+
+ Which is what causes the segfault above when we try and read:
+
+ struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
+ if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
+ pack->bitmap_pos = 0;
+
+ Fundamentally, we should be able to read information from an existing
+ MIDX when generating a new one. But in practice the midx-write.c code
+ assumes that we won't run into issues like the above with incongruent
+ pack IDs, and often makes those assumptions in extremely subtle and
+ fragile ways.
+
+ Instead, let's avoid reading from an existing MIDX altogether, and stick
+ with the pre-d6a8c58675 implementation. Harden against any regressions
+ in this area by adding a test which demonstrates these issues.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
- ## pack-bitmap.c ##
-@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
- QSORT(packs, packs_nr, bitmapped_pack_cmp);
- } else {
- struct packed_git *pack;
-+ uint32_t pack_int_id;
-
- if (bitmap_is_midx(bitmap_git)) {
- uint32_t preferred_pack_pos;
-@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
- }
-
- pack = bitmap_git->midx->packs[preferred_pack_pos];
-+ pack_int_id = preferred_pack_pos;
- } else {
- pack = bitmap_git->pack;
-+ /*
-+ * Any value for 'pack_int_id' will do here. When we
-+ * process the pack via try_partial_reuse(), we won't
-+ * use the `pack_int_id` field since we have a non-MIDX
-+ * bitmap.
-+ */
-+ pack_int_id = 0;
- }
-
- ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
- packs[packs_nr].p = pack;
-+ packs[packs_nr].pack_int_id = pack_int_id;
- packs[packs_nr].bitmap_nr = pack->num_objects;
- packs[packs_nr].bitmap_pos = 0;
+ ## midx-write.c ##
+@@ midx-write.c: struct write_midx_context {
+ };
+ static int should_include_pack(const struct write_midx_context *ctx,
+- const char *file_name,
+- int exclude_from_midx)
++ const char *file_name)
+ {
+- if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
++ /*
++ * Note that at most one of ctx->m and ctx->to_include are set,
++ * so we are testing midx_contains_pack() and
++ * string_list_has_string() independently (guarded by the
++ * appropriate NULL checks).
++ *
++ * We could support passing to_include while reusing an existing
++ * MIDX, but don't currently since the reuse process drags
++ * forward all packs from an existing MIDX (without checking
++ * whether or not they appear in the to_include list).
++ *
++ * If we added support for that, these next two conditional
++ * should be performed independently (likely checking
++ * to_include before the existing MIDX).
++ */
++ if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ return 0;
+- if (ctx->to_include && !string_list_has_string(ctx->to_include,
+- file_name))
++ else if (ctx->to_include &&
++ !string_list_has_string(ctx->to_include, file_name))
+ return 0;
+ return 1;
+ }
+@@ midx-write.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+ if (ends_with(file_name, ".idx")) {
+ display_progress(ctx->progress, ++ctx->pack_paths_checked);
+
+- if (!should_include_pack(ctx, file_name, 1))
++ if (!should_include_pack(ctx, file_name))
+ return;
+
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
+ uint32_t i;
+
+ for (i = 0; i < ctx->m->num_packs; i++) {
+- if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
+- continue;
+-
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+ if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ die_errno(_("unable to create leading directories of %s"),
+ midx_name.buf);
+
+- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
++ if (!packs_to_include) {
++ /*
++ * Only reference an existing MIDX when not filtering which
++ * packs to include, since all packs and objects are copied
++ * blindly from an existing MIDX if one is present.
++ */
++ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
++ }
++
+ if (ctx.m && !midx_checksum_valid(ctx.m)) {
+ warning(_("ignoring existing multi-pack-index; checksum mismatch"));
+ ctx.m = NULL;
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ ctx.nr = 0;
+ ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
+ ctx.info = NULL;
+- ctx.to_include = packs_to_include;
+ ALLOC_ARRAY(ctx.info, ctx.alloc);
+
+ if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+@@ midx-write.c: static int write_midx_internal(const char *object_dir,
+ else
+ ctx.progress = NULL;
+
++ ctx.to_include = packs_to_include;
++
+ for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
+ stop_progress(&ctx.progress);
+
+
+ ## t/t5326-multi-pack-bitmaps.sh ##
+@@ t/t5326-multi-pack-bitmaps.sh: do
+ '
+ done
+
++test_expect_success 'remove one packfile between MIDX bitmap writes' '
++ git init remove-pack-between-writes &&
++ (
++ cd remove-pack-between-writes &&
++
++ test_commit A &&
++ test_commit B &&
++ test_commit C &&
++
++ # Create packs with the prefix "pack-A", "pack-B",
++ # "pack-C" to impose a lexicographic order on these
++ # packs so the pack being removed is always from the
++ # middle.
++ packdir=.git/objects/pack &&
++ A="$(echo A | git pack-objects $packdir/pack-A --revs)" &&
++ B="$(echo B | git pack-objects $packdir/pack-B --revs)" &&
++ C="$(echo C | git pack-objects $packdir/pack-C --revs)" &&
++
++ git multi-pack-index write --bitmap &&
++
++ cat >in <<-EOF &&
++ pack-A-$A.idx
++ pack-C-$C.idx
++ EOF
++ git multi-pack-index write --bitmap --stdin-packs <in &&
++
++ git rev-list --test-bitmap HEAD
++ )
++'
++
+ test_done
-: ---------- > 2: 4b006f44a5 pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse
-: ---------- > 3: 06de4005f1 pack-revindex.c: guard against out-of-bounds pack lookups
base-commit: 1b76f065085811104b5f4adff001956d7e5c5d1c
--
2.45.2.448.g06de4005f1
next prev parent reply other threads:[~2024-06-11 17:28 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-09 15:27 [PATCH] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
2024-06-10 5:55 ` Patrick Steinhardt
2024-06-10 14:57 ` Taylor Blau
2024-06-11 8:12 ` Patrick Steinhardt
2024-06-10 20:10 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
2024-06-10 20:10 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
2024-06-10 20:10 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
2024-06-11 9:11 ` Jeff King
2024-06-11 17:03 ` Junio C Hamano
2024-06-10 20:10 ` [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups Taylor Blau
2024-06-11 17:28 ` Taylor Blau [this message]
2024-06-11 17:28 ` [PATCH v2 1/3] midx-write.c: do not read existing MIDX with `packs_to_include` Taylor Blau
2024-06-11 17:28 ` [PATCH v2 2/3] pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse Taylor Blau
2024-06-11 17:28 ` [PATCH v2 3/3] pack-revindex.c: guard against out-of-bounds pack lookups Taylor Blau
2024-06-11 17:31 ` [PATCH v2 0/3] midx: various brown paper bag fixes Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1718126886.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=spectral@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).