* [PATCH 0/5] midx-write: fix segfault and do several cleanups
@ 2025-08-28 17:39 Derrick Stolee via GitGitGadget
2025-08-28 17:39 ` [PATCH 1/5] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
` (6 more replies)
0 siblings, 7 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-28 17:39 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee
I was motivated to start looking closely at midx-write.c due to multiple
users reporting Git crashes in their background maintenance, specifically
during git multi-pack-index repack calls. I was eventually able to reproduce
it in git multi-pack-index expire as well.
Patch 1 is the only change we need to fix this bug. It includes a test case
that will fail under --stress with SANITIZE=address. It requires creating
many packfiles (50 was not enough, but 100 is enough). As far as I can tell,
this bug has existed since Git 2.47.0 in October 2024, but I started hearing
reports of this from users in July 2025 (and took a while to get a
dump/repro).
The remaining patches are cleanups based on my careful rereading of
midx-write.c. There are some issues about error handling that needed some
cleanup as well as a removal of the DISABLE_SIGN_COMPARE_WARNINGS macro.
Thanks, -Stolee
Derrick Stolee (5):
midx-write: only load initialized packs
midx-write: put failing response value back
midx-write: use cleanup when incremental midx fails
midx-write: use uint32_t for preferred_pack_idx
midx-write: reenable signed comparison errors
midx-write.c | 118 ++++++++++++++++++------------------
t/t5319-multi-pack-index.sh | 17 ++++++
2 files changed, 75 insertions(+), 60 deletions(-)
base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1965%2Fderrickstolee%2Fmidx-write-cleanup-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1965/derrickstolee/midx-write-cleanup-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1965
--
gitgitgadget
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 1/5] midx-write: only load initialized packs
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
@ 2025-08-28 17:39 ` Derrick Stolee via GitGitGadget
2025-08-28 20:19 ` Junio C Hamano
2025-08-29 1:20 ` Taylor Blau
2025-08-28 17:39 ` [PATCH 2/5] midx-write: put failing response value back Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
6 siblings, 2 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-28 17:39 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.
The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.
Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs. This test has a very
minor check at the end confirming only one packfile remaining. The
failing nature of this test actually relies on auto-GC cleaning up some
packfiles during the creation of the commits, as tests setting gc.auto
to zero make the packfile count match the number of added commits but
also avoids hitting the memory issue.
The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.
When it fails under SANITIZE=address, it provides the following error:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
#0 0x562d5d82d1fb in close_pack_windows packfile.c:299
#1 0x562d5d82d3ab in close_pack packfile.c:354
#2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
#3 0x562d5d7c7aec in midx_repack midx-write.c:1795
#4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
...
This failure stack trace is disconnected from the real fix because it
the bad pointers are accessed later when closing the packfiles from the
context.
There are a few different aspects to this fix that are worth noting:
1. We return to the previous behavior of fill_packs_from_midx to not
rely on the incremental flag or existence of a preferred pack.
2. The behavior to scan all layers of an incremental midx is kept, so
this is not a full revert of the change.
3. We skip allocating more room in the pack_info array if the pack
fails prepare_midx_pack().
4. The method has always returned 0 for success and 1 for failure, but
the condition checking for error added a check for a negative result
for failure, so that is now updated.
5. The call to open_pack_index() is removed, but this is needed later
in the case of a preferred pack. That call is moved to immediately
before its result is needed (checking for the object count).
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 38 ++++++++++++-------------------------
t/t5319-multi-pack-index.sh | 17 +++++++++++++++++
2 files changed, 29 insertions(+), 26 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index a0aceab5e0..d8f9679868 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -920,8 +920,7 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
return get_multi_pack_index(source);
}
-static int fill_packs_from_midx(struct write_midx_context *ctx,
- const char *preferred_pack_name, uint32_t flags)
+static int fill_packs_from_midx(struct write_midx_context *ctx)
{
struct multi_pack_index *m;
@@ -929,30 +928,13 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < m->num_packs; i++) {
- ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
- /*
- * If generating a reverse index, need to have
- * packed_git's loaded to compare their
- * mtimes and object count.
- *
- * If a preferred pack is specified, need to
- * have packed_git's loaded to ensure the chosen
- * preferred pack has a non-zero object count.
- */
- if (flags & MIDX_WRITE_REV_INDEX ||
- preferred_pack_name) {
- if (prepare_midx_pack(ctx->repo, m,
- m->num_packs_in_base + i)) {
- error(_("could not load pack"));
- return 1;
- }
-
- if (open_pack_index(m->packs[i]))
- die(_("could not open index for %s"),
- m->packs[i]->pack_name);
+ if (prepare_midx_pack(ctx->repo, m,
+ m->num_packs_in_base + i)) {
+ error(_("could not load pack"));
+ return 1;
}
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
m->pack_names[i],
m->num_packs_in_base + i);
@@ -1123,8 +1105,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
- } else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
- flags) < 0) {
+ } else if (ctx.m && fill_packs_from_midx(&ctx)) {
goto cleanup;
}
@@ -1223,6 +1204,11 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.preferred_pack_idx > -1) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+
+ if (open_pack_index(preferred))
+ die(_("failed to open preferred pack %s"),
+ ctx.info[ctx.preferred_pack_idx].pack_name);
+
if (!preferred->num_objects) {
error(_("cannot select preferred pack %s with no objects"),
preferred->pack_name);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index bd75dea950..49705c62a2 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -989,6 +989,23 @@ test_expect_success 'repack --batch-size=0 repacks everything' '
)
'
+test_expect_success EXPENSIVE 'repack/expire with many packs' '
+ cp -r dup many &&
+ (
+ cd many &&
+
+ for i in $(test_seq 1 100)
+ do
+ test_commit extra$i &&
+ git maintenance run --task=loose-objects || return 1
+ done &&
+
+ git multi-pack-index write &&
+ git multi-pack-index repack &&
+ git multi-pack-index expire
+ )
+'
+
test_expect_success 'repack --batch-size=<large> repacks everything' '
(
cd dup2 &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 2/5] midx-write: put failing response value back
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
2025-08-28 17:39 ` [PATCH 1/5] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
@ 2025-08-28 17:39 ` Derrick Stolee via GitGitGadget
2025-08-28 20:45 ` Junio C Hamano
2025-08-28 17:39 ` [PATCH 3/5] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-28 17:39 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
This instance of setting the result to 1 before going to cleanup was
accidentally removed in fcb2205b77 (midx: implement support for writing
incremental MIDX chains, 2024-08-06).
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/midx-write.c b/midx-write.c
index d8f9679868..85b2d471ef 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1106,6 +1106,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
m = m->base_midx;
}
} else if (ctx.m && fill_packs_from_midx(&ctx)) {
+ result = 1;
goto cleanup;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 3/5] midx-write: use cleanup when incremental midx fails
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
2025-08-28 17:39 ` [PATCH 1/5] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-08-28 17:39 ` [PATCH 2/5] midx-write: put failing response value back Derrick Stolee via GitGitGadget
@ 2025-08-28 17:39 ` Derrick Stolee via GitGitGadget
2025-08-28 20:51 ` Junio C Hamano
2025-08-28 17:39 ` [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-28 17:39 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The incremental mode of writing a multi-pack-index has a few extra
conditions that could lead to failure, but these are currently
short-ciruiting with 'return -1' instead of setting the method's
'result' variable and going to the cleanup tag.
Replace these returns with gotos to avoid memory issues when exiting
early due to error conditions.
Unfortunately, these error conditions are difficult to reproduce with
test cases, which is perhaps one reason why the memory loss was not
caught by existing test cases in memory tracking modes.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 85b2d471ef..f2d9a990e6 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1321,13 +1321,15 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
incr = mks_tempfile_m(midx_name.buf, 0444);
if (!incr) {
error(_("unable to create temporary MIDX layer"));
- return -1;
+ result = -1;
+ goto cleanup;
}
if (adjust_shared_perm(r, get_tempfile_path(incr))) {
error(_("unable to adjust shared permissions for '%s'"),
get_tempfile_path(incr));
- return -1;
+ result = -1;
+ goto cleanup;
}
f = hashfd(r->hash_algo, get_tempfile_fd(incr),
@@ -1427,18 +1429,22 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
- return -1;
+ result = -1;
+ goto cleanup;
}
- if (link_midx_to_chain(ctx.base_midx) < 0)
- return -1;
+ if (link_midx_to_chain(ctx.base_midx) < 0) {
+ result = -1;
+ goto cleanup;
+ }
get_split_midx_filename_ext(r->hash_algo, &final_midx_name,
object_dir, midx_hash, MIDX_EXT_MIDX);
if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
error_errno(_("unable to rename new multi-pack-index layer"));
- return -1;
+ result = -1;
+ goto cleanup;
}
strbuf_release(&final_midx_name);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2025-08-28 17:39 ` [PATCH 3/5] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
@ 2025-08-28 17:39 ` Derrick Stolee via GitGitGadget
2025-08-28 20:58 ` Junio C Hamano
2025-08-29 1:35 ` Taylor Blau
2025-08-28 17:39 ` [PATCH 5/5] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
6 siblings, 2 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-28 17:39 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
few reasons, but the biggest one is the use of a signed
preferred_pack_idx member inside the write_midx_context struct. The code
currently uses -1 to indicate an unset preferred pack but pack int ids
are normally handled as uint32_t. There are also a few loops that search
for the preferred pack by name and those iterators will need updates to
uint32_t in the next change.
For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
equality check. The macro stores the max value of a uint32_t, so we
cannot store a preferred pack that appears last in a list of 2^32 total
packs, but that's expected to be unreasonable already. This improves the
range from 2^31 already.
There are some careful things to worry about with initializing the
preferred pack in the struct and using that value when searching for a
preferred pack that was already incorrect but accidentally working when
the index was initialized to zero.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index f2d9a990e6..ea1b3a199c 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -24,6 +24,7 @@
#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define NO_PREFERRED_PACK (~((uint32_t)0))
extern int midx_checksum_valid(struct multi_pack_index *m);
extern void clear_midx_files_ext(const char *object_dir, const char *ext,
@@ -104,7 +105,7 @@ struct write_midx_context {
unsigned large_offsets_needed:1;
uint32_t num_large_offsets;
- int preferred_pack_idx;
+ uint32_t preferred_pack_idx;
int incremental;
uint32_t num_multi_pack_indexes_before;
@@ -260,7 +261,7 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
struct multi_pack_index *m,
uint32_t cur_fanout,
- int preferred_pack)
+ uint32_t preferred_pack)
{
uint32_t start = m->num_objects_in_base, end;
uint32_t cur_object;
@@ -274,7 +275,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
for (cur_object = start; cur_object < end; cur_object++) {
- if ((preferred_pack > -1) &&
+ if ((preferred_pack != NO_PREFERRED_PACK) &&
(preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
/*
* Objects from preferred packs are added
@@ -364,7 +365,8 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
preferred, cur_fanout);
}
- if (-1 < ctx->preferred_pack_idx && ctx->preferred_pack_idx < start_pack)
+ if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+ ctx->preferred_pack_idx < start_pack)
midx_fanout_add_pack_fanout(&fanout, ctx->info,
ctx->preferred_pack_idx, 1,
cur_fanout);
@@ -1042,7 +1044,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
struct hashfile *f = NULL;
struct lock_file lk;
struct tempfile *incr;
- struct write_midx_context ctx = { 0 };
+ struct write_midx_context ctx = {
+ .preferred_pack_idx = NO_PREFERRED_PACK,
+ };
int bitmapped_packs_concat_len = 0;
int pack_name_concat_len = 0;
int dropped_packs = 0;
@@ -1150,7 +1154,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
goto cleanup; /* nothing to do */
if (preferred_pack_name) {
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
for (i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
@@ -1160,12 +1164,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- if (ctx.preferred_pack_idx == -1)
+ if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
warning(_("unknown preferred pack: '%s'"),
preferred_pack_name);
} else if (ctx.nr &&
(flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
- struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+ struct packed_git *oldest = ctx.info[0].p;
ctx.preferred_pack_idx = 0;
if (packs_to_drop && packs_to_drop->nr)
@@ -1193,17 +1197,17 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* objects to resolve, so the preferred value doesn't
* matter.
*/
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
}
} else {
/*
* otherwise don't mark any pack as preferred to avoid
* interfering with expiration logic below
*/
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
}
- if (ctx.preferred_pack_idx > -1) {
+ if (ctx.preferred_pack_idx != NO_PREFERRED_PACK) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
if (open_pack_index(preferred))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 5/5] midx-write: reenable signed comparison errors
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2025-08-28 17:39 ` [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
@ 2025-08-28 17:39 ` Derrick Stolee via GitGitGadget
2025-08-28 21:01 ` Junio C Hamano
2025-08-29 1:36 ` [PATCH 0/5] midx-write: fix segfault and do several cleanups Taylor Blau
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-28 17:39 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Remove the remaining signed comparison warnings in midx-write.c so that
they can be enforced as errors in the future. After the previous change,
the remaining errors are due to iterator variables named 'i'.
The strategy here involves defining the variable within the for loop
syntax to make sure we use the appropriate bitness for the loop
sentinel. This matters in at least one method where the variable was
compared to uint32_t in some loops and size_t in others.
While adjusting these loops, there were some where the loop boundary was
checking against a uint32_t value _plus one_. These were replaced with
non-strict comparisons, but also the value is checked to not be
UINT32_MAX. Since the value is the number of incremental multi-pack-
indexes, this is not a meaningful restriction. The new die() is about
defensive programming more than it being realistically possible.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index ea1b3a199c..93f33e5baa 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1,5 +1,3 @@
-#define DISABLE_SIGN_COMPARE_WARNINGS
-
#include "git-compat-util.h"
#include "abspath.h"
#include "config.h"
@@ -845,7 +843,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
uint32_t commits_nr,
unsigned flags)
{
- int ret, i;
+ int ret;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
@@ -873,7 +871,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
* this order).
*/
ALLOC_ARRAY(index, pdata->nr_objects);
- for (i = 0; i < pdata->nr_objects; i++)
+ for (uint32_t i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
bitmap_writer_init(&writer, ctx->repo, pdata,
@@ -894,7 +892,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
* happens between bitmap_writer_build_type_index() and
* bitmap_writer_finish().
*/
- for (i = 0; i < pdata->nr_objects; i++)
+ for (uint32_t i = 0; i < pdata->nr_objects; i++)
index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
@@ -1040,7 +1038,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
{
struct strbuf midx_name = STRBUF_INIT;
unsigned char midx_hash[GIT_MAX_RAWSZ];
- uint32_t i, start_pack;
+ uint32_t start_pack;
struct hashfile *f = NULL;
struct lock_file lk;
struct tempfile *incr;
@@ -1156,7 +1154,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (preferred_pack_name) {
ctx.preferred_pack_idx = NO_PREFERRED_PACK;
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
ctx.info[i].pack_name)) {
ctx.preferred_pack_idx = i;
@@ -1181,7 +1179,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* pack-order has all of its objects selected from that pack
* (and not another pack containing a duplicate)
*/
- for (i = 1; i < ctx.nr; i++) {
+ for (size_t i = 1; i < ctx.nr; i++) {
struct packed_git *p = ctx.info[i].p;
if (!oldest->num_objects || p->mtime < oldest->mtime) {
@@ -1225,7 +1223,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
compute_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
- for (i = 0; i < ctx.entries_nr; i++) {
+ for (size_t i = 0; i < ctx.entries_nr; i++) {
if (ctx.entries[i].offset > 0x7fffffff)
ctx.num_large_offsets++;
if (ctx.entries[i].offset > 0xffffffff)
@@ -1235,10 +1233,10 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
QSORT(ctx.info, ctx.nr, pack_info_compare);
if (packs_to_drop && packs_to_drop->nr) {
- int drop_index = 0;
+ size_t drop_index = 0;
int missing_drops = 0;
- for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+ for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
int cmp = strcmp(ctx.info[i].pack_name,
packs_to_drop->items[drop_index].string);
@@ -1269,7 +1267,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* pack_perm[old_id] = new_id
*/
ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired) {
dropped_packs++;
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
@@ -1278,7 +1276,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired)
continue;
pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
@@ -1424,6 +1422,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* have been freed in the previous if block.
*/
+ if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
+ die("too many multi-pack-indexes");
+
CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
if (ctx.incremental) {
@@ -1456,7 +1457,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
keep_hashes[ctx.num_multi_pack_indexes_before] =
xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
- for (i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+ for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
@@ -1464,7 +1465,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
m = m->base_midx;
}
- for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+ for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
} else {
keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1482,7 +1483,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.incremental);
cleanup:
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].p) {
close_pack(ctx.info[i].p);
free(ctx.info[i].p);
@@ -1495,7 +1496,7 @@ cleanup:
free(ctx.pack_perm);
free(ctx.pack_order);
if (keep_hashes) {
- for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+ for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
free((char *)keep_hashes[i]);
free(keep_hashes);
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 1/5] midx-write: only load initialized packs
2025-08-28 17:39 ` [PATCH 1/5] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
@ 2025-08-28 20:19 ` Junio C Hamano
2025-08-29 1:20 ` Taylor Blau
1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-08-28 20:19 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
> implement support for writing incremental MIDX chains, 2024-08-06) to
> allow for preferred packfiles and incremental multi-pack-indexes.
> However, this led to some conditions that can cause improperly
> initialized memory in the context's list of packfiles.
>
> The conditions caring about the preferred pack name or the incremental
> flag are currently necessary to load a packfile. But the context is
> still being populated with pack_info structs based on the packfile array
> for the existing multi-pack-index even if prepare_midx_pack() isn't
> called.
>
> Add a new test that breaks under --stress when compiled with
> SANITIZE=address. The chosen number of 100 packfiles was selected to get
> the --stress output to fail about 50% of the time, while 50 packfiles
> could not get a failure in most --stress runs. This test has a very
> minor check at the end confirming only one packfile remaining. The
> failing nature of this test actually relies on auto-GC cleaning up some
> packfiles during the creation of the commits, as tests setting gc.auto
> to zero make the packfile count match the number of added commits but
> also avoids hitting the memory issue.
>
> The test case is marked as EXPENSIVE not only because of the number of
> packfiles it creates, but because some CI environments were reporting
> errors during the test that I could not reproduce, specifically around
> being unable to open the packfiles or their pack-indexes.
>
> When it fails under SANITIZE=address, it provides the following error:
>
> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
> ==3263517==The signal is caused by a READ memory access.
> ==3263517==Hint: address points to the zero page.
> #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
> #1 0x562d5d82d3ab in close_pack packfile.c:354
> #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
> #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
> #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
> ...
>
> This failure stack trace is disconnected from the real fix because it
> the bad pointers are accessed later when closing the packfiles from the
> context.
"because it the bad pointers" -> ??? Perhaps just drop "it"?
> There are a few different aspects to this fix that are worth noting:
>
> 1. We return to the previous behavior of fill_packs_from_midx to not
> rely on the incremental flag or existence of a preferred pack.
>
> 2. The behavior to scan all layers of an incremental midx is kept, so
> this is not a full revert of the change.
>
> 3. We skip allocating more room in the pack_info array if the pack
> fails prepare_midx_pack().
>
> 4. The method has always returned 0 for success and 1 for failure, but
> the condition checking for error added a check for a negative result
> for failure, so that is now updated.
So did the callee think it is signalling an error, but the sole
caller was not taking that as an error, and that was one of the bugs
this change fixes?
Even if that is the case, I still do not understand why we want to
say in the callee
error("message");
return 1;
and adjust the caller to it, when we can just say
return error("message");
in the callee(), especially if the only caller is expecting to be
signalled by a negative return value for an error already.
> 5. The call to open_pack_index() is removed, but this is needed later
> in the case of a preferred pack. That call is moved to immediately
> before its result is needed (checking for the object count).
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> midx-write.c | 38 ++++++++++++-------------------------
> t/t5319-multi-pack-index.sh | 17 +++++++++++++++++
> 2 files changed, 29 insertions(+), 26 deletions(-)
Thanks.
> diff --git a/midx-write.c b/midx-write.c
> index a0aceab5e0..d8f9679868 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -920,8 +920,7 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> return get_multi_pack_index(source);
> }
>
> -static int fill_packs_from_midx(struct write_midx_context *ctx,
> - const char *preferred_pack_name, uint32_t flags)
> +static int fill_packs_from_midx(struct write_midx_context *ctx)
> {
> struct multi_pack_index *m;
>
> @@ -929,30 +928,13 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
> uint32_t i;
>
> for (i = 0; i < m->num_packs; i++) {
> - ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
> -
> - /*
> - * If generating a reverse index, need to have
> - * packed_git's loaded to compare their
> - * mtimes and object count.
> - *
> - * If a preferred pack is specified, need to
> - * have packed_git's loaded to ensure the chosen
> - * preferred pack has a non-zero object count.
> - */
> - if (flags & MIDX_WRITE_REV_INDEX ||
> - preferred_pack_name) {
> - if (prepare_midx_pack(ctx->repo, m,
> - m->num_packs_in_base + i)) {
> - error(_("could not load pack"));
> - return 1;
> - }
> -
> - if (open_pack_index(m->packs[i]))
> - die(_("could not open index for %s"),
> - m->packs[i]->pack_name);
> + if (prepare_midx_pack(ctx->repo, m,
> + m->num_packs_in_base + i)) {
> + error(_("could not load pack"));
> + return 1;
> }
>
> + ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
> fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
> m->pack_names[i],
> m->num_packs_in_base + i);
> @@ -1123,8 +1105,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> ctx.num_multi_pack_indexes_before++;
> m = m->base_midx;
> }
> - } else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
> - flags) < 0) {
> + } else if (ctx.m && fill_packs_from_midx(&ctx)) {
> goto cleanup;
> }
>
> @@ -1223,6 +1204,11 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
>
> if (ctx.preferred_pack_idx > -1) {
> struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
> +
> + if (open_pack_index(preferred))
> + die(_("failed to open preferred pack %s"),
> + ctx.info[ctx.preferred_pack_idx].pack_name);
> +
> if (!preferred->num_objects) {
> error(_("cannot select preferred pack %s with no objects"),
> preferred->pack_name);
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index bd75dea950..49705c62a2 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -989,6 +989,23 @@ test_expect_success 'repack --batch-size=0 repacks everything' '
> )
> '
>
> +test_expect_success EXPENSIVE 'repack/expire with many packs' '
> + cp -r dup many &&
> + (
> + cd many &&
> +
> + for i in $(test_seq 1 100)
> + do
> + test_commit extra$i &&
> + git maintenance run --task=loose-objects || return 1
> + done &&
> +
> + git multi-pack-index write &&
> + git multi-pack-index repack &&
> + git multi-pack-index expire
> + )
> +'
> +
> test_expect_success 'repack --batch-size=<large> repacks everything' '
> (
> cd dup2 &&
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/5] midx-write: put failing response value back
2025-08-28 17:39 ` [PATCH 2/5] midx-write: put failing response value back Derrick Stolee via GitGitGadget
@ 2025-08-28 20:45 ` Junio C Hamano
2025-08-29 1:26 ` Taylor Blau
0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2025-08-28 20:45 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> This instance of setting the result to 1 before going to cleanup was
> accidentally removed in fcb2205b77 (midx: implement support for writing
> incremental MIDX chains, 2024-08-06).
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> midx-write.c | 1 +
> 1 file changed, 1 insertion(+)
The cover letter made it sound as if [1/5] was the only fix and the
rest was clean-up, but unless all callers of write_midx_internal()
ignores the return value from it, this surely would change the
behaviour of the program, no?
And the results from write-midx_file_only(), write_midx_file(), and
expire_midx_packs(), the three callers of this _internal() function,
all seem to be used in builtin/multi-pack-index.c so wouldn't this
also be a fix?
Not that I endorse "0 is success and any non-zero value is an error",
but this does not look like a mere clean-up to me.
> diff --git a/midx-write.c b/midx-write.c
> index d8f9679868..85b2d471ef 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1106,6 +1106,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> m = m->base_midx;
> }
> } else if (ctx.m && fill_packs_from_midx(&ctx)) {
> + result = 1;
> goto cleanup;
> }
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/5] midx-write: use cleanup when incremental midx fails
2025-08-28 17:39 ` [PATCH 3/5] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
@ 2025-08-28 20:51 ` Junio C Hamano
2025-08-29 1:29 ` Taylor Blau
0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2025-08-28 20:51 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> The incremental mode of writing a multi-pack-index has a few extra
> conditions that could lead to failure, but these are currently
> short-ciruiting with 'return -1' instead of setting the method's
> 'result' variable and going to the cleanup tag.
>
> Replace these returns with gotos to avoid memory issues when exiting
> early due to error conditions.
>
> Unfortunately, these error conditions are difficult to reproduce with
> test cases, which is perhaps one reason why the memory loss was not
> caught by existing test cases in memory tracking modes.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> midx-write.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
Good thinking, but may I suggest us to go one more step to adopt
even better convention if we were to do this?
Pessimistically initialize the "result" to -1 and let many "goto
cleanup" just jump there. And have "result = 0" just before the
cleanup label where the success code path joins the final cleanup
part of the function.
This is often the right way to make the flow easier to see, because
often the success code path is straight forward, and these error
conditions are what employ the "goto cleanup" from many places. By
starting pessimistic, and declaring the success at the very end of
the straight-forward success case code path, all other flows to the
clean-up labels do not have to set the "ah I failed" flag. It would
eliminate the need for patches like the previous step if the
original were following that pattern.
> diff --git a/midx-write.c b/midx-write.c
> index 85b2d471ef..f2d9a990e6 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1321,13 +1321,15 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> incr = mks_tempfile_m(midx_name.buf, 0444);
> if (!incr) {
> error(_("unable to create temporary MIDX layer"));
> - return -1;
> + result = -1;
> + goto cleanup;
> }
>
> if (adjust_shared_perm(r, get_tempfile_path(incr))) {
> error(_("unable to adjust shared permissions for '%s'"),
> get_tempfile_path(incr));
> - return -1;
> + result = -1;
> + goto cleanup;
> }
>
> f = hashfd(r->hash_algo, get_tempfile_fd(incr),
> @@ -1427,18 +1429,22 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
>
> if (!chainf) {
> error_errno(_("unable to open multi-pack-index chain file"));
> - return -1;
> + result = -1;
> + goto cleanup;
> }
>
> - if (link_midx_to_chain(ctx.base_midx) < 0)
> - return -1;
> + if (link_midx_to_chain(ctx.base_midx) < 0) {
> + result = -1;
> + goto cleanup;
> + }
>
> get_split_midx_filename_ext(r->hash_algo, &final_midx_name,
> object_dir, midx_hash, MIDX_EXT_MIDX);
>
> if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
> error_errno(_("unable to rename new multi-pack-index layer"));
> - return -1;
> + result = -1;
> + goto cleanup;
> }
>
> strbuf_release(&final_midx_name);
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx
2025-08-28 17:39 ` [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
@ 2025-08-28 20:58 ` Junio C Hamano
2025-08-29 1:35 ` Taylor Blau
1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-08-28 20:58 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
> equality check. The macro stores the max value of a uint32_t, so we
> cannot store a preferred pack that appears last in a list of 2^32 total
> packs, but that's expected to be unreasonable already. This improves the
> range from 2^31 already.
;-)
I very much like this change. An obvious alternative may be to use
int instead of uint32_t to number and index into in-core packs, as
we all know that not just 2^32 but 2^31 is still unreasonably too
many anyway.
> There are some careful things to worry about with initializing the
> preferred pack in the struct and using that value when searching for a
> preferred pack that was already incorrect but accidentally working when
> the index was initialized to zero.
True.
> - struct write_midx_context ctx = { 0 };
> + struct write_midx_context ctx = {
> + .preferred_pack_idx = NO_PREFERRED_PACK,
> + };
Good.
> if (preferred_pack_name) {
> - ctx.preferred_pack_idx = -1;
> + ctx.preferred_pack_idx = NO_PREFERRED_PACK;
This too.
Thanks.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/5] midx-write: reenable signed comparison errors
2025-08-28 17:39 ` [PATCH 5/5] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
@ 2025-08-28 21:01 ` Junio C Hamano
2025-08-29 1:35 ` Taylor Blau
0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2025-08-28 21:01 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The strategy here involves defining the variable within the for loop
> syntax to make sure we use the appropriate bitness for the loop
> sentinel. This matters in at least one method where the variable was
> compared to uint32_t in some loops and size_t in others.
Sounds like a fine way to avoid accidentally breaking these loops.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/5] midx-write: only load initialized packs
2025-08-28 17:39 ` [PATCH 1/5] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-08-28 20:19 ` Junio C Hamano
@ 2025-08-29 1:20 ` Taylor Blau
2025-08-30 14:33 ` Derrick Stolee
1 sibling, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-08-29 1:20 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Thu, Aug 28, 2025 at 05:39:51PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
> implement support for writing incremental MIDX chains, 2024-08-06) to
> allow for preferred packfiles and incremental multi-pack-indexes.
> However, this led to some conditions that can cause improperly
> initialized memory in the context's list of packfiles.
>
> The conditions caring about the preferred pack name or the incremental
> flag are currently necessary to load a packfile. But the context is
> still being populated with pack_info structs based on the packfile array
> for the existing multi-pack-index even if prepare_midx_pack() isn't
> called.
Thanks for looking at this one. On the surface this looks not great, but
I am having a hard time coming up with a smaller test case that
exercises this behavior.
I can get what you wrote below to fail on my machine pretty reliable
when building with SANITIZE=address (even without --stress). All of the
spots that read from the pack_info array and access the actual
packed_git structs are guarded by either writing a MIDX bitmap or having
a non-empty preferred pack.
So I can see that we're definitely trying to close_pack() on a NULL
pointer, but I can't find a way to trigger that more directly.
(I think the fact that we don't appear to be accessing the packed_git
struct because the spots that want to are guarded by the same conditions
that cause us to call prepare_midx_pack() in the first place is pure
luck and not something that I am comfortable with us relying on. So we
should fix this up for sure, but I wish I understood the existing bug a
little more clearly first.)
> Add a new test that breaks under --stress when compiled with
> SANITIZE=address. The chosen number of 100 packfiles was selected to get
> the --stress output to fail about 50% of the time, while 50 packfiles
> could not get a failure in most --stress runs. This test has a very
> minor check at the end confirming only one packfile remaining. The
> failing nature of this test actually relies on auto-GC cleaning up some
> packfiles during the creation of the commits, as tests setting gc.auto
> to zero make the packfile count match the number of added commits but
> also avoids hitting the memory issue.
Hmm. Is this portion of the commit message out-of-date? I can't see the
check you're referring to that ensures there is only one pack remaining,
nor can I see the spot where we disable gc.auto.
> The test case is marked as EXPENSIVE not only because of the number of
> packfiles it creates, but because some CI environments were reporting
> errors during the test that I could not reproduce, specifically around
> being unable to open the packfiles or their pack-indexes.
>
> When it fails under SANITIZE=address, it provides the following error:
>
> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
> ==3263517==The signal is caused by a READ memory access.
> ==3263517==Hint: address points to the zero page.
> #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
> #1 0x562d5d82d3ab in close_pack packfile.c:354
> #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
> #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
> #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
> ...
>
> This failure stack trace is disconnected from the real fix because it
s/it// ?
> the bad pointers are accessed later when closing the packfiles from the
> context.
>
> There are a few different aspects to this fix that are worth noting:
>
> 1. We return to the previous behavior of fill_packs_from_midx to not
> rely on the incremental flag or existence of a preferred pack.
>
> 2. The behavior to scan all layers of an incremental midx is kept, so
> this is not a full revert of the change.
>
> 3. We skip allocating more room in the pack_info array if the pack
> fails prepare_midx_pack().
>
> 4. The method has always returned 0 for success and 1 for failure, but
> the condition checking for error added a check for a negative result
> for failure, so that is now updated.
Oops ;-).
> 5. The call to open_pack_index() is removed, but this is needed later
> in the case of a preferred pack. That call is moved to immediately
> before its result is needed (checking for the object count).
I think we need to do this in at least one other spot, but see below.
> + if (prepare_midx_pack(ctx->repo, m,
> + m->num_packs_in_base + i)) {
> + error(_("could not load pack"));
> + return 1;
Looks good, though I agree with Junio's comment in his separate reply
that we could probably just turn this into "return error(...)" while
we're at it.
> @@ -1223,6 +1204,11 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
>
> if (ctx.preferred_pack_idx > -1) {
> struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
> +
> + if (open_pack_index(preferred))
> + die(_("failed to open preferred pack %s"),
> + ctx.info[ctx.preferred_pack_idx].pack_name);
This makes sense, but I think we need to apply similar treatment in the
"else if" arm of the if-statement immediately above this one too. That
portion of the code handles the case where we're writing a MIDX bitmap
but didn't provide a preferred pack.
When that's the case, we loop through to try and find the oldest pack
that contains at least one object. If we don't call open_pack_index()
all of those ->num_objects fields will still be zero'd, so we'll only
find the oldest pack.
That may actually produce wrong behavior if we have duplicate objects
that aren't uniformly resolved in favor of the earliest pack in lex
order. I'd have to think about it a little more to be sure, though.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/5] midx-write: put failing response value back
2025-08-28 20:45 ` Junio C Hamano
@ 2025-08-29 1:26 ` Taylor Blau
0 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-08-29 1:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee
On Thu, Aug 28, 2025 at 01:45:36PM -0700, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Derrick Stolee <stolee@gmail.com>
> >
> > This instance of setting the result to 1 before going to cleanup was
> > accidentally removed in fcb2205b77 (midx: implement support for writing
> > incremental MIDX chains, 2024-08-06).
> >
> > Signed-off-by: Derrick Stolee <stolee@gmail.com>
> > ---
> > midx-write.c | 1 +
> > 1 file changed, 1 insertion(+)
>
> The cover letter made it sound as if [1/5] was the only fix and the
> rest was clean-up, but unless all callers of write_midx_internal()
> ignores the return value from it, this surely would change the
> behaviour of the program, no?
I think the cover letter is saying that only the first patch is required
to fix the crash that was reported, and the rest are clean-ups.
Not that this isn't a bug in its own right, but it's definitely distinct
from the one that is addressed in the previous patch.
> And the results from write-midx_file_only(), write_midx_file(), and
> expire_midx_packs(), the three callers of this _internal() function,
> all seem to be used in builtin/multi-pack-index.c so wouldn't this
> also be a fix?
>
> Not that I endorse "0 is success and any non-zero value is an error",
> but this does not look like a mere clean-up to me.
>
> > diff --git a/midx-write.c b/midx-write.c
> > index d8f9679868..85b2d471ef 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -1106,6 +1106,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> > m = m->base_midx;
> > }
> > } else if (ctx.m && fill_packs_from_midx(&ctx)) {
> > + result = 1;
> > goto cleanup;
It might be nice to have a test that confirms this behavior (though the
changes look obviously correct to me). It should suffice to write a
MIDX, drop one of its packs, and then try to write it again with a new
pack.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/5] midx-write: use cleanup when incremental midx fails
2025-08-28 20:51 ` Junio C Hamano
@ 2025-08-29 1:29 ` Taylor Blau
2025-08-30 14:44 ` Derrick Stolee
0 siblings, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-08-29 1:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee
On Thu, Aug 28, 2025 at 01:51:18PM -0700, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Derrick Stolee <stolee@gmail.com>
> >
> > The incremental mode of writing a multi-pack-index has a few extra
> > conditions that could lead to failure, but these are currently
> > short-ciruiting with 'return -1' instead of setting the method's
> > 'result' variable and going to the cleanup tag.
> >
> > Replace these returns with gotos to avoid memory issues when exiting
> > early due to error conditions.
> >
> > Unfortunately, these error conditions are difficult to reproduce with
> > test cases, which is perhaps one reason why the memory loss was not
> > caught by existing test cases in memory tracking modes.
> >
> > Signed-off-by: Derrick Stolee <stolee@gmail.com>
> > ---
> > midx-write.c | 18 ++++++++++++------
> > 1 file changed, 12 insertions(+), 6 deletions(-)
>
> Good thinking, but may I suggest us to go one more step to adopt
> even better convention if we were to do this?
>
> Pessimistically initialize the "result" to -1 and let many "goto
> cleanup" just jump there. And have "result = 0" just before the
> cleanup label where the success code path joins the final cleanup
> part of the function.
>
> This is often the right way to make the flow easier to see, because
> often the success code path is straight forward, and these error
> conditions are what employ the "goto cleanup" from many places. By
> starting pessimistic, and declaring the success at the very end of
> the straight-forward success case code path, all other flows to the
> clean-up labels do not have to set the "ah I failed" flag. It would
> eliminate the need for patches like the previous step if the
> original were following that pattern.
Alternatively replacing something like:
error(...);
result = -1;
goto cleanup;
with just
result = error(...);
goto cleanup;
would IMHO make the code easier to read, though I agree that nothing is
forcing us to remember to assign result in the first place ;-). I am not
sure the pessimistic initialization is better in all cases either, since
we have to remember to place it before any "cleanup" label, and make
sure that that does not regress.
So, I dunno. I'm OK with what is written here, and I think we could
certainly have a separate discussion to perhaps have CodingGuidelines
take a stronger stance here.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx
2025-08-28 17:39 ` [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
2025-08-28 20:58 ` Junio C Hamano
@ 2025-08-29 1:35 ` Taylor Blau
1 sibling, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-08-29 1:35 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Thu, Aug 28, 2025 at 05:39:54PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
> few reasons, but the biggest one is the use of a signed
> preferred_pack_idx member inside the write_midx_context struct. The code
> currently uses -1 to indicate an unset preferred pack but pack int ids
> are normally handled as uint32_t. There are also a few loops that search
> for the preferred pack by name and those iterators will need updates to
> uint32_t in the next change.
>
> For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
> equality check. The macro stores the max value of a uint32_t, so we
> cannot store a preferred pack that appears last in a list of 2^32 total
> packs, but that's expected to be unreasonable already. This improves the
> range from 2^31 already.
Nice ;-).
> There are some careful things to worry about with initializing the
> preferred pack in the struct and using that value when searching for a
> preferred pack that was already incorrect but accidentally working when
> the index was initialized to zero.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> midx-write.c | 26 +++++++++++++++-----------
> 1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/midx-write.c b/midx-write.c
> index f2d9a990e6..ea1b3a199c 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -24,6 +24,7 @@
> #define BITMAP_POS_UNKNOWN (~((uint32_t)0))
> #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
> #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
> +#define NO_PREFERRED_PACK (~((uint32_t)0))
I think that just (~(uint32_t)0) is necessary, but I don't mind the
extra clarity.
> @@ -274,7 +275,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
> end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
>
> for (cur_object = start; cur_object < end; cur_object++) {
> - if ((preferred_pack > -1) &&
> + if ((preferred_pack != NO_PREFERRED_PACK) &&
> (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
OK, here we should make sure that the preferred pack was provided.
Like you mentioned above, I guess that means we can't have the 2^32-1'st
pack, but before we couldn't have any pack greater than 2^31-1
preferred, so this is a strict improvement ;-).
> /*
> * Objects from preferred packs are added
> @@ -364,7 +365,8 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
> preferred, cur_fanout);
> }
>
> - if (-1 < ctx->preferred_pack_idx && ctx->preferred_pack_idx < start_pack)
> + if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
> + ctx->preferred_pack_idx < start_pack)
Looks good. It's tempting to say that any value of preferred_pack_idx
lower than start_pack is going to be OK, since start_pack is a uint32_t
as well, but we need to make sure that the preferred pack was specified
in the first place.
The rest all looks good as well, thanks.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/5] midx-write: reenable signed comparison errors
2025-08-28 21:01 ` Junio C Hamano
@ 2025-08-29 1:35 ` Taylor Blau
0 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-08-29 1:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee
On Thu, Aug 28, 2025 at 02:01:42PM -0700, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > The strategy here involves defining the variable within the for loop
> > syntax to make sure we use the appropriate bitness for the loop
> > sentinel. This matters in at least one method where the variable was
> > compared to uint32_t in some loops and size_t in others.
>
> Sounds like a fine way to avoid accidentally breaking these loops.
Yup, all looks good to me.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 0/5] midx-write: fix segfault and do several cleanups
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2025-08-28 17:39 ` [PATCH 5/5] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
@ 2025-08-29 1:36 ` Taylor Blau
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
6 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-08-29 1:36 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Thu, Aug 28, 2025 at 05:39:50PM +0000, Derrick Stolee via GitGitGadget wrote:
> Derrick Stolee (5):
> midx-write: only load initialized packs
> midx-write: put failing response value back
> midx-write: use cleanup when incremental midx fails
> midx-write: use uint32_t for preferred_pack_idx
> midx-write: reenable signed comparison errors
Thanks for looking at this, and my apologies for the regression plugged
by the first patch.
The last four patches look good to me, and I left just some minor
comments throughout. The first patch does leave me feeling like there
ought to be a smaller reproduction, but I can't seem to find one easily.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/5] midx-write: only load initialized packs
2025-08-29 1:20 ` Taylor Blau
@ 2025-08-30 14:33 ` Derrick Stolee
0 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee @ 2025-08-30 14:33 UTC (permalink / raw)
To: Taylor Blau, Derrick Stolee via GitGitGadget; +Cc: git, gitster
On 8/28/25 9:20 PM, Taylor Blau wrote:
> On Thu, Aug 28, 2025 at 05:39:51PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
>> implement support for writing incremental MIDX chains, 2024-08-06) to
>> allow for preferred packfiles and incremental multi-pack-indexes.
>> However, this led to some conditions that can cause improperly
>> initialized memory in the context's list of packfiles.
>>
>> The conditions caring about the preferred pack name or the incremental
>> flag are currently necessary to load a packfile. But the context is
>> still being populated with pack_info structs based on the packfile array
>> for the existing multi-pack-index even if prepare_midx_pack() isn't
>> called.
>
> Thanks for looking at this one. On the surface this looks not great, but
> I am having a hard time coming up with a smaller test case that
> exercises this behavior.
>
> I can get what you wrote below to fail on my machine pretty reliable
> when building with SANITIZE=address (even without --stress). All of the
> spots that read from the pack_info array and access the actual
> packed_git structs are guarded by either writing a MIDX bitmap or having
> a non-empty preferred pack.
I'm glad you're able to reproduce it. My --stress runs had about a
50% hit rate.
>> Add a new test that breaks under --stress when compiled with
>> SANITIZE=address. The chosen number of 100 packfiles was selected to get
>> the --stress output to fail about 50% of the time, while 50 packfiles
>> could not get a failure in most --stress runs. This test has a very
>> minor check at the end confirming only one packfile remaining. The
>> failing nature of this test actually relies on auto-GC cleaning up some
>> packfiles during the creation of the commits, as tests setting gc.auto
>> to zero make the packfile count match the number of added commits but
>> also avoids hitting the memory issue.
>
> Hmm. Is this portion of the commit message out-of-date? I can't see the
> check you're referring to that ensures there is only one pack remaining,
> nor can I see the spot where we disable gc.auto.
You're right. When I added more robustness around the packfile count
by removing gc.auto, the test stopped failing pre-fix. Then, I forgot
to remove mention of those test updates.
>> The test case is marked as EXPENSIVE not only because of the number of
>> packfiles it creates, but because some CI environments were reporting
>> errors during the test that I could not reproduce, specifically around
>> being unable to open the packfiles or their pack-indexes.
>>
>> When it fails under SANITIZE=address, it provides the following error:
>>
>> AddressSanitizer:DEADLYSIGNAL
>> =================================================================
>> ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
>> ==3263517==The signal is caused by a READ memory access.
>> ==3263517==Hint: address points to the zero page.
>> #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
>> #1 0x562d5d82d3ab in close_pack packfile.c:354
>> #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
>> #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
>> #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
>> ...
>>
>> This failure stack trace is disconnected from the real fix because it
>
> s/it// ?
Thanks.
>> the bad pointers are accessed later when closing the packfiles from the
>> context.
>>
>> There are a few different aspects to this fix that are worth noting:
>>
>> 1. We return to the previous behavior of fill_packs_from_midx to not
>> rely on the incremental flag or existence of a preferred pack.
>>
>> 2. The behavior to scan all layers of an incremental midx is kept, so
>> this is not a full revert of the change.
>>
>> 3. We skip allocating more room in the pack_info array if the pack
>> fails prepare_midx_pack().
>>
>> 4. The method has always returned 0 for success and 1 for failure, but
>> the condition checking for error added a check for a negative result
>> for failure, so that is now updated.
>
> Oops ;-).
>
>> 5. The call to open_pack_index() is removed, but this is needed later
>> in the case of a preferred pack. That call is moved to immediately
>> before its result is needed (checking for the object count).
>
> I think we need to do this in at least one other spot, but see below.
Interesting!
>> + if (prepare_midx_pack(ctx->repo, m,
>> + m->num_packs_in_base + i)) {
>> + error(_("could not load pack"));
>> + return 1;
>
> Looks good, though I agree with Junio's comment in his separate reply
> that we could probably just turn this into "return error(...)" while
> we're at it.
Can do.
>> @@ -1223,6 +1204,11 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
>>
>> if (ctx.preferred_pack_idx > -1) {
>> struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
>> +
>> + if (open_pack_index(preferred))
>> + die(_("failed to open preferred pack %s"),
>> + ctx.info[ctx.preferred_pack_idx].pack_name);
>
> This makes sense, but I think we need to apply similar treatment in the
> "else if" arm of the if-statement immediately above this one too. That
> portion of the code handles the case where we're writing a MIDX bitmap
> but didn't provide a preferred pack.
>
> When that's the case, we loop through to try and find the oldest pack
> that contains at least one object. If we don't call open_pack_index()
> all of those ->num_objects fields will still be zero'd, so we'll only
> find the oldest pack.
>
> That may actually produce wrong behavior if we have duplicate objects
> that aren't uniformly resolved in favor of the earliest pack in lex
> order. I'd have to think about it a little more to be sure, though.
I see. In this case, we need to open_pack_index() before relying on
oldest->num_objects, which only needs to happen for the first pack
and any packfile that wins via mtime preference. It also seems like
we can _warn_ on failures to open packfiles in those cases, since it
isn't fatal if some packfiles fail to open.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/5] midx-write: use cleanup when incremental midx fails
2025-08-29 1:29 ` Taylor Blau
@ 2025-08-30 14:44 ` Derrick Stolee
0 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee @ 2025-08-30 14:44 UTC (permalink / raw)
To: Taylor Blau, Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git
On 8/28/25 9:29 PM, Taylor Blau wrote:
> On Thu, Aug 28, 2025 at 01:51:18PM -0700, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <stolee@gmail.com>
>>>
>>> The incremental mode of writing a multi-pack-index has a few extra
>>> conditions that could lead to failure, but these are currently
>>> short-ciruiting with 'return -1' instead of setting the method's
>>> 'result' variable and going to the cleanup tag.
>>>
>>> Replace these returns with gotos to avoid memory issues when exiting
>>> early due to error conditions.
>>>
>>> Unfortunately, these error conditions are difficult to reproduce with
>>> test cases, which is perhaps one reason why the memory loss was not
>>> caught by existing test cases in memory tracking modes.
>>>
>>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>>> ---
>>> midx-write.c | 18 ++++++++++++------
>>> 1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> Good thinking, but may I suggest us to go one more step to adopt
>> even better convention if we were to do this?
>>
>> Pessimistically initialize the "result" to -1 and let many "goto
>> cleanup" just jump there. And have "result = 0" just before the
>> cleanup label where the success code path joins the final cleanup
>> part of the function.
>>
>> This is often the right way to make the flow easier to see, because
>> often the success code path is straight forward, and these error
>> conditions are what employ the "goto cleanup" from many places. By
>> starting pessimistic, and declaring the success at the very end of
>> the straight-forward success case code path, all other flows to the
>> clean-up labels do not have to set the "ah I failed" flag. It would
>> eliminate the need for patches like the previous step if the
>> original were following that pattern.
>
> Alternatively replacing something like:
>
> error(...);
> result = -1;
> goto cleanup;
>
> with just
>
> result = error(...);
> goto cleanup;
>
> would IMHO make the code easier to read, though I agree that nothing is
> forcing us to remember to assign result in the first place ;-). I am not
> sure the pessimistic initialization is better in all cases either, since
> we have to remember to place it before any "cleanup" label, and make
> sure that that does not regress.
>
> So, I dunno. I'm OK with what is written here, and I think we could
> certainly have a separate discussion to perhaps have CodingGuidelines
> take a stronger stance here.
I'll look into adding this as a cleanup. This specific patch is
about adding the 'goto's where they were missing. I can make a new
patch that unifies the result initialization to -1 (and thus making
the method unified in returning -1 on error). There are many more
"result = 1" lines that need to change.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v2 0/6] midx-write: fix segfault and do several cleanups
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2025-08-29 1:36 ` [PATCH 0/5] midx-write: fix segfault and do several cleanups Taylor Blau
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-08-30 21:23 ` [PATCH v2 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
` (6 more replies)
6 siblings, 7 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee
I was motivated to start looking closely at midx-write.c due to multiple
users reporting Git crashes in their background maintenance, specifically
during git multi-pack-index repack calls. I was eventually able to reproduce
it in git multi-pack-index expire as well.
Patch 1 is the only change we need to fix this bug. It includes a test case
that will fail under --stress with SANITIZE=address. It requires creating
many packfiles (50 was not enough, but 100 is enough). As far as I can tell,
this bug has existed since Git 2.47.0 in October 2024, but I started hearing
reports of this from users in July 2025 (and took a while to get a
dump/repro).
The remaining patches are cleanups based on my careful rereading of
midx-write.c. There are some issues about error handling that needed some
cleanup as well as a removal of the DISABLE_SIGN_COMPARE_WARNINGS macro.
Updates in V2
=============
* A stale comment to an unsubmitted version of the test is removed.
* More cases needing open_pack_index() are patched.
* Typos fixed.
* A new patch assumes error and sets result to zero only on the few
successful paths.
Thanks, -Stolee
Derrick Stolee (6):
midx-write: only load initialized packs
midx-write: put failing response value back
midx-write: use cleanup when incremental midx fails
midx-write: use uint32_t for preferred_pack_idx
midx-write: reenable signed comparison errors
midx-write: simplify error cases
midx-write.c | 134 +++++++++++++++++-------------------
t/t5319-multi-pack-index.sh | 22 +++++-
2 files changed, 86 insertions(+), 70 deletions(-)
base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1965%2Fderrickstolee%2Fmidx-write-cleanup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1965/derrickstolee/midx-write-cleanup-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1965
Range-diff vs v1:
1: 4a4b35c694 ! 1: e02a444315 midx-write: only load initialized packs
@@ Commit message
Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
- could not get a failure in most --stress runs. This test has a very
- minor check at the end confirming only one packfile remaining. The
- failing nature of this test actually relies on auto-GC cleaning up some
- packfiles during the creation of the commits, as tests setting gc.auto
- to zero make the packfile count match the number of added commits but
- also avoids hitting the memory issue.
+ could not get a failure in most --stress runs.
The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
@@ Commit message
#4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
...
- This failure stack trace is disconnected from the real fix because it
- the bad pointers are accessed later when closing the packfiles from the
- context.
+ This failure stack trace is disconnected from the real fix because the bad
+ pointers are accessed later when closing the packfiles from the context.
There are a few different aspects to this fix that are worth noting:
@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
- if (open_pack_index(m->packs[i]))
- die(_("could not open index for %s"),
- m->packs[i]->pack_name);
+- }
+ if (prepare_midx_pack(ctx->repo, m,
-+ m->num_packs_in_base + i)) {
-+ error(_("could not load pack"));
-+ return 1;
- }
++ m->num_packs_in_base + i))
++ return error(_("could not load pack"));
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *o
goto cleanup;
}
+@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
+ struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+ ctx.preferred_pack_idx = 0;
+
++ /*
++ * Attempt opening the pack index to populate num_objects.
++ * Ignore failiures as they can be expected and are not
++ * fatal during this selection time.
++ */
++ open_pack_index(oldest);
++
+ if (packs_to_drop && packs_to_drop->nr)
+ BUG("cannot write a MIDX bitmap during expiration");
+
+@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
+
+ if (!oldest->num_objects || p->mtime < oldest->mtime) {
+ oldest = p;
++ open_pack_index(oldest);
+ ctx.preferred_pack_idx = i;
+ }
+ }
@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.preferred_pack_idx > -1) {
2: 709555c531 ! 2: a1dd3ed874 midx-write: put failing response value back
@@ Commit message
This instance of setting the result to 1 before going to cleanup was
accidentally removed in fcb2205b77 (midx: implement support for writing
- incremental MIDX chains, 2024-08-06).
+ incremental MIDX chains, 2024-08-06). Build upon a test that already deletes
+ a packfile to verify that this error propagates to full command failure.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *o
goto cleanup;
}
+
+ ## t/t5319-multi-pack-index.sh ##
+@@ t/t5319-multi-pack-index.sh: test_expect_success 'load reverse index when missing .idx, .pack' '
+ mv $idx.bak $idx &&
+
+ mv $pack $pack.bak &&
+- git cat-file --batch-check="%(objectsize:disk)" <tip
++ git cat-file --batch-check="%(objectsize:disk)" <tip &&
++
++ test_must_fail git multi-pack-index write 2>err &&
++ grep "could not load pack" err
+ )
+ '
+
3: a5bee03601 = 3: c4f75cca09 midx-write: use cleanup when incremental midx fails
4: bd97db26f7 ! 4: 2290e27ded midx-write: use uint32_t for preferred_pack_idx
@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *o
+ struct packed_git *oldest = ctx.info[0].p;
ctx.preferred_pack_idx = 0;
- if (packs_to_drop && packs_to_drop->nr)
+ /*
@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *object_dir,
* objects to resolve, so the preferred value doesn't
* matter.
5: eb1abdca32 = 5: 35302f5228 midx-write: reenable signed comparison errors
-: ---------- > 6: 7be25cf534 midx-write: simplify error cases
--
gitgitgadget
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v2 1/6] midx-write: only load initialized packs
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-09-03 10:14 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.
The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.
Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs.
The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.
When it fails under SANITIZE=address, it provides the following error:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
#0 0x562d5d82d1fb in close_pack_windows packfile.c:299
#1 0x562d5d82d3ab in close_pack packfile.c:354
#2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
#3 0x562d5d7c7aec in midx_repack midx-write.c:1795
#4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
...
This failure stack trace is disconnected from the real fix because the bad
pointers are accessed later when closing the packfiles from the context.
There are a few different aspects to this fix that are worth noting:
1. We return to the previous behavior of fill_packs_from_midx to not
rely on the incremental flag or existence of a preferred pack.
2. The behavior to scan all layers of an incremental midx is kept, so
this is not a full revert of the change.
3. We skip allocating more room in the pack_info array if the pack
fails prepare_midx_pack().
4. The method has always returned 0 for success and 1 for failure, but
the condition checking for error added a check for a negative result
for failure, so that is now updated.
5. The call to open_pack_index() is removed, but this is needed later
in the case of a preferred pack. That call is moved to immediately
before its result is needed (checking for the object count).
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 46 +++++++++++++++----------------------
t/t5319-multi-pack-index.sh | 17 ++++++++++++++
2 files changed, 36 insertions(+), 27 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index a0aceab5e0..070a7f61f4 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -920,8 +920,7 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
return get_multi_pack_index(source);
}
-static int fill_packs_from_midx(struct write_midx_context *ctx,
- const char *preferred_pack_name, uint32_t flags)
+static int fill_packs_from_midx(struct write_midx_context *ctx)
{
struct multi_pack_index *m;
@@ -929,30 +928,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < m->num_packs; i++) {
- ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
- /*
- * If generating a reverse index, need to have
- * packed_git's loaded to compare their
- * mtimes and object count.
- *
- * If a preferred pack is specified, need to
- * have packed_git's loaded to ensure the chosen
- * preferred pack has a non-zero object count.
- */
- if (flags & MIDX_WRITE_REV_INDEX ||
- preferred_pack_name) {
- if (prepare_midx_pack(ctx->repo, m,
- m->num_packs_in_base + i)) {
- error(_("could not load pack"));
- return 1;
- }
-
- if (open_pack_index(m->packs[i]))
- die(_("could not open index for %s"),
- m->packs[i]->pack_name);
- }
+ if (prepare_midx_pack(ctx->repo, m,
+ m->num_packs_in_base + i))
+ return error(_("could not load pack"));
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
m->pack_names[i],
m->num_packs_in_base + i);
@@ -1123,8 +1103,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
- } else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
- flags) < 0) {
+ } else if (ctx.m && fill_packs_from_midx(&ctx)) {
goto cleanup;
}
@@ -1186,6 +1165,13 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
ctx.preferred_pack_idx = 0;
+ /*
+ * Attempt opening the pack index to populate num_objects.
+ * Ignore failiures as they can be expected and are not
+ * fatal during this selection time.
+ */
+ open_pack_index(oldest);
+
if (packs_to_drop && packs_to_drop->nr)
BUG("cannot write a MIDX bitmap during expiration");
@@ -1200,6 +1186,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!oldest->num_objects || p->mtime < oldest->mtime) {
oldest = p;
+ open_pack_index(oldest);
ctx.preferred_pack_idx = i;
}
}
@@ -1223,6 +1210,11 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.preferred_pack_idx > -1) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+
+ if (open_pack_index(preferred))
+ die(_("failed to open preferred pack %s"),
+ ctx.info[ctx.preferred_pack_idx].pack_name);
+
if (!preferred->num_objects) {
error(_("cannot select preferred pack %s with no objects"),
preferred->pack_name);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index bd75dea950..49705c62a2 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -989,6 +989,23 @@ test_expect_success 'repack --batch-size=0 repacks everything' '
)
'
+test_expect_success EXPENSIVE 'repack/expire with many packs' '
+ cp -r dup many &&
+ (
+ cd many &&
+
+ for i in $(test_seq 1 100)
+ do
+ test_commit extra$i &&
+ git maintenance run --task=loose-objects || return 1
+ done &&
+
+ git multi-pack-index write &&
+ git multi-pack-index repack &&
+ git multi-pack-index expire
+ )
+'
+
test_expect_success 'repack --batch-size=<large> repacks everything' '
(
cd dup2 &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 2/6] midx-write: put failing response value back
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2025-08-30 21:23 ` [PATCH v2 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
This instance of setting the result to 1 before going to cleanup was
accidentally removed in fcb2205b77 (midx: implement support for writing
incremental MIDX chains, 2024-08-06). Build upon a test that already deletes
a packfile to verify that this error propagates to full command failure.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 1 +
t/t5319-multi-pack-index.sh | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/midx-write.c b/midx-write.c
index 070a7f61f4..0f1d5653ab 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1104,6 +1104,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
m = m->base_midx;
}
} else if (ctx.m && fill_packs_from_midx(&ctx)) {
+ result = 1;
goto cleanup;
}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 49705c62a2..008e65c22e 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1100,7 +1100,10 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
mv $idx.bak $idx &&
mv $pack $pack.bak &&
- git cat-file --batch-check="%(objectsize:disk)" <tip
+ git cat-file --batch-check="%(objectsize:disk)" <tip &&
+
+ test_must_fail git multi-pack-index write 2>err &&
+ grep "could not load pack" err
)
'
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 3/6] midx-write: use cleanup when incremental midx fails
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2025-08-30 21:23 ` [PATCH v2 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-08-30 21:23 ` [PATCH v2 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The incremental mode of writing a multi-pack-index has a few extra
conditions that could lead to failure, but these are currently
short-ciruiting with 'return -1' instead of setting the method's
'result' variable and going to the cleanup tag.
Replace these returns with gotos to avoid memory issues when exiting
early due to error conditions.
Unfortunately, these error conditions are difficult to reproduce with
test cases, which is perhaps one reason why the memory loss was not
caught by existing test cases in memory tracking modes.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 0f1d5653ab..cb0211289d 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1327,13 +1327,15 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
incr = mks_tempfile_m(midx_name.buf, 0444);
if (!incr) {
error(_("unable to create temporary MIDX layer"));
- return -1;
+ result = -1;
+ goto cleanup;
}
if (adjust_shared_perm(r, get_tempfile_path(incr))) {
error(_("unable to adjust shared permissions for '%s'"),
get_tempfile_path(incr));
- return -1;
+ result = -1;
+ goto cleanup;
}
f = hashfd(r->hash_algo, get_tempfile_fd(incr),
@@ -1433,18 +1435,22 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
- return -1;
+ result = -1;
+ goto cleanup;
}
- if (link_midx_to_chain(ctx.base_midx) < 0)
- return -1;
+ if (link_midx_to_chain(ctx.base_midx) < 0) {
+ result = -1;
+ goto cleanup;
+ }
get_split_midx_filename_ext(r->hash_algo, &final_midx_name,
object_dir, midx_hash, MIDX_EXT_MIDX);
if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
error_errno(_("unable to rename new multi-pack-index layer"));
- return -1;
+ result = -1;
+ goto cleanup;
}
strbuf_release(&final_midx_name);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2025-08-30 21:23 ` [PATCH v2 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
few reasons, but the biggest one is the use of a signed
preferred_pack_idx member inside the write_midx_context struct. The code
currently uses -1 to indicate an unset preferred pack but pack int ids
are normally handled as uint32_t. There are also a few loops that search
for the preferred pack by name and those iterators will need updates to
uint32_t in the next change.
For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
equality check. The macro stores the max value of a uint32_t, so we
cannot store a preferred pack that appears last in a list of 2^32 total
packs, but that's expected to be unreasonable already. This improves the
range from 2^31 already.
There are some careful things to worry about with initializing the
preferred pack in the struct and using that value when searching for a
preferred pack that was already incorrect but accidentally working when
the index was initialized to zero.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index cb0211289d..1822268ce2 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -24,6 +24,7 @@
#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define NO_PREFERRED_PACK (~((uint32_t)0))
extern int midx_checksum_valid(struct multi_pack_index *m);
extern void clear_midx_files_ext(const char *object_dir, const char *ext,
@@ -104,7 +105,7 @@ struct write_midx_context {
unsigned large_offsets_needed:1;
uint32_t num_large_offsets;
- int preferred_pack_idx;
+ uint32_t preferred_pack_idx;
int incremental;
uint32_t num_multi_pack_indexes_before;
@@ -260,7 +261,7 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
struct multi_pack_index *m,
uint32_t cur_fanout,
- int preferred_pack)
+ uint32_t preferred_pack)
{
uint32_t start = m->num_objects_in_base, end;
uint32_t cur_object;
@@ -274,7 +275,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
for (cur_object = start; cur_object < end; cur_object++) {
- if ((preferred_pack > -1) &&
+ if ((preferred_pack != NO_PREFERRED_PACK) &&
(preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
/*
* Objects from preferred packs are added
@@ -364,7 +365,8 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
preferred, cur_fanout);
}
- if (-1 < ctx->preferred_pack_idx && ctx->preferred_pack_idx < start_pack)
+ if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+ ctx->preferred_pack_idx < start_pack)
midx_fanout_add_pack_fanout(&fanout, ctx->info,
ctx->preferred_pack_idx, 1,
cur_fanout);
@@ -1040,7 +1042,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
struct hashfile *f = NULL;
struct lock_file lk;
struct tempfile *incr;
- struct write_midx_context ctx = { 0 };
+ struct write_midx_context ctx = {
+ .preferred_pack_idx = NO_PREFERRED_PACK,
+ };
int bitmapped_packs_concat_len = 0;
int pack_name_concat_len = 0;
int dropped_packs = 0;
@@ -1148,7 +1152,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
goto cleanup; /* nothing to do */
if (preferred_pack_name) {
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
for (i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
@@ -1158,12 +1162,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- if (ctx.preferred_pack_idx == -1)
+ if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
warning(_("unknown preferred pack: '%s'"),
preferred_pack_name);
} else if (ctx.nr &&
(flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
- struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+ struct packed_git *oldest = ctx.info[0].p;
ctx.preferred_pack_idx = 0;
/*
@@ -1199,17 +1203,17 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* objects to resolve, so the preferred value doesn't
* matter.
*/
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
}
} else {
/*
* otherwise don't mark any pack as preferred to avoid
* interfering with expiration logic below
*/
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
}
- if (ctx.preferred_pack_idx > -1) {
+ if (ctx.preferred_pack_idx != NO_PREFERRED_PACK) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
if (open_pack_index(preferred))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 5/6] midx-write: reenable signed comparison errors
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2025-08-30 21:23 ` [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Remove the remaining signed comparison warnings in midx-write.c so that
they can be enforced as errors in the future. After the previous change,
the remaining errors are due to iterator variables named 'i'.
The strategy here involves defining the variable within the for loop
syntax to make sure we use the appropriate bitness for the loop
sentinel. This matters in at least one method where the variable was
compared to uint32_t in some loops and size_t in others.
While adjusting these loops, there were some where the loop boundary was
checking against a uint32_t value _plus one_. These were replaced with
non-strict comparisons, but also the value is checked to not be
UINT32_MAX. Since the value is the number of incremental multi-pack-
indexes, this is not a meaningful restriction. The new die() is about
defensive programming more than it being realistically possible.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 1822268ce2..14a0947c46 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1,5 +1,3 @@
-#define DISABLE_SIGN_COMPARE_WARNINGS
-
#include "git-compat-util.h"
#include "abspath.h"
#include "config.h"
@@ -845,7 +843,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
uint32_t commits_nr,
unsigned flags)
{
- int ret, i;
+ int ret;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
@@ -873,7 +871,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
* this order).
*/
ALLOC_ARRAY(index, pdata->nr_objects);
- for (i = 0; i < pdata->nr_objects; i++)
+ for (uint32_t i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
bitmap_writer_init(&writer, ctx->repo, pdata,
@@ -894,7 +892,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
* happens between bitmap_writer_build_type_index() and
* bitmap_writer_finish().
*/
- for (i = 0; i < pdata->nr_objects; i++)
+ for (uint32_t i = 0; i < pdata->nr_objects; i++)
index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
@@ -1038,7 +1036,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
{
struct strbuf midx_name = STRBUF_INIT;
unsigned char midx_hash[GIT_MAX_RAWSZ];
- uint32_t i, start_pack;
+ uint32_t start_pack;
struct hashfile *f = NULL;
struct lock_file lk;
struct tempfile *incr;
@@ -1154,7 +1152,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (preferred_pack_name) {
ctx.preferred_pack_idx = NO_PREFERRED_PACK;
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
ctx.info[i].pack_name)) {
ctx.preferred_pack_idx = i;
@@ -1186,7 +1184,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* pack-order has all of its objects selected from that pack
* (and not another pack containing a duplicate)
*/
- for (i = 1; i < ctx.nr; i++) {
+ for (size_t i = 1; i < ctx.nr; i++) {
struct packed_git *p = ctx.info[i].p;
if (!oldest->num_objects || p->mtime < oldest->mtime) {
@@ -1231,7 +1229,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
compute_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
- for (i = 0; i < ctx.entries_nr; i++) {
+ for (size_t i = 0; i < ctx.entries_nr; i++) {
if (ctx.entries[i].offset > 0x7fffffff)
ctx.num_large_offsets++;
if (ctx.entries[i].offset > 0xffffffff)
@@ -1241,10 +1239,10 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
QSORT(ctx.info, ctx.nr, pack_info_compare);
if (packs_to_drop && packs_to_drop->nr) {
- int drop_index = 0;
+ size_t drop_index = 0;
int missing_drops = 0;
- for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+ for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
int cmp = strcmp(ctx.info[i].pack_name,
packs_to_drop->items[drop_index].string);
@@ -1275,7 +1273,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* pack_perm[old_id] = new_id
*/
ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired) {
dropped_packs++;
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
@@ -1284,7 +1282,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired)
continue;
pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
@@ -1430,6 +1428,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* have been freed in the previous if block.
*/
+ if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
+ die("too many multi-pack-indexes");
+
CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
if (ctx.incremental) {
@@ -1462,7 +1463,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
keep_hashes[ctx.num_multi_pack_indexes_before] =
xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
- for (i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+ for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
@@ -1470,7 +1471,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
m = m->base_midx;
}
- for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+ for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
} else {
keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1488,7 +1489,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.incremental);
cleanup:
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].p) {
close_pack(ctx.info[i].p);
free(ctx.info[i].p);
@@ -1501,7 +1502,7 @@ cleanup:
free(ctx.pack_perm);
free(ctx.pack_order);
if (keep_hashes) {
- for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+ for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
free((char *)keep_hashes[i]);
free(keep_hashes);
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2 6/6] midx-write: simplify error cases
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2025-08-30 21:23 ` [PATCH v2 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
@ 2025-08-30 21:23 ` Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
6 siblings, 1 reply; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-30 21:23 UTC (permalink / raw)
To: git; +Cc: gitster, me, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The write_midx_internal() method uses gotos to jump to a cleanup section to
clear memory before returning 'result'. Since these jumps are more common
for error conditions, initialize 'result' to -1 and then only set it to 0
before returning with success. There are a couple places where we return
with success via a jump.
This has the added benefit that the method now returns -1 on error instead
of an inconsistent 1 or -1.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 26 +++++++++-----------------
1 file changed, 9 insertions(+), 17 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 14a0947c46..047ffdcdbf 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1046,7 +1046,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
int bitmapped_packs_concat_len = 0;
int pack_name_concat_len = 0;
int dropped_packs = 0;
- int result = 0;
+ int result = -1;
const char **keep_hashes = NULL;
struct chunkfile *cf;
@@ -1099,14 +1099,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
error(_("could not load reverse index for MIDX %s"),
hash_to_hex_algop(get_midx_checksum(m),
m->repo->hash_algo));
- result = 1;
goto cleanup;
}
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
} else if (ctx.m && fill_packs_from_midx(&ctx)) {
- result = 1;
goto cleanup;
}
@@ -1142,12 +1140,16 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
*/
if (!want_bitmap)
clear_midx_files_ext(object_dir, "bitmap", NULL);
+
+ result = 0;
goto cleanup;
}
}
- if (ctx.incremental && !ctx.nr)
+ if (ctx.incremental && !ctx.nr) {
+ result = 0;
goto cleanup; /* nothing to do */
+ }
if (preferred_pack_name) {
ctx.preferred_pack_idx = NO_PREFERRED_PACK;
@@ -1221,7 +1223,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!preferred->num_objects) {
error(_("cannot select preferred pack %s with no objects"),
preferred->pack_name);
- result = 1;
goto cleanup;
}
}
@@ -1260,10 +1261,8 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- if (missing_drops) {
- result = 1;
+ if (missing_drops)
goto cleanup;
- }
}
/*
@@ -1309,7 +1308,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.nr - dropped_packs == 0) {
error(_("no pack files to index."));
- result = 1;
goto cleanup;
}
@@ -1329,14 +1327,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
incr = mks_tempfile_m(midx_name.buf, 0444);
if (!incr) {
error(_("unable to create temporary MIDX layer"));
- result = -1;
goto cleanup;
}
if (adjust_shared_perm(r, get_tempfile_path(incr))) {
error(_("unable to adjust shared permissions for '%s'"),
get_tempfile_path(incr));
- result = -1;
goto cleanup;
}
@@ -1414,7 +1410,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
- result = 1;
clear_packing_data(&pdata);
free(commits);
goto cleanup;
@@ -1440,21 +1435,17 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
- result = -1;
goto cleanup;
}
- if (link_midx_to_chain(ctx.base_midx) < 0) {
- result = -1;
+ if (link_midx_to_chain(ctx.base_midx) < 0)
goto cleanup;
- }
get_split_midx_filename_ext(r->hash_algo, &final_midx_name,
object_dir, midx_hash, MIDX_EXT_MIDX);
if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
error_errno(_("unable to rename new multi-pack-index layer"));
- result = -1;
goto cleanup;
}
@@ -1487,6 +1478,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
clear_midx_files(r, object_dir, keep_hashes,
ctx.num_multi_pack_indexes_before + 1,
ctx.incremental);
+ result = 0;
cleanup:
for (size_t i = 0; i < ctx.nr; i++) {
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/6] midx-write: only load initialized packs
2025-08-30 21:23 ` [PATCH v2 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
@ 2025-09-03 10:14 ` Patrick Steinhardt
2025-09-05 18:58 ` Derrick Stolee
0 siblings, 1 reply; 45+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 10:14 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee
On Sat, Aug 30, 2025 at 09:23:22PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
> implement support for writing incremental MIDX chains, 2024-08-06) to
> allow for preferred packfiles and incremental multi-pack-indexes.
> However, this led to some conditions that can cause improperly
> initialized memory in the context's list of packfiles.
>
> The conditions caring about the preferred pack name or the incremental
> flag are currently necessary to load a packfile. But the context is
> still being populated with pack_info structs based on the packfile array
> for the existing multi-pack-index even if prepare_midx_pack() isn't
> called.
I honestly don't quite understand why the conditions are necessary here.
In other words, why do we need to be careful _not_ to open the
packfiles?
> Add a new test that breaks under --stress when compiled with
> SANITIZE=address. The chosen number of 100 packfiles was selected to get
> the --stress output to fail about 50% of the time, while 50 packfiles
> could not get a failure in most --stress runs.
>
> The test case is marked as EXPENSIVE not only because of the number of
> packfiles it creates, but because some CI environments were reporting
> errors during the test that I could not reproduce, specifically around
> being unable to open the packfiles or their pack-indexes.
>
> When it fails under SANITIZE=address, it provides the following error:
>
> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
> ==3263517==The signal is caused by a READ memory access.
> ==3263517==Hint: address points to the zero page.
> #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
> #1 0x562d5d82d3ab in close_pack packfile.c:354
> #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
> #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
> #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
> ...
>
> This failure stack trace is disconnected from the real fix because the bad
> pointers are accessed later when closing the packfiles from the context.
Okay. So in other words we need to make sure to always prepare the
MIDX'd packfiles, but we may not want to open them?
> There are a few different aspects to this fix that are worth noting:
>
> 1. We return to the previous behavior of fill_packs_from_midx to not
> rely on the incremental flag or existence of a preferred pack.
>
> 2. The behavior to scan all layers of an incremental midx is kept, so
> this is not a full revert of the change.
>
> 3. We skip allocating more room in the pack_info array if the pack
> fails prepare_midx_pack().
>
> 4. The method has always returned 0 for success and 1 for failure, but
> the condition checking for error added a check for a negative result
> for failure, so that is now updated.
Nit, feel free to ignore: this change feels like it would make for a
nice separate commit.
Patrick
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/6] midx-write: put failing response value back
2025-08-30 21:23 ` [PATCH v2 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
@ 2025-09-03 10:15 ` Patrick Steinhardt
2025-09-05 19:03 ` Derrick Stolee
0 siblings, 1 reply; 45+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 10:15 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee
On Sat, Aug 30, 2025 at 09:23:23PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/midx-write.c b/midx-write.c
> index 070a7f61f4..0f1d5653ab 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1104,6 +1104,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> m = m->base_midx;
> }
> } else if (ctx.m && fill_packs_from_midx(&ctx)) {
> + result = 1;
> goto cleanup;
> }
Would it make sense to also convert this command to return negative
error codes?
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 49705c62a2..008e65c22e 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -1100,7 +1100,10 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
> mv $idx.bak $idx &&
>
> mv $pack $pack.bak &&
> - git cat-file --batch-check="%(objectsize:disk)" <tip
> + git cat-file --batch-check="%(objectsize:disk)" <tip &&
> +
> + test_must_fail git multi-pack-index write 2>err &&
> + grep "could not load pack" err
Nit: this should probably use `test_grep`.
Patrick
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 3/6] midx-write: use cleanup when incremental midx fails
2025-08-30 21:23 ` [PATCH v2 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
@ 2025-09-03 10:15 ` Patrick Steinhardt
0 siblings, 0 replies; 45+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 10:15 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee
On Sat, Aug 30, 2025 at 09:23:24PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/midx-write.c b/midx-write.c
> index 0f1d5653ab..cb0211289d 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1327,13 +1327,15 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> incr = mks_tempfile_m(midx_name.buf, 0444);
> if (!incr) {
> error(_("unable to create temporary MIDX layer"));
> - return -1;
> + result = -1;
> + goto cleanup;
We can instead use `result = error(...); goto cleanup;` here and in most
other cases.
Patrick
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx
2025-08-30 21:23 ` [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
@ 2025-09-03 10:15 ` Patrick Steinhardt
2025-09-05 19:05 ` Derrick Stolee
0 siblings, 1 reply; 45+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 10:15 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee
On Sat, Aug 30, 2025 at 09:23:25PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
> few reasons, but the biggest one is the use of a signed
> preferred_pack_idx member inside the write_midx_context struct. The code
> currently uses -1 to indicate an unset preferred pack but pack int ids
> are normally handled as uint32_t. There are also a few loops that search
> for the preferred pack by name and those iterators will need updates to
> uint32_t in the next change.
>
> For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
> equality check. The macro stores the max value of a uint32_t, so we
> cannot store a preferred pack that appears last in a list of 2^32 total
> packs, but that's expected to be unreasonable already. This improves the
> range from 2^31 already.
Tiny nit: the last sentence reads a bit funny. Maybe something like
this?
Furthermore, with this change we end up extending the range from
2^31 possible packs to 2^32-1.
> There are some careful things to worry about with initializing the
> preferred pack in the struct and using that value when searching for a
> preferred pack that was already incorrect but accidentally working when
> the index was initialized to zero.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> midx-write.c | 26 +++++++++++++++-----------
> 1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/midx-write.c b/midx-write.c
> index cb0211289d..1822268ce2 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -274,7 +275,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
> end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
>
> for (cur_object = start; cur_object < end; cur_object++) {
> - if ((preferred_pack > -1) &&
> + if ((preferred_pack != NO_PREFERRED_PACK) &&
> (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
> /*
> * Objects from preferred packs are added
Neither of these braces around comparisons are really needed, but feel
free to ignore as you simply piggy-back on existing style.
> @@ -1040,7 +1042,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> struct hashfile *f = NULL;
> struct lock_file lk;
> struct tempfile *incr;
> - struct write_midx_context ctx = { 0 };
> + struct write_midx_context ctx = {
> + .preferred_pack_idx = NO_PREFERRED_PACK,
> + };
> int bitmapped_packs_concat_len = 0;
> int pack_name_concat_len = 0;
> int dropped_packs = 0;
Why is this change needed? We didn't previously initialize
`.preferred_pack_idx = -1` either.
Patrick
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 5/6] midx-write: reenable signed comparison errors
2025-08-30 21:23 ` [PATCH v2 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
@ 2025-09-03 10:15 ` Patrick Steinhardt
0 siblings, 0 replies; 45+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 10:15 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee
On Sat, Aug 30, 2025 at 09:23:26PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/midx-write.c b/midx-write.c
> index 1822268ce2..14a0947c46 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1430,6 +1428,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> * have been freed in the previous if block.
> */
>
> + if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
> + die("too many multi-pack-indexes");
> +
> CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
>
> if (ctx.incremental) {
Should this error message be translated?
Everything else in this commit looks good to me. Thanks for cleaning
these up.
Patrick
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 6/6] midx-write: simplify error cases
2025-08-30 21:23 ` [PATCH v2 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
@ 2025-09-03 10:15 ` Patrick Steinhardt
2025-09-03 18:43 ` Junio C Hamano
0 siblings, 1 reply; 45+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 10:15 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, Derrick Stolee
On Sat, Aug 30, 2025 at 09:23:27PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> The write_midx_internal() method uses gotos to jump to a cleanup section to
> clear memory before returning 'result'. Since these jumps are more common
> for error conditions, initialize 'result' to -1 and then only set it to 0
> before returning with success. There are a couple places where we return
> with success via a jump.
>
> This has the added benefit that the method now returns -1 on error instead
> of an inconsistent 1 or -1.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> midx-write.c | 26 +++++++++-----------------
> 1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/midx-write.c b/midx-write.c
> index 14a0947c46..047ffdcdbf 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1046,7 +1046,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
> int bitmapped_packs_concat_len = 0;
> int pack_name_concat_len = 0;
> int dropped_packs = 0;
> - int result = 0;
> + int result = -1;
> const char **keep_hashes = NULL;
> struct chunkfile *cf;
I personally prefer to keep the result uninitialized and then assign the
result of `error()` to it. It's almost the same lines of code as we have
right now, but it has the advantage that the compiler will complain
about `result` being uninitialized if we ever forget to set it. So it's
overall way more explicit, and the compiler protects us.
But seeing that Junio previously recommended to go into the direction of
setting it to `-1` I won't insist on such a refactoring. So please feel
free to ignore this comment.
Patrick
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 6/6] midx-write: simplify error cases
2025-09-03 10:15 ` Patrick Steinhardt
@ 2025-09-03 18:43 ` Junio C Hamano
0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-09-03 18:43 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee
Patrick Steinhardt <ps@pks.im> writes:
>> - int result = 0;
>> + int result = -1;
>> const char **keep_hashes = NULL;
>> struct chunkfile *cf;
>
> I personally prefer to keep the result uninitialized and then assign the
> result of `error()` to it. It's almost the same lines of code as we have
> right now, but it has the advantage that the compiler will complain
> about `result` being uninitialized if we ever forget to set it. So it's
> overall way more explicit, and the compiler protects us.
>
> But seeing that Junio previously recommended to go into the direction of
> setting it to `-1` I won't insist on such a refactoring. So please feel
> free to ignore this comment.
I am equally fine with uninitialized one, as long as compilers are
trustworthy in all cases. But the code path I made a comment IIRC
did not necessarily have calls to error(), so the same number of
code argument does not apply. And initializing it to zero is worse
than leaving it uninitialized.
Thanks.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 1/6] midx-write: only load initialized packs
2025-09-03 10:14 ` Patrick Steinhardt
@ 2025-09-05 18:58 ` Derrick Stolee
0 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee @ 2025-09-05 18:58 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster, me
On 9/3/2025 6:14 AM, Patrick Steinhardt wrote:
> On Sat, Aug 30, 2025 at 09:23:22PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
>> implement support for writing incremental MIDX chains, 2024-08-06) to
>> allow for preferred packfiles and incremental multi-pack-indexes.
>> However, this led to some conditions that can cause improperly
>> initialized memory in the context's list of packfiles.
>>
>> The conditions caring about the preferred pack name or the incremental
>> flag are currently necessary to load a packfile. But the context is
>> still being populated with pack_info structs based on the packfile array
>> for the existing multi-pack-index even if prepare_midx_pack() isn't
>> called.
>
> I honestly don't quite understand why the conditions are necessary here.
> In other words, why do we need to be careful _not_ to open the
> packfiles?
My wording is poor. "We don't load packfiles unless one of these
conditions holds" is more appropriate.
There are some test cases that want to keep things working even
when a .idx file disappears, I think. This is a reason to be
careful about open_pack_index(), but prepare_midx_pack() is
something we want to call always.
>> Add a new test that breaks under --stress when compiled with
>> SANITIZE=address. The chosen number of 100 packfiles was selected to get
>> the --stress output to fail about 50% of the time, while 50 packfiles
>> could not get a failure in most --stress runs.
>>
>> The test case is marked as EXPENSIVE not only because of the number of
>> packfiles it creates, but because some CI environments were reporting
>> errors during the test that I could not reproduce, specifically around
>> being unable to open the packfiles or their pack-indexes.
>>
>> When it fails under SANITIZE=address, it provides the following error:
>>
>> AddressSanitizer:DEADLYSIGNAL
>> =================================================================
>> ==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
>> ==3263517==The signal is caused by a READ memory access.
>> ==3263517==Hint: address points to the zero page.
>> #0 0x562d5d82d1fb in close_pack_windows packfile.c:299
>> #1 0x562d5d82d3ab in close_pack packfile.c:354
>> #2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
>> #3 0x562d5d7c7aec in midx_repack midx-write.c:1795
>> #4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
>> ...
>>
>> This failure stack trace is disconnected from the real fix because the bad
>> pointers are accessed later when closing the packfiles from the context.
>
> Okay. So in other words we need to make sure to always prepare the
> MIDX'd packfiles, but we may not want to open them?
Yes. Always prepare. Don't always open (since that loads the .idx).
>> There are a few different aspects to this fix that are worth noting:
>>
>> 1. We return to the previous behavior of fill_packs_from_midx to not
>> rely on the incremental flag or existence of a preferred pack.
>>
>> 2. The behavior to scan all layers of an incremental midx is kept, so
>> this is not a full revert of the change.
>>
>> 3. We skip allocating more room in the pack_info array if the pack
>> fails prepare_midx_pack().
>>
>> 4. The method has always returned 0 for success and 1 for failure, but
>> the condition checking for error added a check for a negative result
>> for failure, so that is now updated.
>
> Nit, feel free to ignore: this change feels like it would make for a
> nice separate commit.
True. I only included it since I was modifying the call anyway due to
the changing parameters.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 2/6] midx-write: put failing response value back
2025-09-03 10:15 ` Patrick Steinhardt
@ 2025-09-05 19:03 ` Derrick Stolee
0 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee @ 2025-09-05 19:03 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster, me
On 9/3/2025 6:15 AM, Patrick Steinhardt wrote:
> On Sat, Aug 30, 2025 at 09:23:23PM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/midx-write.c b/midx-write.c
>> index 070a7f61f4..0f1d5653ab 100644
>> --- a/midx-write.c
>> +++ b/midx-write.c
>> @@ -1104,6 +1104,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
>> m = m->base_midx;
>> }
>> } else if (ctx.m && fill_packs_from_midx(&ctx)) {
>> + result = 1;
>> goto cleanup;
>> }
>
> Would it make sense to also convert this command to return negative
> error codes?
(Yes, in patch 6)
>> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
>> index 49705c62a2..008e65c22e 100755
>> --- a/t/t5319-multi-pack-index.sh
>> +++ b/t/t5319-multi-pack-index.sh
>> @@ -1100,7 +1100,10 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
>> mv $idx.bak $idx &&
>>
>> mv $pack $pack.bak &&
>> - git cat-file --batch-check="%(objectsize:disk)" <tip
>> + git cat-file --batch-check="%(objectsize:disk)" <tip &&
>> +
>> + test_must_fail git multi-pack-index write 2>err &&
>> + grep "could not load pack" err
>
> Nit: this should probably use `test_grep`.
You're right. I think I misremembered the preferred use back when
test_i18ngrep was deprecated.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx
2025-09-03 10:15 ` Patrick Steinhardt
@ 2025-09-05 19:05 ` Derrick Stolee
0 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee @ 2025-09-05 19:05 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster, me
On 9/3/2025 6:15 AM, Patrick Steinhardt wrote:
> On Sat, Aug 30, 2025 at 09:23:25PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
>> few reasons, but the biggest one is the use of a signed
>> preferred_pack_idx member inside the write_midx_context struct. The code
>> currently uses -1 to indicate an unset preferred pack but pack int ids
>> are normally handled as uint32_t. There are also a few loops that search
>> for the preferred pack by name and those iterators will need updates to
>> uint32_t in the next change.
>>
>> For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
>> equality check. The macro stores the max value of a uint32_t, so we
>> cannot store a preferred pack that appears last in a list of 2^32 total
>> packs, but that's expected to be unreasonable already. This improves the
>> range from 2^31 already.
>
> Tiny nit: the last sentence reads a bit funny. Maybe something like
> this?
>
> Furthermore, with this change we end up extending the range from
> 2^31 possible packs to 2^32-1.
That is better.
>> @@ -1040,7 +1042,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
>> struct hashfile *f = NULL;
>> struct lock_file lk;
>> struct tempfile *incr;
>> - struct write_midx_context ctx = { 0 };
>> + struct write_midx_context ctx = {
>> + .preferred_pack_idx = NO_PREFERRED_PACK,
>> + };
>> int bitmapped_packs_concat_len = 0;
>> int pack_name_concat_len = 0;
>> int dropped_packs = 0;
>
> Why is this change needed? We didn't previously initialize
> `.preferred_pack_idx = -1` either.
I think the previous lack of initialization was incorrect. It happened
to work because it became initialized to -1 later _or_ its value of
zero was implicitly used when searching for a preferred pack.
I thought it prudent to set this value as the default instead of
implying that the 0th packfile was preferred.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v3 0/6] midx-write: fix segfault and do several cleanups
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2025-08-30 21:23 ` [PATCH v2 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
` (6 more replies)
6 siblings, 7 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee
I was motivated to start looking closely at midx-write.c due to multiple
users reporting Git crashes in their background maintenance, specifically
during git multi-pack-index repack calls. I was eventually able to reproduce
it in git multi-pack-index expire as well.
Patch 1 is the only change we need to fix this bug. It includes a test case
that will fail under --stress with SANITIZE=address. It requires creating
many packfiles (50 was not enough, but 100 is enough). As far as I can tell,
this bug has existed since Git 2.47.0 in October 2024, but I started hearing
reports of this from users in July 2025 (and took a while to get a
dump/repro).
The remaining patches are cleanups based on my careful rereading of
midx-write.c. There are some issues about error handling that needed some
cleanup as well as a removal of the DISABLE_SIGN_COMPARE_WARNINGS macro.
Updates in V3
=============
* Use test_grep over grep.
* Translate an error message.
* Clarify a commit message.
Updates in V2
=============
* A stale comment to an unsubmitted version of the test is removed.
* More cases needing open_pack_index() are patched.
* Typos fixed.
* A new patch assumes error and sets result to zero only on the few
successful paths.
Thanks, -Stolee
Derrick Stolee (6):
midx-write: only load initialized packs
midx-write: put failing response value back
midx-write: use cleanup when incremental midx fails
midx-write: use uint32_t for preferred_pack_idx
midx-write: reenable signed comparison errors
midx-write: simplify error cases
midx-write.c | 134 +++++++++++++++++-------------------
t/t5319-multi-pack-index.sh | 22 +++++-
2 files changed, 86 insertions(+), 70 deletions(-)
base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1965%2Fderrickstolee%2Fmidx-write-cleanup-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1965/derrickstolee/midx-write-cleanup-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1965
Range-diff vs v2:
1: e02a444315 = 1: e02a444315 midx-write: only load initialized packs
2: a1dd3ed874 ! 2: 1e5f43a417 midx-write: put failing response value back
@@ t/t5319-multi-pack-index.sh: test_expect_success 'load reverse index when missin
+ git cat-file --batch-check="%(objectsize:disk)" <tip &&
+
+ test_must_fail git multi-pack-index write 2>err &&
-+ grep "could not load pack" err
++ test_grep "could not load pack" err
)
'
3: c4f75cca09 = 3: 414ae51024 midx-write: use cleanup when incremental midx fails
4: 2290e27ded ! 4: b113b3f012 midx-write: use uint32_t for preferred_pack_idx
@@ Commit message
For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
equality check. The macro stores the max value of a uint32_t, so we
cannot store a preferred pack that appears last in a list of 2^32 total
- packs, but that's expected to be unreasonable already. This improves the
- range from 2^31 already.
+ packs, but that's expected to be unreasonable already. Furthermore, with
+ this change we end up extending the range from 2^31 possible packs to
+ 2^32-1.
There are some careful things to worry about with initializing the
preferred pack in the struct and using that value when searching for a
5: 35302f5228 ! 5: 7c68f2535c midx-write: reenable signed comparison errors
@@ midx-write.c: static int write_midx_internal(struct repository *r, const char *o
*/
+ if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
-+ die("too many multi-pack-indexes");
++ die(_("too many multi-pack-indexes"));
+
CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
6: 7be25cf534 = 6: 224be4ee5c midx-write: simplify error cases
--
gitgitgadget
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v3 1/6] midx-write: only load initialized packs
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
6 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.
The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.
Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs.
The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.
When it fails under SANITIZE=address, it provides the following error:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
#0 0x562d5d82d1fb in close_pack_windows packfile.c:299
#1 0x562d5d82d3ab in close_pack packfile.c:354
#2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
#3 0x562d5d7c7aec in midx_repack midx-write.c:1795
#4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
...
This failure stack trace is disconnected from the real fix because the bad
pointers are accessed later when closing the packfiles from the context.
There are a few different aspects to this fix that are worth noting:
1. We return to the previous behavior of fill_packs_from_midx to not
rely on the incremental flag or existence of a preferred pack.
2. The behavior to scan all layers of an incremental midx is kept, so
this is not a full revert of the change.
3. We skip allocating more room in the pack_info array if the pack
fails prepare_midx_pack().
4. The method has always returned 0 for success and 1 for failure, but
the condition checking for error added a check for a negative result
for failure, so that is now updated.
5. The call to open_pack_index() is removed, but this is needed later
in the case of a preferred pack. That call is moved to immediately
before its result is needed (checking for the object count).
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 46 +++++++++++++++----------------------
t/t5319-multi-pack-index.sh | 17 ++++++++++++++
2 files changed, 36 insertions(+), 27 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index a0aceab5e0..070a7f61f4 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -920,8 +920,7 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
return get_multi_pack_index(source);
}
-static int fill_packs_from_midx(struct write_midx_context *ctx,
- const char *preferred_pack_name, uint32_t flags)
+static int fill_packs_from_midx(struct write_midx_context *ctx)
{
struct multi_pack_index *m;
@@ -929,30 +928,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < m->num_packs; i++) {
- ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
- /*
- * If generating a reverse index, need to have
- * packed_git's loaded to compare their
- * mtimes and object count.
- *
- * If a preferred pack is specified, need to
- * have packed_git's loaded to ensure the chosen
- * preferred pack has a non-zero object count.
- */
- if (flags & MIDX_WRITE_REV_INDEX ||
- preferred_pack_name) {
- if (prepare_midx_pack(ctx->repo, m,
- m->num_packs_in_base + i)) {
- error(_("could not load pack"));
- return 1;
- }
-
- if (open_pack_index(m->packs[i]))
- die(_("could not open index for %s"),
- m->packs[i]->pack_name);
- }
+ if (prepare_midx_pack(ctx->repo, m,
+ m->num_packs_in_base + i))
+ return error(_("could not load pack"));
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
m->pack_names[i],
m->num_packs_in_base + i);
@@ -1123,8 +1103,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
- } else if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
- flags) < 0) {
+ } else if (ctx.m && fill_packs_from_midx(&ctx)) {
goto cleanup;
}
@@ -1186,6 +1165,13 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
ctx.preferred_pack_idx = 0;
+ /*
+ * Attempt opening the pack index to populate num_objects.
+ * Ignore failiures as they can be expected and are not
+ * fatal during this selection time.
+ */
+ open_pack_index(oldest);
+
if (packs_to_drop && packs_to_drop->nr)
BUG("cannot write a MIDX bitmap during expiration");
@@ -1200,6 +1186,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!oldest->num_objects || p->mtime < oldest->mtime) {
oldest = p;
+ open_pack_index(oldest);
ctx.preferred_pack_idx = i;
}
}
@@ -1223,6 +1210,11 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.preferred_pack_idx > -1) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+
+ if (open_pack_index(preferred))
+ die(_("failed to open preferred pack %s"),
+ ctx.info[ctx.preferred_pack_idx].pack_name);
+
if (!preferred->num_objects) {
error(_("cannot select preferred pack %s with no objects"),
preferred->pack_name);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index bd75dea950..49705c62a2 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -989,6 +989,23 @@ test_expect_success 'repack --batch-size=0 repacks everything' '
)
'
+test_expect_success EXPENSIVE 'repack/expire with many packs' '
+ cp -r dup many &&
+ (
+ cd many &&
+
+ for i in $(test_seq 1 100)
+ do
+ test_commit extra$i &&
+ git maintenance run --task=loose-objects || return 1
+ done &&
+
+ git multi-pack-index write &&
+ git multi-pack-index repack &&
+ git multi-pack-index expire
+ )
+'
+
test_expect_success 'repack --batch-size=<large> repacks everything' '
(
cd dup2 &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v3 2/6] midx-write: put failing response value back
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
6 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
This instance of setting the result to 1 before going to cleanup was
accidentally removed in fcb2205b77 (midx: implement support for writing
incremental MIDX chains, 2024-08-06). Build upon a test that already deletes
a packfile to verify that this error propagates to full command failure.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 1 +
t/t5319-multi-pack-index.sh | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/midx-write.c b/midx-write.c
index 070a7f61f4..0f1d5653ab 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1104,6 +1104,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
m = m->base_midx;
}
} else if (ctx.m && fill_packs_from_midx(&ctx)) {
+ result = 1;
goto cleanup;
}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 49705c62a2..2c22fdb931 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1100,7 +1100,10 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
mv $idx.bak $idx &&
mv $pack $pack.bak &&
- git cat-file --batch-check="%(objectsize:disk)" <tip
+ git cat-file --batch-check="%(objectsize:disk)" <tip &&
+
+ test_must_fail git multi-pack-index write 2>err &&
+ test_grep "could not load pack" err
)
'
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v3 3/6] midx-write: use cleanup when incremental midx fails
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
6 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The incremental mode of writing a multi-pack-index has a few extra
conditions that could lead to failure, but these are currently
short-ciruiting with 'return -1' instead of setting the method's
'result' variable and going to the cleanup tag.
Replace these returns with gotos to avoid memory issues when exiting
early due to error conditions.
Unfortunately, these error conditions are difficult to reproduce with
test cases, which is perhaps one reason why the memory loss was not
caught by existing test cases in memory tracking modes.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 0f1d5653ab..cb0211289d 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1327,13 +1327,15 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
incr = mks_tempfile_m(midx_name.buf, 0444);
if (!incr) {
error(_("unable to create temporary MIDX layer"));
- return -1;
+ result = -1;
+ goto cleanup;
}
if (adjust_shared_perm(r, get_tempfile_path(incr))) {
error(_("unable to adjust shared permissions for '%s'"),
get_tempfile_path(incr));
- return -1;
+ result = -1;
+ goto cleanup;
}
f = hashfd(r->hash_algo, get_tempfile_fd(incr),
@@ -1433,18 +1435,22 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
- return -1;
+ result = -1;
+ goto cleanup;
}
- if (link_midx_to_chain(ctx.base_midx) < 0)
- return -1;
+ if (link_midx_to_chain(ctx.base_midx) < 0) {
+ result = -1;
+ goto cleanup;
+ }
get_split_midx_filename_ext(r->hash_algo, &final_midx_name,
object_dir, midx_hash, MIDX_EXT_MIDX);
if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
error_errno(_("unable to rename new multi-pack-index layer"));
- return -1;
+ result = -1;
+ goto cleanup;
}
strbuf_release(&final_midx_name);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v3 4/6] midx-write: use uint32_t for preferred_pack_idx
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2025-09-05 19:26 ` [PATCH v3 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
6 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
few reasons, but the biggest one is the use of a signed
preferred_pack_idx member inside the write_midx_context struct. The code
currently uses -1 to indicate an unset preferred pack but pack int ids
are normally handled as uint32_t. There are also a few loops that search
for the preferred pack by name and those iterators will need updates to
uint32_t in the next change.
For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
equality check. The macro stores the max value of a uint32_t, so we
cannot store a preferred pack that appears last in a list of 2^32 total
packs, but that's expected to be unreasonable already. Furthermore, with
this change we end up extending the range from 2^31 possible packs to
2^32-1.
There are some careful things to worry about with initializing the
preferred pack in the struct and using that value when searching for a
preferred pack that was already incorrect but accidentally working when
the index was initialized to zero.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index cb0211289d..1822268ce2 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -24,6 +24,7 @@
#define BITMAP_POS_UNKNOWN (~((uint32_t)0))
#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define NO_PREFERRED_PACK (~((uint32_t)0))
extern int midx_checksum_valid(struct multi_pack_index *m);
extern void clear_midx_files_ext(const char *object_dir, const char *ext,
@@ -104,7 +105,7 @@ struct write_midx_context {
unsigned large_offsets_needed:1;
uint32_t num_large_offsets;
- int preferred_pack_idx;
+ uint32_t preferred_pack_idx;
int incremental;
uint32_t num_multi_pack_indexes_before;
@@ -260,7 +261,7 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
struct multi_pack_index *m,
uint32_t cur_fanout,
- int preferred_pack)
+ uint32_t preferred_pack)
{
uint32_t start = m->num_objects_in_base, end;
uint32_t cur_object;
@@ -274,7 +275,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
end = m->num_objects_in_base + ntohl(m->chunk_oid_fanout[cur_fanout]);
for (cur_object = start; cur_object < end; cur_object++) {
- if ((preferred_pack > -1) &&
+ if ((preferred_pack != NO_PREFERRED_PACK) &&
(preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
/*
* Objects from preferred packs are added
@@ -364,7 +365,8 @@ static void compute_sorted_entries(struct write_midx_context *ctx,
preferred, cur_fanout);
}
- if (-1 < ctx->preferred_pack_idx && ctx->preferred_pack_idx < start_pack)
+ if (ctx->preferred_pack_idx != NO_PREFERRED_PACK &&
+ ctx->preferred_pack_idx < start_pack)
midx_fanout_add_pack_fanout(&fanout, ctx->info,
ctx->preferred_pack_idx, 1,
cur_fanout);
@@ -1040,7 +1042,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
struct hashfile *f = NULL;
struct lock_file lk;
struct tempfile *incr;
- struct write_midx_context ctx = { 0 };
+ struct write_midx_context ctx = {
+ .preferred_pack_idx = NO_PREFERRED_PACK,
+ };
int bitmapped_packs_concat_len = 0;
int pack_name_concat_len = 0;
int dropped_packs = 0;
@@ -1148,7 +1152,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
goto cleanup; /* nothing to do */
if (preferred_pack_name) {
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
for (i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
@@ -1158,12 +1162,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- if (ctx.preferred_pack_idx == -1)
+ if (ctx.preferred_pack_idx == NO_PREFERRED_PACK)
warning(_("unknown preferred pack: '%s'"),
preferred_pack_name);
} else if (ctx.nr &&
(flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
- struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+ struct packed_git *oldest = ctx.info[0].p;
ctx.preferred_pack_idx = 0;
/*
@@ -1199,17 +1203,17 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* objects to resolve, so the preferred value doesn't
* matter.
*/
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
}
} else {
/*
* otherwise don't mark any pack as preferred to avoid
* interfering with expiration logic below
*/
- ctx.preferred_pack_idx = -1;
+ ctx.preferred_pack_idx = NO_PREFERRED_PACK;
}
- if (ctx.preferred_pack_idx > -1) {
+ if (ctx.preferred_pack_idx != NO_PREFERRED_PACK) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
if (open_pack_index(preferred))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v3 5/6] midx-write: reenable signed comparison errors
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2025-09-05 19:26 ` [PATCH v3 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
2025-09-05 19:38 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Junio C Hamano
6 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Remove the remaining signed comparison warnings in midx-write.c so that
they can be enforced as errors in the future. After the previous change,
the remaining errors are due to iterator variables named 'i'.
The strategy here involves defining the variable within the for loop
syntax to make sure we use the appropriate bitness for the loop
sentinel. This matters in at least one method where the variable was
compared to uint32_t in some loops and size_t in others.
While adjusting these loops, there were some where the loop boundary was
checking against a uint32_t value _plus one_. These were replaced with
non-strict comparisons, but also the value is checked to not be
UINT32_MAX. Since the value is the number of incremental multi-pack-
indexes, this is not a meaningful restriction. The new die() is about
defensive programming more than it being realistically possible.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 1822268ce2..cd7bf7554a 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1,5 +1,3 @@
-#define DISABLE_SIGN_COMPARE_WARNINGS
-
#include "git-compat-util.h"
#include "abspath.h"
#include "config.h"
@@ -845,7 +843,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
uint32_t commits_nr,
unsigned flags)
{
- int ret, i;
+ int ret;
uint16_t options = 0;
struct bitmap_writer writer;
struct pack_idx_entry **index;
@@ -873,7 +871,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
* this order).
*/
ALLOC_ARRAY(index, pdata->nr_objects);
- for (i = 0; i < pdata->nr_objects; i++)
+ for (uint32_t i = 0; i < pdata->nr_objects; i++)
index[i] = &pdata->objects[i].idx;
bitmap_writer_init(&writer, ctx->repo, pdata,
@@ -894,7 +892,7 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
* happens between bitmap_writer_build_type_index() and
* bitmap_writer_finish().
*/
- for (i = 0; i < pdata->nr_objects; i++)
+ for (uint32_t i = 0; i < pdata->nr_objects; i++)
index[ctx->pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(&writer, commits, commits_nr);
@@ -1038,7 +1036,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
{
struct strbuf midx_name = STRBUF_INIT;
unsigned char midx_hash[GIT_MAX_RAWSZ];
- uint32_t i, start_pack;
+ uint32_t start_pack;
struct hashfile *f = NULL;
struct lock_file lk;
struct tempfile *incr;
@@ -1154,7 +1152,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (preferred_pack_name) {
ctx.preferred_pack_idx = NO_PREFERRED_PACK;
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
ctx.info[i].pack_name)) {
ctx.preferred_pack_idx = i;
@@ -1186,7 +1184,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* pack-order has all of its objects selected from that pack
* (and not another pack containing a duplicate)
*/
- for (i = 1; i < ctx.nr; i++) {
+ for (size_t i = 1; i < ctx.nr; i++) {
struct packed_git *p = ctx.info[i].p;
if (!oldest->num_objects || p->mtime < oldest->mtime) {
@@ -1231,7 +1229,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
compute_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
- for (i = 0; i < ctx.entries_nr; i++) {
+ for (size_t i = 0; i < ctx.entries_nr; i++) {
if (ctx.entries[i].offset > 0x7fffffff)
ctx.num_large_offsets++;
if (ctx.entries[i].offset > 0xffffffff)
@@ -1241,10 +1239,10 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
QSORT(ctx.info, ctx.nr, pack_info_compare);
if (packs_to_drop && packs_to_drop->nr) {
- int drop_index = 0;
+ size_t drop_index = 0;
int missing_drops = 0;
- for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
+ for (size_t i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
int cmp = strcmp(ctx.info[i].pack_name,
packs_to_drop->items[drop_index].string);
@@ -1275,7 +1273,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* pack_perm[old_id] = new_id
*/
ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired) {
dropped_packs++;
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
@@ -1284,7 +1282,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired)
continue;
pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
@@ -1430,6 +1428,9 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
* have been freed in the previous if block.
*/
+ if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
+ die(_("too many multi-pack-indexes"));
+
CALLOC_ARRAY(keep_hashes, ctx.num_multi_pack_indexes_before + 1);
if (ctx.incremental) {
@@ -1462,7 +1463,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
keep_hashes[ctx.num_multi_pack_indexes_before] =
xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
- for (i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
+ for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
keep_hashes[j] = xstrdup(hash_to_hex_algop(get_midx_checksum(m),
@@ -1470,7 +1471,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
m = m->base_midx;
}
- for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+ for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
} else {
keep_hashes[ctx.num_multi_pack_indexes_before] =
@@ -1488,7 +1489,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
ctx.incremental);
cleanup:
- for (i = 0; i < ctx.nr; i++) {
+ for (size_t i = 0; i < ctx.nr; i++) {
if (ctx.info[i].p) {
close_pack(ctx.info[i].p);
free(ctx.info[i].p);
@@ -1501,7 +1502,7 @@ cleanup:
free(ctx.pack_perm);
free(ctx.pack_order);
if (keep_hashes) {
- for (i = 0; i < ctx.num_multi_pack_indexes_before + 1; i++)
+ for (uint32_t i = 0; i <= ctx.num_multi_pack_indexes_before; i++)
free((char *)keep_hashes[i]);
free(keep_hashes);
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v3 6/6] midx-write: simplify error cases
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2025-09-05 19:26 ` [PATCH v3 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
@ 2025-09-05 19:26 ` Derrick Stolee via GitGitGadget
2025-09-05 19:38 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Junio C Hamano
6 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-09-05 19:26 UTC (permalink / raw)
To: git; +Cc: gitster, me, Patrick Steinhardt, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The write_midx_internal() method uses gotos to jump to a cleanup section to
clear memory before returning 'result'. Since these jumps are more common
for error conditions, initialize 'result' to -1 and then only set it to 0
before returning with success. There are a couple places where we return
with success via a jump.
This has the added benefit that the method now returns -1 on error instead
of an inconsistent 1 or -1.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
midx-write.c | 26 +++++++++-----------------
1 file changed, 9 insertions(+), 17 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index cd7bf7554a..72189f74bb 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1046,7 +1046,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
int bitmapped_packs_concat_len = 0;
int pack_name_concat_len = 0;
int dropped_packs = 0;
- int result = 0;
+ int result = -1;
const char **keep_hashes = NULL;
struct chunkfile *cf;
@@ -1099,14 +1099,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
error(_("could not load reverse index for MIDX %s"),
hash_to_hex_algop(get_midx_checksum(m),
m->repo->hash_algo));
- result = 1;
goto cleanup;
}
ctx.num_multi_pack_indexes_before++;
m = m->base_midx;
}
} else if (ctx.m && fill_packs_from_midx(&ctx)) {
- result = 1;
goto cleanup;
}
@@ -1142,12 +1140,16 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
*/
if (!want_bitmap)
clear_midx_files_ext(object_dir, "bitmap", NULL);
+
+ result = 0;
goto cleanup;
}
}
- if (ctx.incremental && !ctx.nr)
+ if (ctx.incremental && !ctx.nr) {
+ result = 0;
goto cleanup; /* nothing to do */
+ }
if (preferred_pack_name) {
ctx.preferred_pack_idx = NO_PREFERRED_PACK;
@@ -1221,7 +1223,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!preferred->num_objects) {
error(_("cannot select preferred pack %s with no objects"),
preferred->pack_name);
- result = 1;
goto cleanup;
}
}
@@ -1260,10 +1261,8 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
}
}
- if (missing_drops) {
- result = 1;
+ if (missing_drops)
goto cleanup;
- }
}
/*
@@ -1309,7 +1308,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (ctx.nr - dropped_packs == 0) {
error(_("no pack files to index."));
- result = 1;
goto cleanup;
}
@@ -1329,14 +1327,12 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
incr = mks_tempfile_m(midx_name.buf, 0444);
if (!incr) {
error(_("unable to create temporary MIDX layer"));
- result = -1;
goto cleanup;
}
if (adjust_shared_perm(r, get_tempfile_path(incr))) {
error(_("unable to adjust shared permissions for '%s'"),
get_tempfile_path(incr));
- result = -1;
goto cleanup;
}
@@ -1414,7 +1410,6 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
midx_hash, &pdata, commits, commits_nr,
flags) < 0) {
error(_("could not write multi-pack bitmap"));
- result = 1;
clear_packing_data(&pdata);
free(commits);
goto cleanup;
@@ -1440,21 +1435,17 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
- result = -1;
goto cleanup;
}
- if (link_midx_to_chain(ctx.base_midx) < 0) {
- result = -1;
+ if (link_midx_to_chain(ctx.base_midx) < 0)
goto cleanup;
- }
get_split_midx_filename_ext(r->hash_algo, &final_midx_name,
object_dir, midx_hash, MIDX_EXT_MIDX);
if (rename_tempfile(&incr, final_midx_name.buf) < 0) {
error_errno(_("unable to rename new multi-pack-index layer"));
- result = -1;
goto cleanup;
}
@@ -1487,6 +1478,7 @@ static int write_midx_internal(struct repository *r, const char *object_dir,
clear_midx_files(r, object_dir, keep_hashes,
ctx.num_multi_pack_indexes_before + 1,
ctx.incremental);
+ result = 0;
cleanup:
for (size_t i = 0; i < ctx.nr; i++) {
--
gitgitgadget
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v3 0/6] midx-write: fix segfault and do several cleanups
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2025-09-05 19:26 ` [PATCH v3 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
@ 2025-09-05 19:38 ` Junio C Hamano
2025-09-05 19:57 ` Derrick Stolee
6 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2025-09-05 19:38 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, me, Patrick Steinhardt, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> I was motivated to start looking closely at midx-write.c due to multiple
> users reporting Git crashes in their background maintenance, specifically
> during git multi-pack-index repack calls. I was eventually able to reproduce
> it in git multi-pack-index expire as well.
>
> Patch 1 is the only change we need to fix this bug. It includes a test case
> that will fail under --stress with SANITIZE=address. It requires creating
> many packfiles (50 was not enough, but 100 is enough). As far as I can tell,
> this bug has existed since Git 2.47.0 in October 2024, but I started hearing
> reports of this from users in July 2025 (and took a while to get a
> dump/repro).
>
> The remaining patches are cleanups based on my careful rereading of
> midx-write.c. There are some issues about error handling that needed some
> cleanup as well as a removal of the DISABLE_SIGN_COMPARE_WARNINGS macro.
>
>
> Updates in V3
> =============
>
> * Use test_grep over grep.
> * Translate an error message.
> * Clarify a commit message.
All incremental changes made sense to me. Will replace.
Shall we mark the topic ready for 'next' by now?
Thanks.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v3 0/6] midx-write: fix segfault and do several cleanups
2025-09-05 19:38 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Junio C Hamano
@ 2025-09-05 19:57 ` Derrick Stolee
0 siblings, 0 replies; 45+ messages in thread
From: Derrick Stolee @ 2025-09-05 19:57 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget
Cc: git, me, Patrick Steinhardt
On 9/5/2025 3:38 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> I was motivated to start looking closely at midx-write.c due to multiple
>> users reporting Git crashes in their background maintenance, specifically
>> during git multi-pack-index repack calls. I was eventually able to reproduce
>> it in git multi-pack-index expire as well.
>>
>> Patch 1 is the only change we need to fix this bug. It includes a test case
>> that will fail under --stress with SANITIZE=address. It requires creating
>> many packfiles (50 was not enough, but 100 is enough). As far as I can tell,
>> this bug has existed since Git 2.47.0 in October 2024, but I started hearing
>> reports of this from users in July 2025 (and took a while to get a
>> dump/repro).
>>
>> The remaining patches are cleanups based on my careful rereading of
>> midx-write.c. There are some issues about error handling that needed some
>> cleanup as well as a removal of the DISABLE_SIGN_COMPARE_WARNINGS macro.
>>
>>
>> Updates in V3
>> =============
>>
>> * Use test_grep over grep.
>> * Translate an error message.
>> * Clarify a commit message.
>
> All incremental changes made sense to me. Will replace.
>
> Shall we mark the topic ready for 'next' by now?
I believe it's ready. Thanks.
-Stolee
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2025-09-05 19:57 UTC | newest]
Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28 17:39 [PATCH 0/5] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
2025-08-28 17:39 ` [PATCH 1/5] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-08-28 20:19 ` Junio C Hamano
2025-08-29 1:20 ` Taylor Blau
2025-08-30 14:33 ` Derrick Stolee
2025-08-28 17:39 ` [PATCH 2/5] midx-write: put failing response value back Derrick Stolee via GitGitGadget
2025-08-28 20:45 ` Junio C Hamano
2025-08-29 1:26 ` Taylor Blau
2025-08-28 17:39 ` [PATCH 3/5] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
2025-08-28 20:51 ` Junio C Hamano
2025-08-29 1:29 ` Taylor Blau
2025-08-30 14:44 ` Derrick Stolee
2025-08-28 17:39 ` [PATCH 4/5] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
2025-08-28 20:58 ` Junio C Hamano
2025-08-29 1:35 ` Taylor Blau
2025-08-28 17:39 ` [PATCH 5/5] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
2025-08-28 21:01 ` Junio C Hamano
2025-08-29 1:35 ` Taylor Blau
2025-08-29 1:36 ` [PATCH 0/5] midx-write: fix segfault and do several cleanups Taylor Blau
2025-08-30 21:23 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2025-08-30 21:23 ` [PATCH v2 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-09-03 10:14 ` Patrick Steinhardt
2025-09-05 18:58 ` Derrick Stolee
2025-08-30 21:23 ` [PATCH v2 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-09-05 19:03 ` Derrick Stolee
2025-08-30 21:23 ` [PATCH v2 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-09-05 19:05 ` Derrick Stolee
2025-08-30 21:23 ` [PATCH v2 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-08-30 21:23 ` [PATCH v2 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
2025-09-03 10:15 ` Patrick Steinhardt
2025-09-03 18:43 ` Junio C Hamano
2025-09-05 19:26 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 1/6] midx-write: only load initialized packs Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 2/6] midx-write: put failing response value back Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 3/6] midx-write: use cleanup when incremental midx fails Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 4/6] midx-write: use uint32_t for preferred_pack_idx Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 5/6] midx-write: reenable signed comparison errors Derrick Stolee via GitGitGadget
2025-09-05 19:26 ` [PATCH v3 6/6] midx-write: simplify error cases Derrick Stolee via GitGitGadget
2025-09-05 19:38 ` [PATCH v3 0/6] midx-write: fix segfault and do several cleanups Junio C Hamano
2025-09-05 19:57 ` Derrick Stolee
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).