* [PATCH v4 16/16] repack: allow `--write-midx=incremental` without `--geometric`
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Previously, `--write-midx=incremental` required `--geometric` and would
die() without it. Relax this restriction so that incremental MIDX
repacking can be used independently.
Without `--geometric`, the behavior is append-only: a single new MIDX
layer is created containing whatever packs were written by the repack
and appended to the existing chain (or a new chain is started). Existing
layers are preserved as-is with no compaction or merging.
Implement this via a new repack_make_midx_append_plan() that builds a
plan consisting of a WRITE step for the freshly written packs followed
by COPY steps for every existing MIDX layer. The existing compaction
plan (repack_make_midx_compaction_plan) is used only when `--geometric`
is active.
Update the documentation to describe the behavior with and without
`--geometric`, and replace the test that enforced the old restriction
with one exercising append-only incremental MIDX repacking.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-repack.adoc | 19 +++++----
builtin/repack.c | 3 --
repack-midx.c | 64 +++++++++++++++++++++++++++--
t/t7705-repack-incremental-midx.sh | 65 +++++++++++++++++++++++++++---
4 files changed, 133 insertions(+), 18 deletions(-)
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 27a99cc46f4..72c42015e23 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -263,14 +263,19 @@ linkgit:git-multi-pack-index[1]).
`incremental`;;
Write an incremental MIDX chain instead of a single
- flat MIDX. This mode requires `--geometric`.
+ flat MIDX.
+
-The incremental mode maintains a chain of MIDX layers that is compacted
-over time using a geometric merging strategy. Each repack creates a new
-tip layer containing the newly written pack(s). Adjacent layers are then
-merged whenever the newer layer's object count exceeds
-`1/repack.midxSplitFactor` of the next deeper layer's count. Layers
-that do not meet this condition are retained as-is.
+Without `--geometric`, a new MIDX layer is appended to the existing
+chain (or a new chain is started) containing whatever packs were written
+by the repack. Existing layers are preserved as-is.
++
+When combined with `--geometric`, the incremental mode maintains a chain
+of MIDX layers that is compacted over time using a geometric merging
+strategy. Each repack creates a new tip layer containing the newly
+written pack(s). Adjacent layers are then merged whenever the newer
+layer's object count exceeds `1/repack.midxSplitFactor` of the next
+deeper layer's count. Layers that do not meet this condition are
+retained as-is.
+
The result is that newer (tip) layers tend to contain many small packs
with relatively few objects, while older (deeper) layers contain fewer,
diff --git a/builtin/repack.c b/builtin/repack.c
index 5ffa18e085e..1524a9c13ad 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -263,9 +263,6 @@ int cmd_repack(int argc,
if (pack_everything & PACK_CRUFT)
pack_everything |= ALL_INTO_ONE;
- if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL && !geometry.split_factor)
- die(_("--write-midx=incremental requires --geometric"));
-
if (write_bitmaps < 0) {
if (write_midx == REPACK_WRITE_MIDX_NONE &&
(!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
diff --git a/repack-midx.c b/repack-midx.c
index 4f5deeb97bf..b6b1de71805 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -548,6 +548,60 @@ static void midx_compaction_step_release(struct midx_compaction_step *step)
free(step->csum);
}
+/*
+ * Build an append-only MIDX plan: a single WRITE step for the freshly
+ * written packs, plus COPY steps for every existing layer. No
+ * compaction or merging is performed.
+ */
+static void repack_make_midx_append_plan(struct repack_write_midx_opts *opts,
+ struct midx_compaction_step **steps_p,
+ size_t *steps_nr_p)
+{
+ struct multi_pack_index *m;
+ struct midx_compaction_step *steps = NULL;
+ struct midx_compaction_step *step;
+ size_t steps_nr = 0, steps_alloc = 0;
+
+ odb_reprepare(opts->existing->repo->objects);
+ m = get_multi_pack_index(opts->existing->source);
+
+ if (opts->names->nr) {
+ struct strbuf buf = STRBUF_INIT;
+ uint32_t i;
+
+ ALLOC_GROW(steps, st_add(steps_nr, 1), steps_alloc);
+
+ step = &steps[steps_nr++];
+ memset(step, 0, sizeof(*step));
+
+ step->type = MIDX_COMPACTION_STEP_WRITE;
+ string_list_init_dup(&step->u.write);
+
+ for (i = 0; i < opts->names->nr; i++) {
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "pack-%s.idx",
+ opts->names->items[i].string);
+ string_list_append(&step->u.write, buf.buf);
+ }
+
+ strbuf_release(&buf);
+ }
+
+ for (; m; m = m->base_midx) {
+ ALLOC_GROW(steps, st_add(steps_nr, 1), steps_alloc);
+
+ step = &steps[steps_nr++];
+ memset(step, 0, sizeof(*step));
+
+ step->type = MIDX_COMPACTION_STEP_COPY;
+ step->u.copy = m;
+ step->objects_nr = m->num_objects;
+ }
+
+ *steps_p = steps;
+ *steps_nr_p = steps_nr;
+}
+
static int repack_make_midx_compaction_plan(struct repack_write_midx_opts *opts,
struct midx_compaction_step **steps_p,
size_t *steps_nr_p)
@@ -904,9 +958,13 @@ static int write_midx_incremental(struct repack_write_midx_opts *opts)
goto done;
}
- if (repack_make_midx_compaction_plan(opts, &steps, &steps_nr) < 0) {
- ret = error(_("unable to generate compaction plan"));
- goto done;
+ if (opts->geometry->split_factor) {
+ if (repack_make_midx_compaction_plan(opts, &steps, &steps_nr) < 0) {
+ ret = error(_("unable to generate compaction plan"));
+ goto done;
+ }
+ } else {
+ repack_make_midx_append_plan(opts, &steps, &steps_nr);
}
for (i = 0; i < steps_nr; i++) {
diff --git a/t/t7705-repack-incremental-midx.sh b/t/t7705-repack-incremental-midx.sh
index 9e317ff6e8f..25a8c40e8ee 100755
--- a/t/t7705-repack-incremental-midx.sh
+++ b/t/t7705-repack-incremental-midx.sh
@@ -63,10 +63,36 @@ create_layers () {
done
}
-test_expect_success '--write-midx=incremental requires --geometric' '
- test_must_fail git repack --write-midx=incremental 2>err &&
+test_expect_success '--write-midx=incremental without --geometric' '
+ git init incremental-without-geometric &&
+ (
+ cd incremental-without-geometric &&
- test_grep -- "--write-midx=incremental requires --geometric" err
+ git config maintenance.auto false &&
+
+ test_commit first &&
+ git repack -d &&
+
+ test_commit second &&
+ git repack --write-midx=incremental &&
+
+ git multi-pack-index verify &&
+ test_line_count = 1 $midx_chain &&
+ cp $midx_chain $midx_chain.before &&
+
+ # A second repack appends a new layer without
+ # disturbing the existing one.
+ test_commit third &&
+ git repack --write-midx=incremental &&
+
+ git multi-pack-index verify &&
+ test_line_count = 2 $midx_chain &&
+ head -n 1 $midx_chain.before >expect &&
+ head -n 1 $midx_chain >actual &&
+ test_cmp expect actual &&
+
+ git fsck
+ )
'
test_expect_success 'below layer threshold, tip packs excluded' '
@@ -334,8 +360,7 @@ test_expect_success 'kept packs are excluded from repack' '
# entirely, so no rollup occurs as there is only one
# non-kept pack. A new MIDX layer is written containing
# that pack.
- git repack --geometric=2 -d --write-midx=incremental \
- --write-bitmap-index &&
+ git repack --geometric=2 -d --write-midx=incremental &&
test-tool read-midx $objdir >actual &&
grep "^pack-.*\.idx$" actual >actual.packs &&
@@ -433,6 +458,36 @@ test_expect_success 'repack -ad removes stale incremental chain' '
)
'
+test_expect_success 'repack -ad --write-midx=incremental is safe' '
+ git init ad-incremental-midx &&
+ (
+ cd ad-incremental-midx &&
+
+ git config maintenance.auto false &&
+
+ # Build a MIDX chain with multiple layers referencing
+ # distinct packs.
+ test_commit first &&
+ git repack -d &&
+
+ test_commit second &&
+ git repack -d --write-midx=incremental &&
+
+ git multi-pack-index verify &&
+ test_line_count = 1 $midx_chain &&
+
+ # Now do a full -ad repack. The new pack contains all
+ # objects, but any retained MIDX layers still reference
+ # the now-deleted packs.
+ test_commit third &&
+ git repack -ad --write-midx=incremental &&
+
+ git multi-pack-index verify &&
+ git fsck &&
+ git rev-list --all --objects >/dev/null
+ )
+'
+
test_expect_success 'repack rejects invalid midxSplitFactor' '
test_when_finished "rm -fr bad-split-factor" &&
git init bad-split-factor &&
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 15/16] repack: introduce `--write-midx=incremental`
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Expose the incremental MIDX repacking mode (implemented in an earlier
commit) via a new --write-midx=incremental option for `git repack`.
Add "incremental" as a recognized argument to the --write-midx
OPT_CALLBACK, mapping it to REPACK_WRITE_MIDX_INCREMENTAL. When this
mode is active and --geometric is in use, set the midx_layer_threshold
on the pack geometry so that only packs in sufficiently large tip layers
are considered for repacking.
Two new configuration options control the compaction behavior:
- repack.midxSplitFactor (default: 2): the factor used in the
geometric merging condition for MIDX layers.
- repack.midxNewLayerThreshold (default: 8): the minimum number of
packs in the tip MIDX layer before its packs are considered as
candidates for geometric repacking.
Add tests exercising the new mode across a variety of scenarios
including basic geometric violations, multi-round chain integrity,
branching and merging histories, cross-layer object uniqueness, and
threshold-based compaction.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/config/repack.adoc | 18 ++
Documentation/git-repack.adoc | 39 ++-
builtin/repack.c | 49 ++-
midx.c | 29 ++
midx.h | 3 +
repack-geometry.c | 13 +-
repack-midx.c | 5 +
repack.c | 56 +++-
repack.h | 10 +-
t/meson.build | 1 +
t/t7705-repack-incremental-midx.sh | 470 +++++++++++++++++++++++++++++
11 files changed, 669 insertions(+), 24 deletions(-)
create mode 100755 t/t7705-repack-incremental-midx.sh
diff --git a/Documentation/config/repack.adoc b/Documentation/config/repack.adoc
index e9e78dcb198..4c22a499f62 100644
--- a/Documentation/config/repack.adoc
+++ b/Documentation/config/repack.adoc
@@ -46,3 +46,21 @@ repack.midxMustContainCruft::
`--write-midx`. When false, cruft packs are only included in the MIDX
when necessary (e.g., because they might be required to form a
reachability closure with MIDX bitmaps). Defaults to true.
+
+repack.midxSplitFactor::
+ The factor used in the geometric merging condition when
+ compacting incremental MIDX layers during `git repack` when
+ invoked with the `--write-midx=incremental` option.
++
+Adjacent layers are merged when the accumulated object count of the
+newer layer exceeds `1/<N>` of the object count of the next deeper
+layer. Must be at least 2. Defaults to 2.
+
+repack.midxNewLayerThreshold::
+ The minimum number of packs in the tip MIDX layer before those
+ packs are considered as candidates for geometric repacking
+ during `git repack --write-midx=incremental`.
++
+When the tip layer has fewer packs than this threshold, those packs are
+excluded from the geometric repack entirely, and are thus left
+unmodified. Must be at least 1. Defaults to 8.
diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc
index 673ce910837..27a99cc46f4 100644
--- a/Documentation/git-repack.adoc
+++ b/Documentation/git-repack.adoc
@@ -11,7 +11,7 @@ SYNOPSIS
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
- [--write-midx] [--name-hash-version=<n>] [--path-walk]
+ [--write-midx[=<mode>]] [--name-hash-version=<n>] [--path-walk]
DESCRIPTION
-----------
@@ -250,9 +250,42 @@ pack as the preferred pack for object selection by the MIDX (see
linkgit:git-multi-pack-index[1]).
-m::
---write-midx::
+--write-midx[=<mode>]::
Write a multi-pack index (see linkgit:git-multi-pack-index[1])
- containing the non-redundant packs.
+ containing the non-redundant packs. The following modes are
+ available:
++
+--
+ `default`;;
+ Write a single MIDX covering all packs. This is the
+ default when `--write-midx` is given without an
+ explicit mode.
+
+ `incremental`;;
+ Write an incremental MIDX chain instead of a single
+ flat MIDX. This mode requires `--geometric`.
++
+The incremental mode maintains a chain of MIDX layers that is compacted
+over time using a geometric merging strategy. Each repack creates a new
+tip layer containing the newly written pack(s). Adjacent layers are then
+merged whenever the newer layer's object count exceeds
+`1/repack.midxSplitFactor` of the next deeper layer's count. Layers
+that do not meet this condition are retained as-is.
++
+The result is that newer (tip) layers tend to contain many small packs
+with relatively few objects, while older (deeper) layers contain fewer,
+larger packs covering more objects. Because compaction is driven by the
+tip of the chain, newer layers are also rewritten more frequently than
+older ones, which are only touched when enough objects have accumulated
+to justify merging into them. This keeps the total number of layers
+logarithmic relative to the total number of objects.
++
+Only packs in the tip MIDX layer are considered as candidates for the
+geometric repack; packs in deeper layers are left untouched. If the tip
+layer contains fewer packs than `repack.midxNewLayerThreshold`, those
+packs are excluded from the geometry entirely, and a new layer is
+created for any new pack(s) without disturbing the existing chain.
+--
--name-hash-version=<n>::
Provide this argument to the underlying `git pack-objects` process.
diff --git a/builtin/repack.c b/builtin/repack.c
index 75c57736780..5ffa18e085e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -33,7 +33,7 @@ static int midx_must_contain_cruft = 1;
static const char *const git_repack_usage[] = {
N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
"[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n"
- "[--write-midx] [--name-hash-version=<n>] [--path-walk]"),
+ "[--write-midx[=<mode>]] [--name-hash-version=<n>] [--path-walk]"),
NULL
};
@@ -48,6 +48,8 @@ static const char incremental_bitmap_conflict_error[] = N_(
struct repack_config_ctx {
struct pack_objects_args *po_args;
struct pack_objects_args *cruft_po_args;
+ int midx_split_factor;
+ int midx_new_layer_threshold;
};
static int repack_config(const char *var, const char *value,
@@ -97,6 +99,16 @@ static int repack_config(const char *var, const char *value,
midx_must_contain_cruft = git_config_bool(var, value);
return 0;
}
+ if (!strcmp(var, "repack.midxsplitfactor")) {
+ repack_ctx->midx_split_factor = git_config_int(var, value,
+ ctx->kvi);
+ return 0;
+ }
+ if (!strcmp(var, "repack.midxnewlayerthreshold")) {
+ repack_ctx->midx_new_layer_threshold = git_config_int(var, value,
+ ctx->kvi);
+ return 0;
+ }
return git_default_config(var, value, ctx, cb);
}
@@ -112,6 +124,8 @@ static int option_parse_write_midx(const struct option *opt, const char *arg,
if (!arg || !*arg)
*cfg = REPACK_WRITE_MIDX_DEFAULT;
+ else if (!strcmp(arg, "incremental"))
+ *cfg = REPACK_WRITE_MIDX_INCREMENTAL;
else
return error(_("unknown value for %s: %s"), opt->long_name, arg);
@@ -226,6 +240,8 @@ int cmd_repack(int argc,
memset(&config_ctx, 0, sizeof(config_ctx));
config_ctx.po_args = &po_args;
config_ctx.cruft_po_args = &cruft_po_args;
+ config_ctx.midx_split_factor = DEFAULT_MIDX_SPLIT_FACTOR;
+ config_ctx.midx_new_layer_threshold = DEFAULT_MIDX_NEW_LAYER_THRESHOLD;
repo_config(repo, repack_config, &config_ctx);
@@ -247,6 +263,9 @@ int cmd_repack(int argc,
if (pack_everything & PACK_CRUFT)
pack_everything |= ALL_INTO_ONE;
+ if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL && !geometry.split_factor)
+ die(_("--write-midx=incremental requires --geometric"));
+
if (write_bitmaps < 0) {
if (write_midx == REPACK_WRITE_MIDX_NONE &&
(!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
@@ -273,6 +292,13 @@ int cmd_repack(int argc,
write_bitmaps = 0;
}
+ if (config_ctx.midx_split_factor < 2)
+ die(_("invalid value for %s: %d"), "--midx-split-factor",
+ config_ctx.midx_split_factor);
+ if (config_ctx.midx_new_layer_threshold < 1)
+ die(_("invalid value for %s: %d"), "--midx-new-layer-threshold",
+ config_ctx.midx_new_layer_threshold);
+
if (write_midx != REPACK_WRITE_MIDX_NONE && write_bitmaps) {
struct strbuf path = STRBUF_INIT;
@@ -296,6 +322,10 @@ int cmd_repack(int argc,
if (geometry.split_factor) {
if (pack_everything)
die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
+ if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL) {
+ geometry.midx_layer_threshold = config_ctx.midx_new_layer_threshold;
+ geometry.midx_layer_threshold_set = true;
+ }
pack_geometry_init(&geometry, &existing, &po_args);
pack_geometry_split(&geometry);
}
@@ -545,8 +575,11 @@ int cmd_repack(int argc,
packtmp);
/* End of pack replacement. */
- if (delete_redundant && pack_everything & ALL_INTO_ONE)
+ if (delete_redundant && pack_everything & ALL_INTO_ONE) {
+ if (write_midx == REPACK_WRITE_MIDX_INCREMENTAL)
+ existing_packs_retain_midx_packs(&existing);
existing_packs_mark_for_deletion(&existing, &names);
+ }
if (write_midx != REPACK_WRITE_MIDX_NONE) {
struct repack_write_midx_opts opts = {
@@ -558,8 +591,8 @@ int cmd_repack(int argc,
.show_progress = show_progress,
.write_bitmaps = write_bitmaps > 0,
.midx_must_contain_cruft = midx_must_contain_cruft,
- .midx_split_factor = DEFAULT_MIDX_SPLIT_FACTOR,
- .midx_new_layer_threshold = DEFAULT_MIDX_NEW_LAYER_THRESHOLD,
+ .midx_split_factor = config_ctx.midx_split_factor,
+ .midx_new_layer_threshold = config_ctx.midx_new_layer_threshold,
.mode = write_midx,
};
@@ -572,11 +605,15 @@ int cmd_repack(int argc,
if (delete_redundant) {
int opts = 0;
- existing_packs_remove_redundant(&existing, packdir);
+ bool wrote_incremental_midx = write_midx == REPACK_WRITE_MIDX_INCREMENTAL;
+
+ existing_packs_remove_redundant(&existing, packdir,
+ wrote_incremental_midx);
if (geometry.split_factor)
pack_geometry_remove_redundant(&geometry, &names,
- &existing, packdir);
+ &existing, packdir,
+ wrote_incremental_midx);
if (show_progress)
opts |= PRUNE_PACKED_VERBOSE;
prune_packed_objects(opts);
diff --git a/midx.c b/midx.c
index dc86c8e7fee..efbfbb13f41 100644
--- a/midx.c
+++ b/midx.c
@@ -850,6 +850,35 @@ void clear_midx_file(struct repository *r)
strbuf_release(&midx);
}
+void clear_incremental_midx_files(struct repository *r,
+ const struct strvec *keep_hashes)
+{
+ struct odb_source *source = r->objects->sources;
+ struct strbuf chain = STRBUF_INIT;
+
+ get_midx_chain_filename(source, &chain);
+
+ for (; source; source = source->next) {
+ struct odb_source_files *files = odb_source_files_downcast(source);
+ if (files->packed->midx)
+ close_midx(files->packed->midx);
+ files->packed->midx = NULL;
+ }
+
+ if (!keep_hashes && remove_path(chain.buf))
+ die(_("failed to clear multi-pack-index chain at %s"),
+ chain.buf);
+
+ clear_incremental_midx_files_ext(r->objects->sources, MIDX_EXT_BITMAP,
+ keep_hashes);
+ clear_incremental_midx_files_ext(r->objects->sources, MIDX_EXT_REV,
+ keep_hashes);
+ clear_incremental_midx_files_ext(r->objects->sources, MIDX_EXT_MIDX,
+ keep_hashes);
+
+ strbuf_release(&chain);
+}
+
static int verify_midx_error;
__attribute__((format (printf, 1, 2)))
diff --git a/midx.h b/midx.h
index 3ee12dd08ec..63853a03a47 100644
--- a/midx.h
+++ b/midx.h
@@ -9,6 +9,7 @@ struct repository;
struct bitmapped_pack;
struct git_hash_algo;
struct odb_source;
+struct strvec;
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION_V1 1
@@ -143,6 +144,8 @@ int write_midx_file_compact(struct odb_source *source,
const char *incremental_base,
unsigned flags);
void clear_midx_file(struct repository *r);
+void clear_incremental_midx_files(struct repository *r,
+ const struct strvec *keep_hashes);
int verify_midx_file(struct odb_source *source, unsigned flags);
int expire_midx_packs(struct odb_source *source, unsigned flags);
int midx_repack(struct odb_source *source, size_t batch_size, unsigned flags);
diff --git a/repack-geometry.c b/repack-geometry.c
index 2408b8a3cc2..2064683dcfe 100644
--- a/repack-geometry.c
+++ b/repack-geometry.c
@@ -249,7 +249,8 @@ static void remove_redundant_packs(struct packed_git **pack,
uint32_t pack_nr,
struct string_list *names,
struct existing_packs *existing,
- const char *packdir)
+ const char *packdir,
+ bool wrote_incremental_midx)
{
const struct git_hash_algo *algop = existing->repo->hash_algo;
struct strbuf buf = STRBUF_INIT;
@@ -269,7 +270,8 @@ static void remove_redundant_packs(struct packed_git **pack,
(string_list_has_string(&existing->kept_packs, buf.buf)))
continue;
- repack_remove_redundant_pack(existing->repo, packdir, buf.buf);
+ repack_remove_redundant_pack(existing->repo, packdir, buf.buf,
+ wrote_incremental_midx);
}
strbuf_release(&buf);
@@ -278,12 +280,13 @@ static void remove_redundant_packs(struct packed_git **pack,
void pack_geometry_remove_redundant(struct pack_geometry *geometry,
struct string_list *names,
struct existing_packs *existing,
- const char *packdir)
+ const char *packdir,
+ bool wrote_incremental_midx)
{
remove_redundant_packs(geometry->pack, geometry->split,
- names, existing, packdir);
+ names, existing, packdir, wrote_incremental_midx);
remove_redundant_packs(geometry->promisor_pack, geometry->promisor_split,
- names, existing, packdir);
+ names, existing, packdir, wrote_incremental_midx);
}
void pack_geometry_release(struct pack_geometry *geometry)
diff --git a/repack-midx.c b/repack-midx.c
index f97331fb1b7..4f5deeb97bf 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -887,6 +887,7 @@ static int write_midx_incremental(struct repack_write_midx_opts *opts)
struct midx_compaction_step *steps = NULL;
struct strbuf lock_name = STRBUF_INIT;
struct lock_file lf;
+ struct strvec keep_hashes = STRVEC_INIT;
size_t steps_nr = 0;
size_t i;
int ret = 0;
@@ -932,11 +933,15 @@ static int write_midx_incremental(struct repack_write_midx_opts *opts)
BUG("missing result for compaction step %"PRIuMAX,
(uintmax_t)i);
fprintf(get_lock_file_fp(&lf), "%s\n", step->csum);
+ strvec_push(&keep_hashes, step->csum);
}
commit_lock_file(&lf);
+ clear_incremental_midx_files(opts->existing->repo, &keep_hashes);
+
done:
+ strvec_clear(&keep_hashes);
strbuf_release(&lock_name);
for (i = 0; i < steps_nr; i++)
midx_compaction_step_release(&steps[i]);
diff --git a/repack.c b/repack.c
index 2ee6b51420a..571dabb665e 100644
--- a/repack.c
+++ b/repack.c
@@ -55,14 +55,18 @@ void pack_objects_args_release(struct pack_objects_args *args)
}
void repack_remove_redundant_pack(struct repository *repo, const char *dir_name,
- const char *base_name)
+ const char *base_name,
+ bool wrote_incremental_midx)
{
struct strbuf buf = STRBUF_INIT;
struct odb_source *source = repo->objects->sources;
struct multi_pack_index *m = get_multi_pack_index(source);
strbuf_addf(&buf, "%s.pack", base_name);
- if (m && source->local && midx_contains_pack(m, buf.buf))
+ if (m && source->local && midx_contains_pack(m, buf.buf)) {
clear_midx_file(repo);
+ if (!wrote_incremental_midx)
+ clear_incremental_midx_files(repo, NULL);
+ }
strbuf_insertf(&buf, 0, "%s/", dir_name);
unlink_pack_path(buf.buf, 1);
strbuf_release(&buf);
@@ -250,25 +254,63 @@ void existing_packs_mark_for_deletion(struct existing_packs *existing,
&existing->cruft_packs);
}
+/*
+ * Mark every pack that is referenced by the existing MIDX chain as
+ * retained, so that a subsequent call to
+ * existing_packs_mark_for_deletion() will not mark them for deletion.
+ *
+ * This is used when writing an incremental MIDX layer on top of an
+ * existing chain: retained layers continue to reference the same
+ * packs on disk, so those packs must not be unlinked even if the
+ * freshly-written pack supersedes them.
+ */
+void existing_packs_retain_midx_packs(struct existing_packs *existing)
+{
+ struct string_list_item *item;
+ struct strbuf buf = STRBUF_INIT;
+
+ for_each_string_list_item(item, &existing->midx_packs) {
+ struct string_list_item *found;
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, item->string);
+ strbuf_strip_suffix(&buf, ".pack");
+ strbuf_strip_suffix(&buf, ".idx");
+
+ found = string_list_lookup(&existing->non_kept_packs, buf.buf);
+ if (found)
+ existing_packs_mark_retained(found);
+
+ found = string_list_lookup(&existing->cruft_packs, buf.buf);
+ if (found)
+ existing_packs_mark_retained(found);
+ }
+
+ strbuf_release(&buf);
+}
+
static void remove_redundant_packs_1(struct repository *repo,
struct string_list *packs,
- const char *packdir)
+ const char *packdir,
+ bool wrote_incremental_midx)
{
struct string_list_item *item;
for_each_string_list_item(item, packs) {
if (!existing_pack_is_marked_for_deletion(item))
continue;
- repack_remove_redundant_pack(repo, packdir, item->string);
+ repack_remove_redundant_pack(repo, packdir, item->string,
+ wrote_incremental_midx);
}
}
void existing_packs_remove_redundant(struct existing_packs *existing,
- const char *packdir)
+ const char *packdir,
+ bool wrote_incremental_midx)
{
remove_redundant_packs_1(existing->repo, &existing->non_kept_packs,
- packdir);
+ packdir, wrote_incremental_midx);
remove_redundant_packs_1(existing->repo, &existing->cruft_packs,
- packdir);
+ packdir, wrote_incremental_midx);
}
void existing_packs_release(struct existing_packs *existing)
diff --git a/repack.h b/repack.h
index 831ccfb1c6c..f9fbc895f02 100644
--- a/repack.h
+++ b/repack.h
@@ -34,7 +34,8 @@ void prepare_pack_objects(struct child_process *cmd,
void pack_objects_args_release(struct pack_objects_args *args);
void repack_remove_redundant_pack(struct repository *repo, const char *dir_name,
- const char *base_name);
+ const char *base_name,
+ bool wrote_incremental_midx);
struct write_pack_opts {
struct pack_objects_args *po_args;
@@ -83,8 +84,10 @@ void existing_packs_retain_cruft(struct existing_packs *existing,
struct packed_git *cruft);
void existing_packs_mark_for_deletion(struct existing_packs *existing,
struct string_list *names);
+void existing_packs_retain_midx_packs(struct existing_packs *existing);
void existing_packs_remove_redundant(struct existing_packs *existing,
- const char *packdir);
+ const char *packdir,
+ bool wrote_incremental_midx);
void existing_packs_release(struct existing_packs *existing);
struct generated_pack;
@@ -129,7 +132,8 @@ struct packed_git *pack_geometry_preferred_pack(struct pack_geometry *geometry);
void pack_geometry_remove_redundant(struct pack_geometry *geometry,
struct string_list *names,
struct existing_packs *existing,
- const char *packdir);
+ const char *packdir,
+ bool wrote_incremental_midx);
void pack_geometry_release(struct pack_geometry *geometry);
struct tempfile;
diff --git a/t/meson.build b/t/meson.build
index 7528e5cda5f..25f0d823d8e 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -951,6 +951,7 @@ integration_tests = [
't7702-repack-cyclic-alternate.sh',
't7703-repack-geometric.sh',
't7704-repack-cruft.sh',
+ 't7705-repack-incremental-midx.sh',
't7800-difftool.sh',
't7810-grep.sh',
't7811-grep-open.sh',
diff --git a/t/t7705-repack-incremental-midx.sh b/t/t7705-repack-incremental-midx.sh
new file mode 100755
index 00000000000..9e317ff6e8f
--- /dev/null
+++ b/t/t7705-repack-incremental-midx.sh
@@ -0,0 +1,470 @@
+#!/bin/sh
+
+test_description='git repack --write-midx=incremental'
+
+. ./test-lib.sh
+
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
+
+objdir=.git/objects
+packdir=$objdir/pack
+midxdir=$packdir/multi-pack-index.d
+midx_chain=$midxdir/multi-pack-index-chain
+
+# incrementally_repack N
+#
+# Make "N" new commits, each stored in their own pack, and then repacked
+# with the --write-midx=incremental strategy.
+incrementally_repack () {
+ for i in $(test_seq 1 "$1")
+ do
+ test_commit "$i" &&
+
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+ git multi-pack-index verify || return 1
+ done
+}
+
+# Create packs with geometrically increasing sizes so that they
+# satisfy the geometric progression and survive a --geometric=2
+# repack without being rolled up. Creates 3 packs containing 1,
+# 2, and 6 commits (3, 6, and 18 objects) respectively.
+create_geometric_packs () {
+ test_commit "small" &&
+ git repack -d &&
+
+ test_commit_bulk --message="medium" 2 &&
+ test_commit_bulk --message="large" 6 &&
+
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index
+}
+
+# create_layer <test_commit_bulk args>
+#
+# Creates a new MIDX layer with the contents of "test_commit_bulk $@".
+create_layer () {
+ test_commit_bulk "$@" &&
+
+ git multi-pack-index write --incremental --bitmap
+}
+
+# create_layers
+#
+# Reads lines of "<message> <nr>" from stdin and creates a new MIDX
+# layer for each line. See create_layer above for more.
+create_layers () {
+ while read msg nr
+ do
+ create_layer --message="$msg" "$nr" || return 1
+ done
+}
+
+test_expect_success '--write-midx=incremental requires --geometric' '
+ test_must_fail git repack --write-midx=incremental 2>err &&
+
+ test_grep -- "--write-midx=incremental requires --geometric" err
+'
+
+test_expect_success 'below layer threshold, tip packs excluded' '
+ git init below-layer-threshold-tip-packs-excluded &&
+ (
+ cd below-layer-threshold-tip-packs-excluded &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 4 &&
+ git config repack.midxsplitfactor 2 &&
+
+ # Create 3 packs forming a geometric progression by
+ # object count such that they are unmodified by the
+ # initial repack. The MIDX chain thusly contains a
+ # single layer with three packs.
+ create_geometric_packs &&
+ ls $packdir/pack-*.idx | sort >packs.before &&
+ test_line_count = 1 $midx_chain &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Repack a new commit. Since the layer threshold is
+ # unmet, a new MIDX layer is added on top of the
+ # existing one.
+ test_commit extra &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+ git multi-pack-index verify &&
+
+ ls $packdir/pack-*.idx | sort >packs.after &&
+ comm -13 packs.before packs.after >packs.new &&
+ test_line_count = 1 packs.new &&
+
+ test_line_count = 2 "$midx_chain" &&
+ head -n 1 "$midx_chain.before" >expect &&
+ head -n 1 "$midx_chain" >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'above layer threshold, tip packs repacked' '
+ git init above-layer-threshold-tip-packs-repacked &&
+ (
+ cd above-layer-threshold-tip-packs-repacked &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 2 &&
+ git config repack.midxsplitfactor 2 &&
+
+ # Same setup, but with the layer threshold set to 2.
+ # Since the tip MIDX layer meets that threshold, its
+ # packs are considered repack candidates.
+ create_geometric_packs &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Perturb the existing progression such that it is
+ # rolled up into a single new pack, invalidating the
+ # existing MIDX layer and replacing it with a new one.
+ test_commit extra &&
+ git repack -d &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ ! test_cmp $midx_chain.before $midx_chain &&
+ test_line_count = 1 $midx_chain &&
+
+ git multi-pack-index verify
+ )
+'
+
+test_expect_success 'above layer threshold, tip layer preserved' '
+ git init above-layer-threshold-tip-layer-preserved &&
+ (
+ cd above-layer-threshold-tip-layer-preserved &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 2 &&
+ git config repack.midxsplitfactor 2 &&
+
+ test_commit_bulk --message="medium" 2 &&
+ test_commit_bulk --message="large" 6 &&
+
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ test_line_count = 1 "$midx_chain" &&
+ ls $packdir/pack-*.idx | sort >packs.before &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Create objects to form a pack satisfying the geometric
+ # progression (thus preserving the tip layer), but not
+ # so large that it meets the layer merging condition.
+ test_commit_bulk --message="small" 1 &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ ls $packdir/pack-*.idx | sort >packs.after &&
+ comm -13 packs.before packs.after >packs.new &&
+
+ test_line_count = 1 packs.new &&
+ test_line_count = 3 packs.after &&
+ test_line_count = 2 "$midx_chain" &&
+ head -n 1 "$midx_chain.before" >expect &&
+ head -n 1 "$midx_chain" >actual &&
+ test_cmp expect actual &&
+
+ git multi-pack-index verify
+ )
+'
+
+test_expect_success 'above layer threshold, tip packs preserved' '
+ git init above-layer-threshold-tip-packs-preserved &&
+ (
+ cd above-layer-threshold-tip-packs-preserved &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 2 &&
+ git config repack.midxsplitfactor 2 &&
+
+ create_geometric_packs &&
+ ls $packdir/pack-*.idx | sort >packs.before &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Same setup as above, but this time the new objects do
+ # not satisfy the new layer merging condition, resulting
+ # in a new tip layer.
+ test_commit_bulk --message="huge" 18 &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ ls $packdir/pack-*.idx | sort >packs.after &&
+ comm -13 packs.before packs.after >packs.new &&
+
+ ! test_cmp $midx_chain.before $midx_chain &&
+ test_line_count = 1 $midx_chain &&
+ test_line_count = 1 packs.new &&
+
+ git multi-pack-index verify
+ )
+'
+
+test_expect_success 'new tip absorbs multiple layers' '
+ git init new-tip-absorbs-multiple-layers &&
+ (
+ cd new-tip-absorbs-multiple-layers &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ # Build a 4-layer chain where each layer is too small to
+ # absorb the one below it. The sizes must satisfy L(n) <
+ # L(n-1)/2 for each adjacent pair:
+ #
+ # L0 (oldest): 75 obj (25 commits)
+ # L1: 21 obj (7 commits, 21 < 75/2)
+ # L2: 9 obj (3 commits, 9 < 21/2)
+ # L3 (tip): 3 obj (1 commit, 3 < 9/2)
+ create_layers <<-\EOF &&
+ L0 25
+ L1 7
+ L2 3
+ L3 1
+ EOF
+
+ test_line_count = 4 "$midx_chain" &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Now add a new commit. The merging condition is
+ # satisfied between L3-L1, but violated at L0, which is
+ # too large relative to the accumulated size.
+ #
+ # As a result, the chain shrinks from 4 to 2 layers.
+ test_commit new &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ ! test_cmp $midx_chain.before $midx_chain &&
+ test_line_count = 2 "$midx_chain" &&
+ git multi-pack-index verify
+ )
+'
+
+test_expect_success 'compaction of older layers' '
+ git init compaction-of-older-layers &&
+ (
+ cd compaction-of-older-layers &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ # Build a chain with two small layers at the bottom
+ # and a larger barrier layer on top, producing a
+ # chain that violates the compaction invariant, since
+ # the two small layers would normally have been merged.
+ create_layers <<-\EOF &&
+ one 2
+ two 4
+ barrier 54
+ EOF
+
+ cp $midx_chain $midx_chain.before &&
+
+ # Running an incremental repack compacts the two
+ # small layers at the bottom of the chain as a
+ # separate step in the compaction plan.
+ test_commit another &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ test_line_count = 2 "$midx_chain" &&
+ git multi-pack-index verify
+ )
+'
+
+test_expect_success 'geometric rollup with surviving tip packs' '
+ git init geometric-rollup-with-surviving-tip-packs &&
+ (
+ cd geometric-rollup-with-surviving-tip-packs &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ # Create a pack large enough to anchor the geometric
+ # progression when small packs are added alongside it.
+ create_layer --message="big" 5 &&
+
+ test_line_count = 1 "$midx_chain" &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Repack a small number of objects such that the
+ # progression is unbothered. Note that the existing pack
+ # is considered a repack candidate as the new layer
+ # threshold is set to 1.
+ test_commit small-1 &&
+ git repack -d &&
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ ! test_cmp $midx_chain.before $midx_chain &&
+ cp $midx_chain $midx_chain.before
+ )
+'
+
+test_expect_success 'kept packs are excluded from repack' '
+ git init kept-packs-excluded-from-repack &&
+ (
+ cd kept-packs-excluded-from-repack &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ # Create two equal-sized packs, marking one as kept.
+ for i in A B
+ do
+ test_commit "$i" && git repack -d || return 1
+ done &&
+
+ keep=$(ls $packdir/pack-*.idx | head -n 1) &&
+ touch "${keep%.idx}.keep" &&
+
+ # The kept pack is excluded as a repacking candidate
+ # entirely, so no rollup occurs as there is only one
+ # non-kept pack. A new MIDX layer is written containing
+ # that pack.
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ test-tool read-midx $objdir >actual &&
+ grep "^pack-.*\.idx$" actual >actual.packs &&
+ test_line_count = 1 actual.packs &&
+ test_grep ! "$keep" actual.packs &&
+
+ git multi-pack-index verify &&
+
+ # All objects (from both kept and non-kept packs)
+ # must still be accessible.
+ git fsck
+ )
+'
+
+test_expect_success 'incremental MIDX with --max-pack-size' '
+ git init incremental-midx-with--max-pack-size &&
+ (
+ cd incremental-midx-with--max-pack-size &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ create_layer --message="base" 1 &&
+
+ # Now add enough data that a small --max-pack-size will
+ # cause pack-objects to split its output. Create objects
+ # large enough to fill multiple packs.
+ test-tool genrandom foo 1M >big1 &&
+ test-tool genrandom bar 1M >big2 &&
+ git add big1 big2 &&
+ test_tick &&
+ git commit -a -m "big blobs" &&
+ git repack -d &&
+
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index --max-pack-size=1M &&
+
+ test_line_count = 1 "$midx_chain" &&
+ test-tool read-midx $objdir >actual &&
+ grep "^pack-.*\.idx$" actual >actual.packs &&
+ test_line_count -gt 1 actual.packs &&
+
+ git multi-pack-index verify
+ )
+'
+
+test_expect_success 'noop repack preserves valid MIDX chain' '
+ git init noop-repack-preserves-valid-midx-chain &&
+ (
+ cd noop-repack-preserves-valid-midx-chain &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ create_layer --message="base" 1 &&
+
+ git multi-pack-index verify &&
+ cp $midx_chain $midx_chain.before &&
+
+ # Running again with no new objects should not break
+ # the MIDX chain. It produces "Nothing new to pack."
+ git repack --geometric=2 -d --write-midx=incremental \
+ --write-bitmap-index &&
+
+ test_cmp $midx_chain.before $midx_chain &&
+
+ git multi-pack-index verify &&
+ git fsck
+ )
+'
+
+test_expect_success 'repack -ad removes stale incremental chain' '
+ git init repack--ad-removes-stale-incremental-chain &&
+ (
+ cd repack--ad-removes-stale-incremental-chain &&
+
+ git config maintenance.auto false &&
+ git config repack.midxnewlayerthreshold 1 &&
+ git config repack.midxsplitfactor 2 &&
+
+ create_layers <<-\EOF &&
+ one 1
+ two 1
+ EOF
+
+ test_path_is_file $midx_chain &&
+ test_line_count = 2 $midx_chain &&
+
+ git repack -ad &&
+
+ test_path_is_missing $packdir/multi-pack-index &&
+ test_dir_is_empty $midxdir
+ )
+'
+
+test_expect_success 'repack rejects invalid midxSplitFactor' '
+ test_when_finished "rm -fr bad-split-factor" &&
+ git init bad-split-factor &&
+ (
+ cd bad-split-factor &&
+ test_commit base &&
+
+ for v in 0 1 -1
+ do
+ test_must_fail git -c repack.midxSplitFactor=$v \
+ repack -d --geometric=2 --write-midx=incremental 2>err &&
+ test_grep "invalid value for --midx-split-factor" err ||
+ return 1
+ done
+ )
+'
+
+test_expect_success 'repack rejects invalid midxNewLayerThreshold' '
+ test_when_finished "rm -fr bad-layer-threshold" &&
+ git init bad-layer-threshold &&
+ (
+ cd bad-layer-threshold &&
+ test_commit base &&
+
+ for v in 0 -1
+ do
+ test_must_fail git -c repack.midxNewLayerThreshold=$v \
+ repack -d --geometric=2 --write-midx=incremental 2>err &&
+ test_grep "invalid value for --midx-new-layer-threshold" err ||
+ return 1
+ done
+ )
+'
+
+test_done
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 14/16] repack: implement incremental MIDX repacking
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Implement the `write_midx_incremental()` function, which builds and
maintains an incremental MIDX chain as part of the geometric repacking
process.
Unlike the default mode which writes a single flat MIDX, the incremental
mode constructs a compaction plan that determines which MIDX layers to
write, compact, or copy, and then executes each step using `git
multi-pack-index` subcommands with the --no-write-chain-file flag.
The repacking strategy works as follows:
* Acquire the lock guarding the multi-pack-index-chain.
* A new MIDX layer is always written containing the newly created
pack(s). If the tip MIDX layer was rewritten during geometric
repacking, any surviving packs from that layer are also included.
* Starting from the new layer, adjacent MIDX layers are merged together
as long as the accumulated object count exceeds half the object count
of the next deeper layer (controlled by 'repack.midxSplitFactor').
* Remaining layers in the chain are evaluated pairwise and either
compacted or copied as-is, following the same merging condition.
* Write the contents of the new multi-pack-index chain, atomically move
it into place, and then release the lock.
* Delete any now-unused MIDX layers.
After writing the new layer, the strategy is evaluated among the
existing MIDX layers in order from oldest to newest. Each step that
writes a new MIDX layer uses "--no-write-chain-file" to avoid updating
the multi-pack-index-chain file. After all steps are complete, the new
chain file is written and then atomically moved into place.
At present, this functionality is exposed behind a new enum value,
`REPACK_WRITE_MIDX_INCREMENTAL`, but has no external callers. A
subsequent commit will expose this mode via `git repack
--write-midx=incremental`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 5 +
repack-midx.c | 593 +++++++++++++++++++++++++++++++++++++++++++++--
repack.h | 3 +
3 files changed, 588 insertions(+), 13 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 5d366340c34..75c57736780 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -42,6 +42,9 @@ static const char incremental_bitmap_conflict_error[] = N_(
"--no-write-bitmap-index or disable the pack.writeBitmaps configuration."
);
+#define DEFAULT_MIDX_SPLIT_FACTOR 2
+#define DEFAULT_MIDX_NEW_LAYER_THRESHOLD 8
+
struct repack_config_ctx {
struct pack_objects_args *po_args;
struct pack_objects_args *cruft_po_args;
@@ -555,6 +558,8 @@ int cmd_repack(int argc,
.show_progress = show_progress,
.write_bitmaps = write_bitmaps > 0,
.midx_must_contain_cruft = midx_must_contain_cruft,
+ .midx_split_factor = DEFAULT_MIDX_SPLIT_FACTOR,
+ .midx_new_layer_threshold = DEFAULT_MIDX_NEW_LAYER_THRESHOLD,
.mode = write_midx,
};
diff --git a/repack-midx.c b/repack-midx.c
index b1ca3797080..f97331fb1b7 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -2,12 +2,16 @@
#include "repack.h"
#include "hash.h"
#include "hex.h"
+#include "lockfile.h"
+#include "midx.h"
#include "odb.h"
#include "oidset.h"
#include "pack-bitmap.h"
+#include "path.h"
#include "refs.h"
#include "run-command.h"
#include "tempfile.h"
+#include "trace2.h"
struct midx_snapshot_ref_data {
struct repository *repo;
@@ -293,26 +297,30 @@ static void repack_prepare_midx_command(struct child_process *cmd,
}
static int repack_fill_midx_stdin_packs(struct child_process *cmd,
- struct string_list *include)
+ struct string_list *include,
+ struct string_list *out)
{
+ struct strbuf in_buf = STRBUF_INIT;
+ struct strbuf out_buf = STRBUF_INIT;
struct string_list_item *item;
- FILE *in;
int ret;
- cmd->in = -1;
-
strvec_push(&cmd->args, "--stdin-packs");
- ret = start_command(cmd);
- if (ret)
- return ret;
-
- in = xfdopen(cmd->in, "w");
for_each_string_list_item(item, include)
- fprintf(in, "%s\n", item->string);
- fclose(in);
+ strbuf_addf(&in_buf, "%s\n", item->string);
- return finish_command(cmd);
+ ret = pipe_command(cmd, in_buf.buf, in_buf.len,
+ out ? &out_buf : NULL, 0, NULL, 0);
+
+ if (out)
+ string_list_split_f(out, out_buf.buf, "\n", -1,
+ STRING_LIST_SPLIT_NONEMPTY);
+
+ strbuf_release(&in_buf);
+ strbuf_release(&out_buf);
+
+ return ret;
}
static int write_midx_included_packs(struct repack_write_midx_opts *opts)
@@ -369,7 +377,7 @@ static int write_midx_included_packs(struct repack_write_midx_opts *opts)
strvec_pushf(&cmd.args, "--refs-snapshot=%s",
opts->refs_snapshot);
- ret = repack_fill_midx_stdin_packs(&cmd, &include);
+ ret = repack_fill_midx_stdin_packs(&cmd, &include, NULL);
done:
if (!ret && opts->write_bitmaps)
remove_redundant_bitmaps(&include, opts->packdir);
@@ -379,6 +387,563 @@ static int write_midx_included_packs(struct repack_write_midx_opts *opts)
return ret;
}
+struct midx_compaction_step {
+ union {
+ struct multi_pack_index *copy;
+ struct string_list write;
+ struct {
+ struct multi_pack_index *from;
+ struct multi_pack_index *to;
+ } compact;
+ } u;
+
+ uint32_t objects_nr;
+ char *csum;
+
+ enum {
+ MIDX_COMPACTION_STEP_UNKNOWN,
+ MIDX_COMPACTION_STEP_COPY,
+ MIDX_COMPACTION_STEP_WRITE,
+ MIDX_COMPACTION_STEP_COMPACT,
+ } type;
+};
+
+static const char *midx_compaction_step_base(const struct midx_compaction_step *step)
+{
+ switch (step->type) {
+ case MIDX_COMPACTION_STEP_UNKNOWN:
+ BUG("cannot use UNKNOWN step as a base");
+ case MIDX_COMPACTION_STEP_COPY:
+ return midx_get_checksum_hex(step->u.copy);
+ case MIDX_COMPACTION_STEP_WRITE:
+ BUG("cannot use WRITE step as a base");
+ case MIDX_COMPACTION_STEP_COMPACT:
+ return midx_get_checksum_hex(step->u.compact.to);
+ default:
+ BUG("unhandled midx compaction step type %d", step->type);
+ }
+}
+
+static int midx_compaction_step_exec_copy(struct midx_compaction_step *step)
+{
+ step->csum = xstrdup(midx_get_checksum_hex(step->u.copy));
+ return 0;
+}
+
+static int midx_compaction_step_exec_write(struct midx_compaction_step *step,
+ struct repack_write_midx_opts *opts,
+ const char *base)
+{
+ struct child_process cmd = CHILD_PROCESS_INIT;
+ struct string_list hash = STRING_LIST_INIT_DUP;
+ struct string_list_item *item;
+ const char *preferred_pack = NULL;
+ int ret = 0;
+
+ if (!step->u.write.nr) {
+ ret = error(_("no packs to write MIDX during compaction"));
+ goto out;
+ }
+
+ for_each_string_list_item(item, &step->u.write) {
+ if (item->util)
+ preferred_pack = item->string;
+ }
+
+ repack_prepare_midx_command(&cmd, opts, "write");
+ strvec_pushl(&cmd.args, "--incremental", "--no-write-chain-file", NULL);
+ strvec_pushf(&cmd.args, "--base=%s", base ? base : "none");
+
+ if (preferred_pack) {
+ struct strbuf buf = STRBUF_INIT;
+
+ strbuf_addstr(&buf, preferred_pack);
+ strbuf_strip_suffix(&buf, ".idx");
+ strbuf_addstr(&buf, ".pack");
+
+ strvec_pushf(&cmd.args, "--preferred-pack=%s", buf.buf);
+
+ strbuf_release(&buf);
+ }
+
+ ret = repack_fill_midx_stdin_packs(&cmd, &step->u.write, &hash);
+ if (hash.nr != 1) {
+ ret = error(_("expected exactly one line during MIDX write, "
+ "got: %"PRIuMAX),
+ (uintmax_t)hash.nr);
+ goto out;
+ }
+
+ step->csum = xstrdup(hash.items[0].string);
+
+out:
+ string_list_clear(&hash, 0);
+
+ return ret;
+}
+
+static int midx_compaction_step_exec_compact(struct midx_compaction_step *step,
+ struct repack_write_midx_opts *opts)
+{
+ struct child_process cmd = CHILD_PROCESS_INIT;
+ struct strbuf buf = STRBUF_INIT;
+ FILE *out = NULL;
+ int ret;
+
+ repack_prepare_midx_command(&cmd, opts, "compact");
+ strvec_pushl(&cmd.args, "--incremental", "--no-write-chain-file",
+ midx_get_checksum_hex(step->u.compact.from),
+ midx_get_checksum_hex(step->u.compact.to), NULL);
+
+ cmd.out = -1;
+
+ ret = start_command(&cmd);
+ if (ret)
+ goto out;
+
+ out = xfdopen(cmd.out, "r");
+ while (strbuf_getline_lf(&buf, out) != EOF) {
+ if (step->csum) {
+ ret = error(_("unexpected MIDX output: '%s'"), buf.buf);
+ fclose(out);
+ out = NULL;
+ finish_command(&cmd);
+ goto out;
+ }
+ step->csum = strbuf_detach(&buf, NULL);
+ }
+
+ ret = finish_command(&cmd);
+
+out:
+ if (out)
+ fclose(out);
+ strbuf_release(&buf);
+
+ return ret;
+}
+
+static int midx_compaction_step_exec(struct midx_compaction_step *step,
+ struct repack_write_midx_opts *opts,
+ const char *base)
+{
+ switch (step->type) {
+ case MIDX_COMPACTION_STEP_UNKNOWN:
+ BUG("cannot execute UNKNOWN midx compaction step");
+ case MIDX_COMPACTION_STEP_COPY:
+ return midx_compaction_step_exec_copy(step);
+ case MIDX_COMPACTION_STEP_WRITE:
+ return midx_compaction_step_exec_write(step, opts, base);
+ case MIDX_COMPACTION_STEP_COMPACT:
+ return midx_compaction_step_exec_compact(step, opts);
+ default:
+ BUG("unhandled midx compaction step type %d", step->type);
+ }
+}
+
+static void midx_compaction_step_release(struct midx_compaction_step *step)
+{
+ if (step->type == MIDX_COMPACTION_STEP_WRITE)
+ string_list_clear(&step->u.write, 0);
+ free(step->csum);
+}
+
+static int repack_make_midx_compaction_plan(struct repack_write_midx_opts *opts,
+ struct midx_compaction_step **steps_p,
+ size_t *steps_nr_p)
+{
+ struct multi_pack_index *m;
+ struct midx_compaction_step *steps = NULL;
+ struct midx_compaction_step step = { 0 };
+ struct strbuf buf = STRBUF_INIT;
+ size_t steps_nr = 0, steps_alloc = 0;
+ uint32_t i;
+ int ret = 0;
+
+ trace2_region_enter("repack", "make_midx_compaction_plan",
+ opts->existing->repo);
+
+ odb_reprepare(opts->existing->repo->objects);
+ m = get_multi_pack_index(opts->existing->source);
+
+ for (i = 0; m && i < m->num_packs + m->num_packs_in_base; i++) {
+ if (prepare_midx_pack(m, i)) {
+ ret = error(_("could not load pack %"PRIu32" from MIDX"),
+ i);
+ goto out;
+ }
+ }
+
+ trace2_region_enter("repack", "steps:write", opts->existing->repo);
+
+ /*
+ * The first MIDX in the resulting chain is always going to be
+ * new.
+ *
+ * At a minimum, it will include all of the newly written packs.
+ * If there is an existing MIDX whose tip layer contains packs
+ * that were repacked, it will also include any of its packs
+ * which were *not* rolled up as part of the geometric repack
+ * (if any), and the previous tip will be replaced.
+ *
+ * It may grow to include the packs from zero or more MIDXs from
+ * the old chain, beginning either at the old tip (if the MIDX
+ * was *not* rewritten) or the old tip's base MIDX layer
+ * (otherwise).
+ */
+ step.type = MIDX_COMPACTION_STEP_WRITE;
+ string_list_init_dup(&step.u.write);
+
+ for (i = 0; i < opts->names->nr; i++) {
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "pack-%s.idx", opts->names->items[i].string);
+ string_list_append(&step.u.write, buf.buf);
+
+ trace2_data_string("repack", opts->existing->repo,
+ "include:fresh",
+ step.u.write.items[step.u.write.nr - 1].string);
+ }
+ for (i = 0; i < opts->geometry->split; i++) {
+ struct packed_git *p = opts->geometry->pack[i];
+ if (unsigned_add_overflows(step.objects_nr, p->num_objects)) {
+ ret = error(_("too many objects in MIDX compaction step"));
+ goto out;
+ }
+
+ step.objects_nr += p->num_objects;
+ }
+ trace2_data_intmax("repack", opts->existing->repo,
+ "include:fresh:objects_nr",
+ (uintmax_t)step.objects_nr);
+
+ /*
+ * Now handle any existing packs which were *not* rewritten.
+ *
+ * The list of packs in opts->geometry only contains MIDX'd
+ * packs from the newest layer when that layer has more than
+ * 'repack.midxNewLayerThreshold' number of packs.
+ *
+ * If the MIDX tip was rewritten (that is, one or more of those
+ * packs appear below the split line), then add all packs above
+ * the split line to the new layer, as the old one is no longer
+ * usable.
+ *
+ * If the MIDX tip was not rewritten (that is, all MIDX'd packs
+ * from the youngest layer appear below the split line, or were
+ * not included in the geometric repack at all because there
+ * were too few of them), ignore them since we'll retain the
+ * existing layer as-is.
+ */
+ for (i = opts->geometry->split; i < opts->geometry->pack_nr; i++) {
+ struct packed_git *p = opts->geometry->pack[i];
+ struct string_list_item *item;
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, pack_basename(p));
+ strbuf_strip_suffix(&buf, ".pack");
+ strbuf_addstr(&buf, ".idx");
+
+ if (p->multi_pack_index &&
+ !opts->geometry->midx_tip_rewritten) {
+ trace2_data_string("repack", opts->existing->repo,
+ "exclude:unmodified", buf.buf);
+ continue;
+ }
+
+ trace2_data_string("repack", opts->existing->repo,
+ "include:unmodified", buf.buf);
+ trace2_data_string("repack", opts->existing->repo,
+ "include:unmodified:midx",
+ p->multi_pack_index ? "true" : "false");
+
+ item = string_list_append(&step.u.write, buf.buf);
+ if (p->multi_pack_index || i == opts->geometry->pack_nr - 1)
+ item->util = (void *)1; /* mark as preferred */
+
+ if (unsigned_add_overflows(step.objects_nr, p->num_objects)) {
+ ret = error(_("too many objects in MIDX compaction step"));
+ goto out;
+ }
+
+ step.objects_nr += p->num_objects;
+ }
+ trace2_data_intmax("repack", opts->existing->repo,
+ "include:unmodified:objects_nr",
+ (uintmax_t)step.objects_nr);
+
+ /*
+ * If the MIDX tip was rewritten, then we no longer consider it
+ * a candidate for compaction, since it will not exist in the
+ * MIDX chain being built.
+ */
+ if (opts->geometry->midx_tip_rewritten)
+ m = m->base_midx;
+
+ trace2_data_string("repack", opts->existing->repo, "midx:rewrote-tip",
+ opts->geometry->midx_tip_rewritten ? "true" : "false");
+
+ trace2_region_enter("repack", "compact", opts->existing->repo);
+
+ /*
+ * Compact additional MIDX layers into this proposed one until
+ * the merging condition is violated.
+ */
+ while (m) {
+ uint32_t preferred_pack_idx;
+
+ trace2_data_string("repack", opts->existing->repo,
+ "candidate", midx_get_checksum_hex(m));
+
+ if (step.objects_nr < m->num_objects / opts->midx_split_factor) {
+ /*
+ * Stop compacting MIDX layer as soon as the
+ * merged size is less than half the size of the
+ * next layer in the chain.
+ */
+ trace2_data_string("repack", opts->existing->repo,
+ "compact", "violated");
+ trace2_data_intmax("repack", opts->existing->repo,
+ "objects_nr",
+ (uintmax_t)step.objects_nr);
+ trace2_data_intmax("repack", opts->existing->repo,
+ "next_objects_nr",
+ (uintmax_t)m->num_objects);
+ trace2_data_intmax("repack", opts->existing->repo,
+ "split_factor",
+ (uintmax_t)opts->midx_split_factor);
+
+ break;
+ }
+
+ if (midx_preferred_pack(m, &preferred_pack_idx) < 0) {
+ ret = error(_("could not find preferred pack for MIDX "
+ "%s"), midx_get_checksum_hex(m));
+ goto out;
+ }
+
+ for (i = 0; i < m->num_packs; i++) {
+ struct string_list_item *item;
+ uint32_t pack_int_id = i + m->num_packs_in_base;
+ struct packed_git *p = nth_midxed_pack(m, pack_int_id);
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, pack_basename(p));
+ strbuf_strip_suffix(&buf, ".pack");
+ strbuf_addstr(&buf, ".idx");
+
+ trace2_data_string("repack", opts->existing->repo,
+ "midx:pack", buf.buf);
+
+ item = string_list_append(&step.u.write, buf.buf);
+ if (pack_int_id == preferred_pack_idx)
+ item->util = (void *)1; /* mark as preferred */
+ }
+
+ if (unsigned_add_overflows(step.objects_nr, m->num_objects)) {
+ ret = error(_("too many objects in MIDX compaction step"));
+ goto out;
+ }
+ step.objects_nr += m->num_objects;
+
+ m = m->base_midx;
+ }
+
+ if (step.u.write.nr > 0) {
+ /*
+ * As long as there is at least one new pack to write
+ * (and thus the MIDX is non-empty), add it to the plan.
+ */
+ ALLOC_GROW(steps, steps_nr + 1, steps_alloc);
+ steps[steps_nr++] = step;
+ }
+
+ trace2_data_intmax("repack", opts->existing->repo,
+ "step:objects_nr", (uintmax_t)step.objects_nr);
+ trace2_data_intmax("repack", opts->existing->repo,
+ "step:packs_nr", (uintmax_t)step.u.write.nr);
+
+ trace2_region_leave("repack", "compact", opts->existing->repo);
+ trace2_region_leave("repack", "steps:write", opts->existing->repo);
+
+ trace2_region_enter("repack", "steps:rest", opts->existing->repo);
+
+ /*
+ * Then start over, repeat, and either compact or keep as-is
+ * each MIDX layer until we have exhausted the chain.
+ *
+ * Finally, evaluate the remainder of the chain (if any) and
+ * either compact a sequence of adjacent layers, or keep
+ * individual layers as-is according to the same merging
+ * condition as above.
+ */
+ while (m) {
+ struct multi_pack_index *next = m;
+
+ ALLOC_GROW(steps, steps_nr + 1, steps_alloc);
+
+ memset(&step, 0, sizeof(step));
+ step.type = MIDX_COMPACTION_STEP_UNKNOWN;
+
+ trace2_region_enter("repack", "step", opts->existing->repo);
+
+ trace2_data_string("repack", opts->existing->repo,
+ "from", midx_get_checksum_hex(m));
+
+ while (next) {
+ uint32_t proposed_objects_nr;
+ if (unsigned_add_overflows(step.objects_nr, next->num_objects)) {
+ ret = error(_("too many objects in MIDX compaction step"));
+ trace2_region_leave("repack", "step", opts->existing->repo);
+ goto out;
+ }
+
+ proposed_objects_nr = step.objects_nr + next->num_objects;
+
+ trace2_data_string("repack", opts->existing->repo,
+ "proposed",
+ midx_get_checksum_hex(next));
+ trace2_data_intmax("repack", opts->existing->repo,
+ "proposed:objects_nr",
+ (uintmax_t)next->num_objects);
+
+ if (!next->base_midx) {
+ /*
+ * If we are at the end of the MIDX
+ * chain, there is nothing to compact,
+ * so mark it and stop.
+ */
+ step.objects_nr = proposed_objects_nr;
+ break;
+ }
+
+ if (proposed_objects_nr < next->base_midx->num_objects / opts->midx_split_factor) {
+ /*
+ * If there is a MIDX following this
+ * one, but our accumulated size is less
+ * than half of its size, compacting
+ * them would violate the merging
+ * condition, so stop here.
+ */
+
+ trace2_data_string("repack", opts->existing->repo,
+ "compact:violated:at",
+ midx_get_checksum_hex(next->base_midx));
+ trace2_data_intmax("repack", opts->existing->repo,
+ "compact:violated:at:objects_nr",
+ (uintmax_t)next->base_midx->num_objects);
+ break;
+ }
+
+ /*
+ * Otherwise, it is OK to compact the next layer
+ * into this one. Do so, and then continue
+ * through the remainder of the chain.
+ */
+ step.objects_nr = proposed_objects_nr;
+ trace2_data_intmax("repack", opts->existing->repo,
+ "step:objects_nr",
+ (uintmax_t)step.objects_nr);
+ next = next->base_midx;
+ }
+
+ if (m == next) {
+ step.type = MIDX_COMPACTION_STEP_COPY;
+ step.u.copy = m;
+
+ trace2_data_string("repack", opts->existing->repo,
+ "type", "copy");
+ } else {
+ step.type = MIDX_COMPACTION_STEP_COMPACT;
+ step.u.compact.from = next;
+ step.u.compact.to = m;
+
+ trace2_data_string("repack", opts->existing->repo,
+ "to", midx_get_checksum_hex(m));
+ trace2_data_string("repack", opts->existing->repo,
+ "type", "compact");
+ }
+
+ m = next->base_midx;
+ steps[steps_nr++] = step;
+ trace2_region_leave("repack", "step", opts->existing->repo);
+ }
+
+ trace2_region_leave("repack", "steps:rest", opts->existing->repo);
+
+out:
+ *steps_p = steps;
+ *steps_nr_p = steps_nr;
+
+ strbuf_release(&buf);
+
+ trace2_region_leave("repack", "make_midx_compaction_plan",
+ opts->existing->repo);
+
+ return ret;
+}
+
+static int write_midx_incremental(struct repack_write_midx_opts *opts)
+{
+ struct midx_compaction_step *steps = NULL;
+ struct strbuf lock_name = STRBUF_INIT;
+ struct lock_file lf;
+ size_t steps_nr = 0;
+ size_t i;
+ int ret = 0;
+
+ get_midx_chain_filename(opts->existing->source, &lock_name);
+ if (safe_create_leading_directories(opts->existing->repo,
+ lock_name.buf))
+ die_errno(_("unable to create leading directories of %s"),
+ lock_name.buf);
+ hold_lock_file_for_update(&lf, lock_name.buf, LOCK_DIE_ON_ERROR);
+
+ if (!fdopen_lock_file(&lf, "w")) {
+ ret = error_errno(_("unable to open multi-pack-index chain file"));
+ goto done;
+ }
+
+ if (repack_make_midx_compaction_plan(opts, &steps, &steps_nr) < 0) {
+ ret = error(_("unable to generate compaction plan"));
+ goto done;
+ }
+
+ for (i = 0; i < steps_nr; i++) {
+ struct midx_compaction_step *step = &steps[i];
+ char *base = NULL;
+
+ if (i + 1 < steps_nr)
+ base = xstrdup(midx_compaction_step_base(&steps[i + 1]));
+
+ if (midx_compaction_step_exec(step, opts, base) < 0) {
+ ret = error(_("unable to execute compaction step %"PRIuMAX),
+ (uintmax_t)i);
+ free(base);
+ goto done;
+ }
+
+ free(base);
+ }
+
+ i = steps_nr;
+ while (i--) {
+ struct midx_compaction_step *step = &steps[i];
+ if (!step->csum)
+ BUG("missing result for compaction step %"PRIuMAX,
+ (uintmax_t)i);
+ fprintf(get_lock_file_fp(&lf), "%s\n", step->csum);
+ }
+
+ commit_lock_file(&lf);
+
+done:
+ strbuf_release(&lock_name);
+ for (i = 0; i < steps_nr; i++)
+ midx_compaction_step_release(&steps[i]);
+ free(steps);
+ return ret;
+}
+
int repack_write_midx(struct repack_write_midx_opts *opts)
{
switch (opts->mode) {
@@ -386,6 +951,8 @@ int repack_write_midx(struct repack_write_midx_opts *opts)
BUG("write_midx mode is NONE?");
case REPACK_WRITE_MIDX_DEFAULT:
return write_midx_included_packs(opts);
+ case REPACK_WRITE_MIDX_INCREMENTAL:
+ return write_midx_incremental(opts);
default:
BUG("unhandled write_midx mode: %d", opts->mode);
}
diff --git a/repack.h b/repack.h
index 81907fcce7f..831ccfb1c6c 100644
--- a/repack.h
+++ b/repack.h
@@ -137,6 +137,7 @@ struct tempfile;
enum repack_write_midx_mode {
REPACK_WRITE_MIDX_NONE,
REPACK_WRITE_MIDX_DEFAULT,
+ REPACK_WRITE_MIDX_INCREMENTAL,
};
struct repack_write_midx_opts {
@@ -148,6 +149,8 @@ struct repack_write_midx_opts {
int show_progress;
int write_bitmaps;
int midx_must_contain_cruft;
+ int midx_split_factor;
+ int midx_new_layer_threshold;
enum repack_write_midx_mode mode;
};
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 13/16] packfile: ensure `close_pack_revindex()` frees in-memory revindex
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
The following commit will introduce a case where we write a MIDX bitmap
over packs that do not themselves have on-disk *.rev files.
This case is supported within Git, and we will simply fall back to
generating the revindex in memory. But we don't ever release that
memory, causing a leak that is exposed by a test introduced in the
following commit.
(As far as I could find, we never free()'d memory allocated as a
byproduct of creating an in-memory revindex, likely because that code
predates the leak-checking niceties we have in the test suite now.)
Rectify this by calling `FREE_AND_NULL()` on the `p->revindex` field
when calling `close_pack_revindex()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
packfile.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/packfile.c b/packfile.c
index b012d648ada..a1e88fdb223 100644
--- a/packfile.c
+++ b/packfile.c
@@ -420,6 +420,8 @@ void close_pack_index(struct packed_git *p)
static void close_pack_revindex(struct packed_git *p)
{
+ FREE_AND_NULL(p->revindex);
+
if (!p->revindex_map)
return;
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 12/16] builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Change the --write-midx (-m) flag from an OPT_BOOL to an OPT_CALLBACK
that accepts an optional mode argument. Introduce an enum with
REPACK_WRITE_MIDX_NONE and REPACK_WRITE_MIDX_DEFAULT to distinguish
between the two states, and update all existing boolean checks
accordingly.
For now, passing no argument (or just `-m`) selects the default mode,
preserving existing behavior. A subsequent commit will add a new mode
for writing incremental MIDXs.
Extract repack_write_midx() as a dispatcher that selects the
appropriate MIDX-writing implementation based on the mode.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 50 ++++++++++++++++++++++++++++++++++++------------
repack-midx.c | 14 +++++++++++++-
repack.h | 8 +++++++-
3 files changed, 58 insertions(+), 14 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 24be147d39a..5d366340c34 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -97,6 +97,24 @@ static int repack_config(const char *var, const char *value,
return git_default_config(var, value, ctx, cb);
}
+static int option_parse_write_midx(const struct option *opt, const char *arg,
+ int unset)
+{
+ enum repack_write_midx_mode *cfg = opt->value;
+
+ if (unset) {
+ *cfg = REPACK_WRITE_MIDX_NONE;
+ return 0;
+ }
+
+ if (!arg || !*arg)
+ *cfg = REPACK_WRITE_MIDX_DEFAULT;
+ else
+ return error(_("unknown value for %s: %s"), opt->long_name, arg);
+
+ return 0;
+}
+
int cmd_repack(int argc,
const char **argv,
const char *prefix,
@@ -119,7 +137,7 @@ int cmd_repack(int argc,
struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
struct pack_objects_args po_args = PACK_OBJECTS_ARGS_INIT;
struct pack_objects_args cruft_po_args = PACK_OBJECTS_ARGS_INIT;
- int write_midx = 0;
+ enum repack_write_midx_mode write_midx = REPACK_WRITE_MIDX_NONE;
const char *cruft_expiration = NULL;
const char *expire_to = NULL;
const char *filter_to = NULL;
@@ -185,8 +203,14 @@ int cmd_repack(int argc,
N_("do not repack this pack")),
OPT_INTEGER('g', "geometric", &geometry.split_factor,
N_("find a geometric progression with factor <N>")),
- OPT_BOOL('m', "write-midx", &write_midx,
- N_("write a multi-pack index of the resulting packs")),
+ OPT_CALLBACK_F(0, "write-midx", &write_midx,
+ N_("mode"),
+ N_("write a multi-pack index of the resulting packs"),
+ PARSE_OPT_OPTARG, option_parse_write_midx),
+ OPT_SET_INT_F('m', NULL, &write_midx,
+ N_("write a multi-pack index of the resulting packs"),
+ REPACK_WRITE_MIDX_DEFAULT,
+ PARSE_OPT_HIDDEN),
OPT_STRING(0, "expire-to", &expire_to, N_("dir"),
N_("pack prefix to store a pack containing pruned objects")),
OPT_STRING(0, "filter-to", &filter_to, N_("dir"),
@@ -221,14 +245,16 @@ int cmd_repack(int argc,
pack_everything |= ALL_INTO_ONE;
if (write_bitmaps < 0) {
- if (!write_midx &&
+ if (write_midx == REPACK_WRITE_MIDX_NONE &&
(!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
write_bitmaps = 0;
}
if (po_args.pack_kept_objects < 0)
- po_args.pack_kept_objects = write_bitmaps > 0 && !write_midx;
+ po_args.pack_kept_objects = write_bitmaps > 0 &&
+ write_midx == REPACK_WRITE_MIDX_NONE;
- if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
+ if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) &&
+ write_midx == REPACK_WRITE_MIDX_NONE)
die(_(incremental_bitmap_conflict_error));
if (write_bitmaps && po_args.local &&
@@ -244,7 +270,7 @@ int cmd_repack(int argc,
write_bitmaps = 0;
}
- if (write_midx && write_bitmaps) {
+ if (write_midx != REPACK_WRITE_MIDX_NONE && write_bitmaps) {
struct strbuf path = STRBUF_INIT;
strbuf_addf(&path, "%s/%s_XXXXXX",
@@ -297,7 +323,7 @@ int cmd_repack(int argc,
}
if (repo_has_promisor_remote(repo))
strvec_push(&cmd.args, "--exclude-promisor-objects");
- if (!write_midx) {
+ if (write_midx == REPACK_WRITE_MIDX_NONE) {
if (write_bitmaps > 0)
strvec_push(&cmd.args, "--write-bitmap-index");
else if (write_bitmaps < 0)
@@ -519,7 +545,7 @@ int cmd_repack(int argc,
if (delete_redundant && pack_everything & ALL_INTO_ONE)
existing_packs_mark_for_deletion(&existing, &names);
- if (write_midx) {
+ if (write_midx != REPACK_WRITE_MIDX_NONE) {
struct repack_write_midx_opts opts = {
.existing = &existing,
.geometry = &geometry,
@@ -528,11 +554,11 @@ int cmd_repack(int argc,
.packdir = packdir,
.show_progress = show_progress,
.write_bitmaps = write_bitmaps > 0,
- .midx_must_contain_cruft = midx_must_contain_cruft
+ .midx_must_contain_cruft = midx_must_contain_cruft,
+ .mode = write_midx,
};
- ret = write_midx_included_packs(&opts);
-
+ ret = repack_write_midx(&opts);
if (ret)
goto cleanup;
}
diff --git a/repack-midx.c b/repack-midx.c
index 3fe83715da4..b1ca3797080 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -315,7 +315,7 @@ static int repack_fill_midx_stdin_packs(struct child_process *cmd,
return finish_command(cmd);
}
-int write_midx_included_packs(struct repack_write_midx_opts *opts)
+static int write_midx_included_packs(struct repack_write_midx_opts *opts)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct string_list include = STRING_LIST_INIT_DUP;
@@ -378,3 +378,15 @@ int write_midx_included_packs(struct repack_write_midx_opts *opts)
return ret;
}
+
+int repack_write_midx(struct repack_write_midx_opts *opts)
+{
+ switch (opts->mode) {
+ case REPACK_WRITE_MIDX_NONE:
+ BUG("write_midx mode is NONE?");
+ case REPACK_WRITE_MIDX_DEFAULT:
+ return write_midx_included_packs(opts);
+ default:
+ BUG("unhandled write_midx mode: %d", opts->mode);
+ }
+}
diff --git a/repack.h b/repack.h
index 77d24ee45fb..81907fcce7f 100644
--- a/repack.h
+++ b/repack.h
@@ -134,6 +134,11 @@ void pack_geometry_release(struct pack_geometry *geometry);
struct tempfile;
+enum repack_write_midx_mode {
+ REPACK_WRITE_MIDX_NONE,
+ REPACK_WRITE_MIDX_DEFAULT,
+};
+
struct repack_write_midx_opts {
struct existing_packs *existing;
struct pack_geometry *geometry;
@@ -143,10 +148,11 @@ struct repack_write_midx_opts {
int show_progress;
int write_bitmaps;
int midx_must_contain_cruft;
+ enum repack_write_midx_mode mode;
};
void midx_snapshot_refs(struct repository *repo, struct tempfile *f);
-int write_midx_included_packs(struct repack_write_midx_opts *opts);
+int repack_write_midx(struct repack_write_midx_opts *opts);
int write_filtered_pack(const struct write_pack_opts *opts,
struct existing_packs *existing,
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 11/16] repack-geometry: prepare for incremental MIDX repacking
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Teach `pack_geometry_init()` to optionally restrict the set of
repacking candidates to only packs in the tip MIDX layer when a
`midx_layer_threshold` is configured. If the tip layer has fewer packs
than the threshold, those packs are excluded entirely; otherwise only
packs in that layer participate in the geometric repack.
Also track whether any tip-layer packs were included in the rollup
(`midx_tip_rewritten`), which a subsequent commit will use to decide
how to update the MIDX chain after repacking.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
repack-geometry.c | 35 +++++++++++++++++++++++++++++++++++
repack.h | 4 ++++
2 files changed, 39 insertions(+)
diff --git a/repack-geometry.c b/repack-geometry.c
index 7cebd0cb45f..2408b8a3cc2 100644
--- a/repack-geometry.c
+++ b/repack-geometry.c
@@ -4,6 +4,7 @@
#include "repack.h"
#include "repository.h"
#include "hex.h"
+#include "midx.h"
#include "packfile.h"
static uint32_t pack_geometry_weight(struct packed_git *p)
@@ -31,8 +32,28 @@ void pack_geometry_init(struct pack_geometry *geometry,
{
struct packed_git *p;
struct strbuf buf = STRBUF_INIT;
+ struct multi_pack_index *m = get_multi_pack_index(existing->source);
repo_for_each_pack(existing->repo, p) {
+ if (geometry->midx_layer_threshold_set && m &&
+ p->multi_pack_index) {
+ /*
+ * When writing MIDX layers incrementally,
+ * ignore packs unless they are in the most
+ * recent MIDX layer *and* there are at least
+ * 'midx_layer_threshold' packs in that layer.
+ *
+ * Otherwise 'p' is either in an older layer, or
+ * the youngest layer does not have enough packs
+ * to consider its packs as candidates for
+ * repacking. In either of those cases we want
+ * to ignore the pack.
+ */
+ if (m->num_packs < geometry->midx_layer_threshold ||
+ !midx_layer_contains_pack(m, pack_basename(p)))
+ continue;
+ }
+
if (args->local && !p->pack_local)
/*
* When asked to only repack local packfiles we skip
@@ -173,6 +194,20 @@ void pack_geometry_split(struct pack_geometry *geometry)
geometry->promisor_split = compute_pack_geometry_split(geometry->promisor_pack,
geometry->promisor_pack_nr,
geometry->split_factor);
+ for (uint32_t i = 0; i < geometry->split; i++) {
+ struct packed_git *p = geometry->pack[i];
+ /*
+ * During incremental MIDX/bitmap repacking, any packs
+ * included in the rollup are either (a) not MIDX'd, or
+ * (b) contained in the tip layer iff it has at least
+ * the threshold number of packs.
+ *
+ * In the latter case, we can safely conclude that the
+ * tip of the MIDX chain will be rewritten.
+ */
+ if (p->multi_pack_index)
+ geometry->midx_tip_rewritten = true;
+ }
}
struct packed_git *pack_geometry_preferred_pack(struct pack_geometry *geometry)
diff --git a/repack.h b/repack.h
index c0e9f0ca647..77d24ee45fb 100644
--- a/repack.h
+++ b/repack.h
@@ -108,6 +108,10 @@ struct pack_geometry {
uint32_t promisor_pack_nr, promisor_pack_alloc;
uint32_t promisor_split;
+ uint32_t midx_layer_threshold;
+ bool midx_layer_threshold_set;
+ bool midx_tip_rewritten;
+
int split_factor;
};
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 10/16] repack-midx: extract `repack_fill_midx_stdin_packs()`
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
The function `write_midx_included_packs()` manages the lifecycle of
writing packs to stdin when running `git multi-pack-index write` as a
child process.
Extract a standalone `repack_fill_midx_stdin_packs()` helper, which
handles `--stdin-packs` argument setup, starting the command, writing
pack names to its standard input, and finishing the command.
This simplifies `write_midx_included_packs()` and prepares for a
subsequent commit where the same helper is called with `cmd->out = -1`
to capture the MIDX's checksum from the command's standard output,
which is needed when writing MIDX layers with `--no-write-chain-file`.
No functional changes are included in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
repack-midx.c | 38 ++++++++++++++++++++++++--------------
1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/repack-midx.c b/repack-midx.c
index 5634dc186d0..3fe83715da4 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -292,23 +292,42 @@ static void repack_prepare_midx_command(struct child_process *cmd,
strvec_push(&cmd->args, "--bitmap");
}
+static int repack_fill_midx_stdin_packs(struct child_process *cmd,
+ struct string_list *include)
+{
+ struct string_list_item *item;
+ FILE *in;
+ int ret;
+
+ cmd->in = -1;
+
+ strvec_push(&cmd->args, "--stdin-packs");
+
+ ret = start_command(cmd);
+ if (ret)
+ return ret;
+
+ in = xfdopen(cmd->in, "w");
+ for_each_string_list_item(item, include)
+ fprintf(in, "%s\n", item->string);
+ fclose(in);
+
+ return finish_command(cmd);
+}
+
int write_midx_included_packs(struct repack_write_midx_opts *opts)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct string_list include = STRING_LIST_INIT_DUP;
struct string_list_item *item;
struct packed_git *preferred = pack_geometry_preferred_pack(opts->geometry);
- FILE *in;
int ret = 0;
midx_included_packs(&include, opts);
if (!include.nr)
goto done;
- cmd.in = -1;
-
repack_prepare_midx_command(&cmd, opts, "write");
- strvec_push(&cmd.args, "--stdin-packs");
if (preferred)
strvec_pushf(&cmd.args, "--preferred-pack=%s",
@@ -350,16 +369,7 @@ int write_midx_included_packs(struct repack_write_midx_opts *opts)
strvec_pushf(&cmd.args, "--refs-snapshot=%s",
opts->refs_snapshot);
- ret = start_command(&cmd);
- if (ret)
- goto done;
-
- in = xfdopen(cmd.in, "w");
- for_each_string_list_item(item, &include)
- fprintf(in, "%s\n", item->string);
- fclose(in);
-
- ret = finish_command(&cmd);
+ ret = repack_fill_midx_stdin_packs(&cmd, &include);
done:
if (!ret && opts->write_bitmaps)
remove_redundant_bitmaps(&include, opts->packdir);
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 09/16] repack-midx: factor out `repack_prepare_midx_command()`
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
The `write_midx_included_packs()` function assembles and executes a
`git multi-pack-index write` command, constructing the argument list
inline.
Future commits will introduce additional callers that need to construct
similar `git multi-pack-index` commands (for both `write` and `compact`
subcommands), so extract the common portions of the command setup into a
reusable `repack_prepare_midx_command()` helper.
The extracted helper sets `git_cmd`, pushes `multi-pack-index` and a
subcommand, and handles `--progress`/`--no-progress` and `--bitmap`
flags. The remaining arguments that are specific to the `write`
subcommand (such as `--stdin-packs`) are left to the caller.
No functional changes are included in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
repack-midx.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/repack-midx.c b/repack-midx.c
index 0682b80c427..5634dc186d0 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -275,6 +275,23 @@ static void remove_redundant_bitmaps(struct string_list *include,
strbuf_release(&path);
}
+static void repack_prepare_midx_command(struct child_process *cmd,
+ struct repack_write_midx_opts *opts,
+ const char *subcommand)
+{
+ cmd->git_cmd = 1;
+
+ strvec_pushl(&cmd->args, "multi-pack-index", subcommand, NULL);
+
+ if (opts->show_progress)
+ strvec_push(&cmd->args, "--progress");
+ else
+ strvec_push(&cmd->args, "--no-progress");
+
+ if (opts->write_bitmaps)
+ strvec_push(&cmd->args, "--bitmap");
+}
+
int write_midx_included_packs(struct repack_write_midx_opts *opts)
{
struct child_process cmd = CHILD_PROCESS_INIT;
@@ -289,18 +306,9 @@ int write_midx_included_packs(struct repack_write_midx_opts *opts)
goto done;
cmd.in = -1;
- cmd.git_cmd = 1;
- strvec_push(&cmd.args, "multi-pack-index");
- strvec_pushl(&cmd.args, "write", "--stdin-packs", NULL);
-
- if (opts->show_progress)
- strvec_push(&cmd.args, "--progress");
- else
- strvec_push(&cmd.args, "--no-progress");
-
- if (opts->write_bitmaps)
- strvec_push(&cmd.args, "--bitmap");
+ repack_prepare_midx_command(&cmd, opts, "write");
+ strvec_push(&cmd.args, "--stdin-packs");
if (preferred)
strvec_pushf(&cmd.args, "--preferred-pack=%s",
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 08/16] midx: expose `midx_layer_contains_pack()`
From: Taylor Blau @ 2026-05-19 15:58 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Rename the function `midx_contains_pack_1()` to instead be called
`midx_layer_contains_pack()` and make it accessible. Unlike
`midx_contains_pack()` (which recurses through the entire chain), this
function checks only a single MIDX layer.
This will be used by a subsequent commit to determine whether a given
pack belongs to the tip MIDX layer specifically, rather than to any
layer in the chain.
No functional changes are present in this commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx.c | 6 +++---
midx.h | 2 ++
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/midx.c b/midx.c
index bcb8c999015..dc86c8e7fee 100644
--- a/midx.c
+++ b/midx.c
@@ -667,8 +667,8 @@ static int midx_pack_names_cmp(const void *a, const void *b, void *m_)
m->pack_names[*(const size_t *)b]);
}
-static int midx_contains_pack_1(struct multi_pack_index *m,
- const char *idx_or_pack_name)
+int midx_layer_contains_pack(struct multi_pack_index *m,
+ const char *idx_or_pack_name)
{
uint32_t first = 0, last = m->num_packs;
@@ -709,7 +709,7 @@ static int midx_contains_pack_1(struct multi_pack_index *m,
int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
{
for (; m; m = m->base_midx)
- if (midx_contains_pack_1(m, idx_or_pack_name))
+ if (midx_layer_contains_pack(m, idx_or_pack_name))
return 1;
return 0;
}
diff --git a/midx.h b/midx.h
index 77dd66de02b..3ee12dd08ec 100644
--- a/midx.h
+++ b/midx.h
@@ -119,6 +119,8 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
int fill_midx_entry(struct multi_pack_index *m, const struct object_id *oid, struct pack_entry *e);
int midx_contains_pack(struct multi_pack_index *m,
const char *idx_or_pack_name);
+int midx_layer_contains_pack(struct multi_pack_index *m,
+ const char *idx_or_pack_name);
int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
int prepare_multi_pack_index_one(struct odb_source *source);
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 07/16] repack: track the ODB source via existing_packs
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Store the ODB source in the `existing_packs` struct and use that in
place of the raw `repo->objects->sources` access within `cmd_repack()`.
The source used is still assigned from the first source in the list, so
there are no functional changes in this commit. The changes instead
serve two purposes (one immediate, one not):
- The incremental MIDX-based repacking machinery will need to know what
source is being used to read the existing MIDX/chain (should one
exist).
- In the future, if "git repack" is taught how to operate on other
object sources, this field will serve as the authoritative value for
that source.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/repack.c | 5 ++---
repack.c | 2 ++
repack.h | 1 +
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index 4c5a82c2c8d..24be147d39a 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -417,7 +417,7 @@ int cmd_repack(int argc,
* midx_has_unknown_packs() will make the decision for
* us.
*/
- if (!get_multi_pack_index(repo->objects->sources))
+ if (!get_multi_pack_index(existing.source))
midx_must_contain_cruft = 1;
}
@@ -564,8 +564,7 @@ int cmd_repack(int argc,
unsigned flags = 0;
if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL, 0))
flags |= MIDX_WRITE_INCREMENTAL;
- write_midx_file(repo->objects->sources,
- NULL, NULL, flags);
+ write_midx_file(existing.source, NULL, NULL, flags);
}
cleanup:
diff --git a/repack.c b/repack.c
index 596841027af..2ee6b51420a 100644
--- a/repack.c
+++ b/repack.c
@@ -154,6 +154,8 @@ void existing_packs_collect(struct existing_packs *existing,
string_list_append(&existing->non_kept_packs, buf.buf);
}
+ existing->source = existing->repo->objects->sources;
+
string_list_sort(&existing->kept_packs);
string_list_sort(&existing->non_kept_packs);
string_list_sort(&existing->cruft_packs);
diff --git a/repack.h b/repack.h
index bc9f2e1a5de..c0e9f0ca647 100644
--- a/repack.h
+++ b/repack.h
@@ -56,6 +56,7 @@ struct packed_git;
struct existing_packs {
struct repository *repo;
+ struct odb_source *source;
struct string_list kept_packs;
struct string_list non_kept_packs;
struct string_list cruft_packs;
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 06/16] midx: support custom `--base` for incremental MIDX writes
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Both `compact` and `write --incremental` fix the base of the resulting
MIDX layer: `compact` always places the compacted result on top of
"from's" immediate parent in the chain, and `write --incremental` always
appends a new layer to the existing tip. In both cases the base is not
configurable.
Future callers need additional flexibility. For instance, the incremental
MIDX-based repacking code may wish to write a layer based on some
intermediate ancestor rather than the current tip, or produce a root
layer when replacing the bottommost entries in the chain.
Introduce a new `--base` option for both subcommands to specify the
checksum of the MIDX layer to use as the base. The given checksum must
refer to a valid layer in the MIDX chain that is an ancestor of the
topmost layer being written or compacted.
The special value "none" is accepted to produce a root layer with no
parent. This will be needed when the incremental repacking machinery
determines that the bottommost layers of the chain should be replaced.
If no `--base` is given, behavior is unchanged: `compact` uses "from's"
immediate parent in the chain, and `write` appends to the existing tip.
For the `write` subcommand, `--base` requires `--no-write-chain-file`. A plain
`write --incremental` appends a new layer to the live chain tip with no
mechanism to atomically replace it; overriding the base would produce a
layer that does not extend the tip, breaking chain invariants. With
`--no-write-chain-file` the chain is left unmodified and the caller is
responsible for assembling a valid chain.
For `compact`, no such restriction applies. The compaction operation
atomically replaces the compacted range in the chain file, so writing
the result on top of any valid ancestor preserves chain invariants.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 17 +++++-
builtin/multi-pack-index.c | 24 ++++++--
midx-write.c | 34 ++++++++++-
midx.h | 5 +-
t/t5334-incremental-multi-pack-index.sh | 30 ++++++++++
t/t5335-compact-multi-pack-index.sh | 77 +++++++++++++++++++++++++
6 files changed, 178 insertions(+), 9 deletions(-)
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index c26196815e2..c6d23aeeb9a 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -12,8 +12,10 @@ SYNOPSIS
'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
[--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
[--refs-snapshot=<path>] [--[no-]write-chain-file]
+ [--base=<checksum>]
'git multi-pack-index' [<options>] compact [--[no-]incremental]
- [--[no-]bitmap] [--[no-]write-chain-file] <from> <to>
+ [--[no-]bitmap] [--base=<checksum>] [--[no-]write-chain-file]
+ <from> <to>
'git multi-pack-index' [<options>] verify
'git multi-pack-index' [<options>] expire
'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -90,6 +92,13 @@ marker).
The checksum of the new layer is printed to standard
output, allowing the caller to assemble and write the
chain itself. Requires `--incremental`.
+
+ --base=<checksum>::
+ Specify the checksum of an existing MIDX layer to use
+ as the base when writing a new incremental layer.
+ The special value `none` indicates that the new layer
+ should have no base (i.e., it becomes a root layer).
+ Requires `--no-write-chain-file`.
--
compact::
@@ -110,6 +119,12 @@ compact::
MIDX layer but do not update the multi-pack-index-chain
file. The checksum of the new layer is printed to
standard output. Requires `--incremental`.
+
+ --base=<checksum>::
+ Specify the checksum of an existing MIDX layer to use
+ as the base for the compacted result, instead of using
+ the immediate parent of `<from>`. The special value
+ `none` indicates that the result should have no base.
--
+
Note that the compact command requires writing a version-2 midx that
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index f861b4b8394..00ffb36394d 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -16,11 +16,13 @@
#define BUILTIN_MIDX_WRITE_USAGE \
N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]\n" \
" [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
- " [--refs-snapshot=<path>] [--[no-]write-chain-file]")
+ " [--refs-snapshot=<path>] [--[no-]write-chain-file]\n" \
+ " [--base=<checksum>]")
#define BUILTIN_MIDX_COMPACT_USAGE \
N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
- " [--[no-]bitmap] [--[no-]write-chain-file] <from> <to>")
+ " [--[no-]bitmap] [--base=<checksum>] [--[no-]write-chain-file]\n" \
+ " <from> <to>")
#define BUILTIN_MIDX_VERIFY_USAGE \
N_("git multi-pack-index [<options>] verify")
@@ -63,6 +65,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
static struct opts_multi_pack_index {
char *object_dir;
const char *preferred_pack;
+ const char *incremental_base;
char *refs_snapshot;
unsigned long batch_size;
unsigned flags;
@@ -151,6 +154,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
N_("pack for reuse when computing a multi-pack bitmap")),
OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
+ OPT_STRING(0, "base", &opts.incremental_base, N_("checksum"),
+ N_("base MIDX for incremental writes")),
OPT_BIT(0, "incremental", &opts.flags,
N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
OPT_NEGBIT(0, "write-chain-file", &opts.flags,
@@ -190,6 +195,13 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
options);
}
+ if (opts.incremental_base &&
+ !(opts.flags & MIDX_WRITE_NO_CHAIN)) {
+ error(_("cannot use --base without --no-write-chain-file"));
+ usage_with_options(builtin_multi_pack_index_write_usage,
+ options);
+ }
+
source = handle_object_dir_option(repo);
FREE_AND_NULL(options);
@@ -201,7 +213,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
ret = write_midx_file_only(source, &packs,
opts.preferred_pack,
- opts.refs_snapshot, opts.flags);
+ opts.refs_snapshot,
+ opts.incremental_base, opts.flags);
string_list_clear(&packs, 0);
free(opts.refs_snapshot);
@@ -229,6 +242,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
struct option *options;
static struct option builtin_multi_pack_index_compact_options[] = {
+ OPT_STRING(0, "base", &opts.incremental_base, N_("checksum"),
+ N_("base MIDX for incremental writes")),
OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
OPT_BIT(0, "incremental", &opts.flags,
@@ -290,7 +305,8 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]);
}
- ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
+ ret = write_midx_file_compact(source, from_midx, to_midx,
+ opts.incremental_base, opts.flags);
return ret;
}
diff --git a/midx-write.c b/midx-write.c
index 38c898e5ff5..561e9eedc0e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1247,6 +1247,7 @@ struct write_midx_opts {
const char *preferred_pack_name;
const char *refs_snapshot;
+ const char *incremental_base;
unsigned flags;
};
@@ -1330,11 +1331,32 @@ static int write_midx_internal(struct write_midx_opts *opts)
/*
* If compacting MIDX layer(s) in the range [from, to], then the
- * compacted MIDX will share the same base MIDX as 'from'.
+ * compacted MIDX will share the same base MIDX as 'from',
+ * unless a custom --base is specified (see below).
*/
if (ctx.compact)
ctx.base_midx = ctx.compact_from->base_midx;
+ if (opts->incremental_base) {
+ if (!strcmp(opts->incremental_base, "none")) {
+ ctx.base_midx = NULL;
+ } else {
+ while (ctx.base_midx) {
+ const char *cmp = midx_get_checksum_hex(ctx.base_midx);
+ if (!strcmp(opts->incremental_base, cmp))
+ break;
+
+ ctx.base_midx = ctx.base_midx->base_midx;
+ }
+
+ if (!ctx.base_midx) {
+ error(_("could not find base MIDX '%s'"),
+ opts->incremental_base);
+ goto cleanup;
+ }
+ }
+ }
+
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs + ctx.m->num_packs_in_base : 16;
ctx.info = NULL;
@@ -1827,7 +1849,8 @@ static int write_midx_internal(struct write_midx_opts *opts)
int write_midx_file(struct odb_source *source,
const char *preferred_pack_name,
- const char *refs_snapshot, unsigned flags)
+ const char *refs_snapshot,
+ unsigned flags)
{
struct write_midx_opts opts = {
.source = source,
@@ -1842,13 +1865,16 @@ int write_midx_file(struct odb_source *source,
int write_midx_file_only(struct odb_source *source,
struct string_list *packs_to_include,
const char *preferred_pack_name,
- const char *refs_snapshot, unsigned flags)
+ const char *refs_snapshot,
+ const char *incremental_base,
+ unsigned flags)
{
struct write_midx_opts opts = {
.source = source,
.packs_to_include = packs_to_include,
.preferred_pack_name = preferred_pack_name,
.refs_snapshot = refs_snapshot,
+ .incremental_base = incremental_base,
.flags = flags,
};
@@ -1858,12 +1884,14 @@ int write_midx_file_only(struct odb_source *source,
int write_midx_file_compact(struct odb_source *source,
struct multi_pack_index *from,
struct multi_pack_index *to,
+ const char *incremental_base,
unsigned flags)
{
struct write_midx_opts opts = {
.source = source,
.compact_from = from,
.compact_to = to,
+ .incremental_base = incremental_base,
.flags = flags | MIDX_WRITE_COMPACT,
};
diff --git a/midx.h b/midx.h
index 5b193882dcf..77dd66de02b 100644
--- a/midx.h
+++ b/midx.h
@@ -132,10 +132,13 @@ int write_midx_file(struct odb_source *source,
int write_midx_file_only(struct odb_source *source,
struct string_list *packs_to_include,
const char *preferred_pack_name,
- const char *refs_snapshot, unsigned flags);
+ const char *refs_snapshot,
+ const char *incremental_base,
+ unsigned flags);
int write_midx_file_compact(struct odb_source *source,
struct multi_pack_index *from,
struct multi_pack_index *to,
+ const char *incremental_base,
unsigned flags);
void clear_midx_file(struct repository *r);
int verify_midx_file(struct odb_source *source, unsigned flags);
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index 66d6894761b..68a103d13d2 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -113,6 +113,36 @@ test_expect_success 'write non-incremental MIDX layer with --no-write-chain-file
test_grep "cannot use --no-write-chain-file without --incremental" err
'
+test_expect_success 'write MIDX layer with --base without --no-write-chain-file' '
+ test_must_fail git multi-pack-index write --bitmap --incremental \
+ --base=none 2>err &&
+ test_grep "cannot use --base without --no-write-chain-file" err
+'
+
+test_expect_success 'write MIDX layer with --base=none and --no-write-chain-file' '
+ test_commit base-none &&
+ git repack -d &&
+
+ cp "$midx_chain" "$midx_chain.bak" &&
+ layer="$(git multi-pack-index write --bitmap --incremental \
+ --no-write-chain-file --base=none)" &&
+
+ test_cmp "$midx_chain.bak" "$midx_chain" &&
+ test_path_is_file "$midxdir/multi-pack-index-$layer.midx"
+'
+
+test_expect_success 'write MIDX layer with --base=<hash> and --no-write-chain-file' '
+ test_commit base-hash &&
+ git repack -d &&
+
+ cp "$midx_chain" "$midx_chain.bak" &&
+ layer="$(git multi-pack-index write --bitmap --incremental \
+ --no-write-chain-file --base="$(nth_line 1 "$midx_chain")")" &&
+
+ test_cmp "$midx_chain.bak" "$midx_chain" &&
+ test_path_is_file "$midxdir/multi-pack-index-$layer.midx"
+'
+
for reuse in false single multi
do
test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
index 1a65d48b62b..ec1dafe89fc 100755
--- a/t/t5335-compact-multi-pack-index.sh
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -304,6 +304,7 @@ test_expect_success 'MIDX compaction with --no-write-chain-file' '
layer="$(git multi-pack-index compact --incremental \
--no-write-chain-file \
+ --base="$(nth_line 1 "$midx_chain")" \
"$(nth_line 2 "$midx_chain")" \
"$(nth_line 3 "$midx_chain")")" &&
@@ -326,4 +327,80 @@ test_expect_success 'MIDX compaction with --no-write-chain-file' '
)
'
+test_expect_success 'MIDX compaction with --base' '
+ git init midx-compact-with--base &&
+ (
+ cd midx-compact-with--base &&
+
+ git config maintenance.auto false &&
+
+ write_packs A B C D &&
+
+ test_line_count = 4 "$midx_chain" &&
+
+ cp "$midx_chain" "$midx_chain.bak" &&
+
+ git multi-pack-index compact --incremental \
+ --base="$(nth_line 1 "$midx_chain")" \
+ "$(nth_line 3 "$midx_chain")" \
+ "$(nth_line 4 "$midx_chain")" &&
+ test_line_count = 2 $midx_chain &&
+
+ nth_line 1 "$midx_chain.bak" >expect &&
+ nth_line 1 "$midx_chain" >actual &&
+
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'MIDX compaction with --base=none' '
+ git init midx-compact-base-none &&
+ (
+ cd midx-compact-base-none &&
+
+ git config maintenance.auto false &&
+
+ write_packs A B C D &&
+
+ test_line_count = 4 $midx_chain &&
+
+ cp "$midx_chain" "$midx_chain".bak &&
+
+ # Compact the two bottommost layers (A and B) into a new
+ # root layer with no parent.
+ git multi-pack-index compact --incremental \
+ --base=none \
+ "$(nth_line 1 "$midx_chain")" \
+ "$(nth_line 2 "$midx_chain")" &&
+
+ test_line_count = 3 $midx_chain &&
+
+ # The upper layers (C and D) should be preserved
+ # unchanged.
+ nth_line 3 "$midx_chain.bak" >expect &&
+ nth_line 4 "$midx_chain.bak" >>expect &&
+ nth_line 2 "$midx_chain" >actual &&
+ nth_line 3 "$midx_chain" >>actual &&
+
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'MIDX compaction with bogus --base checksum' '
+ git init midx-compact-bogus-base &&
+ (
+ cd midx-compact-bogus-base &&
+
+ git config maintenance.auto false &&
+
+ write_packs A B C &&
+
+ test_must_fail git multi-pack-index compact --incremental \
+ --base=deadbeef \
+ "$(nth_line 2 "$midx_chain")" \
+ "$(nth_line 3 "$midx_chain")" 2>err &&
+ test_grep "could not find base MIDX" err
+ )
+'
+
test_done
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 05/16] midx: introduce `--no-write-chain-file` for incremental MIDX writes
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
When writing an incremental MIDX layer, the MIDX machinery writes the
new layer into the multi-pack-index.d directory and then updates the
multi-pack-index-chain file to include the freshly written layer.
Future callers however may not wish to immediately update the MIDX chain
itself, preferring instead to write out new layer(s) themselves before
atomically updating the chain. Concretely, the new incremental
MIDX-based repacking strategy will want to do exactly this (that is,
assemble the new MIDX chain itself before writing a new chain file and
atomically linking it into place).
Introduce a `--no-write-chain-file` flag that:
* writes the new MIDX layer into the multi-pack-index.d directory
* prints its checksum
* does not update the multi-pack-index-chain file.
The MIDX chain file (and thus, the lock protecting it) remain untouched,
allowing callers to assemble the chain themselves. This flag requires
`--incremental`, since the notion of a separate layer only makes sense
for incremental MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Documentation/git-multi-pack-index.adoc | 17 ++++++++--
builtin/multi-pack-index.c | 28 +++++++++++++++--
midx-write.c | 42 ++++++++++++++++---------
midx.h | 1 +
t/t5334-incremental-multi-pack-index.sh | 17 ++++++++++
t/t5335-compact-multi-pack-index.sh | 36 +++++++++++++++++++++
6 files changed, 123 insertions(+), 18 deletions(-)
diff --git a/Documentation/git-multi-pack-index.adoc b/Documentation/git-multi-pack-index.adoc
index 3a5aa227784..c26196815e2 100644
--- a/Documentation/git-multi-pack-index.adoc
+++ b/Documentation/git-multi-pack-index.adoc
@@ -11,9 +11,9 @@ SYNOPSIS
[verse]
'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
[--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
- [--refs-snapshot=<path>]
+ [--refs-snapshot=<path>] [--[no-]write-chain-file]
'git multi-pack-index' [<options>] compact [--[no-]incremental]
- [--[no-]bitmap] <from> <to>
+ [--[no-]bitmap] [--[no-]write-chain-file] <from> <to>
'git multi-pack-index' [<options>] verify
'git multi-pack-index' [<options>] expire
'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -83,6 +83,13 @@ marker).
and packs not present in an existing MIDX layer.
Migrates non-incremental MIDXs to incremental ones when
necessary.
+
+ --[no-]write-chain-file::
+ When used with `--incremental`, write a new MIDX layer
+ but do not update the multi-pack-index-chain file.
+ The checksum of the new layer is printed to standard
+ output, allowing the caller to assemble and write the
+ chain itself. Requires `--incremental`.
--
compact::
@@ -97,6 +104,12 @@ compact::
--[no-]bitmap::
Control whether or not a multi-pack bitmap is written.
+
+ --[no-]write-chain-file::
+ When used with `--incremental`, write a new compacted
+ MIDX layer but do not update the multi-pack-index-chain
+ file. The checksum of the new layer is printed to
+ standard output. Requires `--incremental`.
--
+
Note that the compact command requires writing a version-2 midx that
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 0f72d96c02d..f861b4b8394 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -16,11 +16,11 @@
#define BUILTIN_MIDX_WRITE_USAGE \
N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]\n" \
" [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
- " [--refs-snapshot=<path>]")
+ " [--refs-snapshot=<path>] [--[no-]write-chain-file]")
#define BUILTIN_MIDX_COMPACT_USAGE \
N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
- " [--[no-]bitmap] <from> <to>")
+ " [--[no-]bitmap] [--[no-]write-chain-file] <from> <to>")
#define BUILTIN_MIDX_VERIFY_USAGE \
N_("git multi-pack-index [<options>] verify")
@@ -153,6 +153,9 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
OPT_BIT(0, "incremental", &opts.flags,
N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+ OPT_NEGBIT(0, "write-chain-file", &opts.flags,
+ N_("write the multi-pack-index chain file"),
+ MIDX_WRITE_NO_CHAIN),
OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
N_("write multi-pack index containing only given indexes")),
OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
@@ -178,6 +181,15 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
if (argc)
usage_with_options(builtin_multi_pack_index_write_usage,
options);
+
+ if (opts.flags & MIDX_WRITE_NO_CHAIN &&
+ !(opts.flags & MIDX_WRITE_INCREMENTAL)) {
+ error(_("cannot use %s without %s"),
+ "--no-write-chain-file", "--incremental");
+ usage_with_options(builtin_multi_pack_index_write_usage,
+ options);
+ }
+
source = handle_object_dir_option(repo);
FREE_AND_NULL(options);
@@ -221,6 +233,9 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
OPT_BIT(0, "incremental", &opts.flags,
N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+ OPT_NEGBIT(0, "write-chain-file", &opts.flags,
+ N_("write the multi-pack-index chain file"),
+ MIDX_WRITE_NO_CHAIN),
OPT_END(),
};
@@ -239,6 +254,15 @@ static int cmd_multi_pack_index_compact(int argc, const char **argv,
if (argc != 2)
usage_with_options(builtin_multi_pack_index_compact_usage,
options);
+
+ if (opts.flags & MIDX_WRITE_NO_CHAIN &&
+ !(opts.flags & MIDX_WRITE_INCREMENTAL)) {
+ error(_("cannot use %s without %s"),
+ "--no-write-chain-file", "--incremental");
+ usage_with_options(builtin_multi_pack_index_compact_usage,
+ options);
+ }
+
source = handle_object_dir_option(the_repository);
FREE_AND_NULL(options);
diff --git a/midx-write.c b/midx-write.c
index 5d9409a9741..38c898e5ff5 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1257,7 +1257,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
unsigned char midx_hash[GIT_MAX_RAWSZ];
uint32_t start_pack;
struct hashfile *f = NULL;
- struct lock_file lk;
+ struct lock_file lk = LOCK_INIT;
struct tempfile *incr;
struct write_midx_context ctx = {
.preferred_pack_idx = NO_PREFERRED_PACK,
@@ -1601,11 +1601,14 @@ static int write_midx_internal(struct write_midx_opts *opts)
}
if (ctx.incremental) {
- struct strbuf lock_name = STRBUF_INIT;
+ if (!(opts->flags & MIDX_WRITE_NO_CHAIN)) {
+ struct strbuf lock_name = STRBUF_INIT;
- get_midx_chain_filename(opts->source, &lock_name);
- hold_lock_file_for_update(&lk, lock_name.buf, LOCK_DIE_ON_ERROR);
- strbuf_release(&lock_name);
+ get_midx_chain_filename(opts->source, &lock_name);
+ hold_lock_file_for_update(&lk, lock_name.buf,
+ LOCK_DIE_ON_ERROR);
+ strbuf_release(&lock_name);
+ }
incr = mks_tempfile_m(midx_name.buf, 0444);
if (!incr) {
@@ -1707,16 +1710,23 @@ static int write_midx_internal(struct write_midx_opts *opts)
if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
die(_("too many multi-pack-indexes"));
+ if (!is_lock_file_locked(&lk))
+ printf("%s\n", hash_to_hex_algop(midx_hash, r->hash_algo));
+ else if (opts->flags & MIDX_WRITE_NO_CHAIN)
+ BUG("lockfile held with MIDX_WRITE_NO_CHAIN set?");
+
if (ctx.incremental) {
- FILE *chainf = fdopen_lock_file(&lk, "w");
struct strbuf final_midx_name = STRBUF_INIT;
struct multi_pack_index *m = ctx.base_midx;
struct multi_pack_index **layers = NULL;
size_t layers_nr = 0, layers_alloc = 0;
- if (!chainf) {
- error_errno(_("unable to open multi-pack-index chain file"));
- goto cleanup;
+ if (is_lock_file_locked(&lk)){
+ FILE *chainf = fdopen_lock_file(&lk, "w");
+ if (!chainf) {
+ error_errno(_("unable to open multi-pack-index chain file"));
+ goto cleanup;
+ }
}
if (link_midx_to_chain(ctx.base_midx) < 0)
@@ -1773,8 +1783,10 @@ static int write_midx_internal(struct write_midx_opts *opts)
free(layers);
- for (size_t i = 0; i < keep_hashes.nr; i++)
- fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes.v[i]);
+ if (is_lock_file_locked(&lk))
+ for (size_t i = 0; i < keep_hashes.nr; i++)
+ fprintf(get_lock_file_fp(&lk), "%s\n",
+ keep_hashes.v[i]);
} else {
strvec_push(&keep_hashes,
hash_to_hex_algop(midx_hash, r->hash_algo));
@@ -1783,10 +1795,12 @@ static int write_midx_internal(struct write_midx_opts *opts)
if (ctx.m || ctx.base_midx)
odb_close(ctx.repo->objects);
- if (commit_lock_file(&lk) < 0)
- die_errno(_("could not write multi-pack-index"));
+ if (is_lock_file_locked(&lk)) {
+ if (commit_lock_file(&lk) < 0)
+ die_errno(_("could not write multi-pack-index"));
- clear_midx_files(opts->source, &keep_hashes, ctx.incremental);
+ clear_midx_files(opts->source, &keep_hashes, ctx.incremental);
+ }
result = 0;
cleanup:
diff --git a/midx.h b/midx.h
index 08f3728e520..5b193882dcf 100644
--- a/midx.h
+++ b/midx.h
@@ -83,6 +83,7 @@ struct multi_pack_index {
#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
#define MIDX_WRITE_INCREMENTAL (1 << 5)
#define MIDX_WRITE_COMPACT (1 << 6)
+#define MIDX_WRITE_NO_CHAIN (1 << 7)
#define MIDX_EXT_REV "rev"
#define MIDX_EXT_BITMAP "bitmap"
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index c9f5b4e87aa..66d6894761b 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -96,6 +96,23 @@ test_expect_success 'show object from second pack' '
git cat-file -p 2.2
'
+test_expect_success 'write MIDX layer with --no-write-chain-file' '
+ test_commit no-write-chain-file &&
+ git repack -d &&
+
+ cp "$midx_chain" "$midx_chain.bak" &&
+ layer="$(git multi-pack-index write --bitmap --incremental \
+ --no-write-chain-file)" &&
+
+ test_cmp "$midx_chain.bak" "$midx_chain" &&
+ test_path_is_file "$midxdir/multi-pack-index-$layer.midx"
+'
+
+test_expect_success 'write non-incremental MIDX layer with --no-write-chain-file' '
+ test_must_fail git multi-pack-index write --bitmap --no-write-chain-file 2>err &&
+ test_grep "cannot use --no-write-chain-file without --incremental" err
+'
+
for reuse in false single multi
do
test_expect_success "full clone (pack.allowPackReuse=$reuse)" '
diff --git a/t/t5335-compact-multi-pack-index.sh b/t/t5335-compact-multi-pack-index.sh
index 40f3844282f..1a65d48b62b 100755
--- a/t/t5335-compact-multi-pack-index.sh
+++ b/t/t5335-compact-multi-pack-index.sh
@@ -290,4 +290,40 @@ test_expect_success 'MIDX compaction with bitmaps (non-trivial)' '
)
'
+test_expect_success 'MIDX compaction with --no-write-chain-file' '
+ git init midx-compact-with--no-write-chain-file &&
+ (
+ cd midx-compact-with--no-write-chain-file &&
+
+ git config maintenance.auto false &&
+
+ write_packs A B C D &&
+
+ test_line_count = 4 $midx_chain &&
+ cp "$midx_chain" "$midx_chain".bak &&
+
+ layer="$(git multi-pack-index compact --incremental \
+ --no-write-chain-file \
+ "$(nth_line 2 "$midx_chain")" \
+ "$(nth_line 3 "$midx_chain")")" &&
+
+ test_cmp "$midx_chain.bak" "$midx_chain" &&
+
+ # After writing the new layer, insert it into the chain
+ # manually. This is done in order to make $layer visible
+ # to the read-midx test helper below, and matches what
+ # the MIDX command would do without --no-write-chain-file.
+ {
+ nth_line 1 "$midx_chain.bak" &&
+ echo $layer &&
+ nth_line 4 "$midx_chain.bak"
+ } >$midx_chain &&
+
+ test-tool read-midx $objdir $layer >midx.data &&
+ grep "^pack-B-.*\.idx" midx.data &&
+ grep "^pack-C-.*\.idx" midx.data
+
+ )
+'
+
test_done
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 04/16] midx: use `strvec` for `keep_hashes`
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
The `keep_hashes` array in `write_midx_internal()` accumulates the
checksums of MIDX files that should be retained when pruning stale
entries from the MIDX chain. For similar reasons as in a previous
commit, rewrite this using a strvec, requiring us to pass one fewer
parameter.
Unlike the aforementioned previous commit, use a `strvec` instead of a
`string_list`, which provides a more ergonomic interface to adjust the
values at a particular index. The ordering is important here, as this
value is used to determine the contents of the resulting
`multi-pack-index-chain` file when writing with "--incremental".
Since the previous commit already builds the array in forward order, the
conversion is straightforward: replace indexed assignments with
`strvec_push()`, drop the pre-counting and `CALLOC_ARRAY()`, and
simplify cleanup via `strvec_clear()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 84 ++++++++++++++++++----------------------------------
midx.c | 20 ++++++-------
2 files changed, 38 insertions(+), 66 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 55c778a97cb..5d9409a9741 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -29,8 +29,7 @@ extern void clear_midx_files_ext(struct odb_source *source, const char *ext,
const char *keep_hash);
extern void clear_incremental_midx_files_ext(struct odb_source *source,
const char *ext,
- const char **keep_hashes,
- uint32_t hashes_nr);
+ const struct strvec *keep_hashes);
extern int cmp_idx_or_pack_name(const char *idx_or_pack_name,
const char *idx_name);
@@ -1109,8 +1108,7 @@ static int link_midx_to_chain(struct multi_pack_index *m)
}
static void clear_midx_files(struct odb_source *source,
- const char **hashes, uint32_t hashes_nr,
- unsigned incremental)
+ const struct strvec *hashes, unsigned incremental)
{
/*
* if incremental:
@@ -1124,13 +1122,15 @@ static void clear_midx_files(struct odb_source *source,
*/
struct strbuf buf = STRBUF_INIT;
const char *exts[] = { MIDX_EXT_BITMAP, MIDX_EXT_REV, MIDX_EXT_MIDX };
- uint32_t i, j;
+ uint32_t i;
for (i = 0; i < ARRAY_SIZE(exts); i++) {
- clear_incremental_midx_files_ext(source, exts[i],
- hashes, hashes_nr);
- for (j = 0; j < hashes_nr; j++)
- clear_midx_files_ext(source, exts[i], hashes[j]);
+ clear_incremental_midx_files_ext(source, exts[i], hashes);
+ if (hashes) {
+ for (size_t j = 0; j < hashes->nr; j++)
+ clear_midx_files_ext(source, exts[i],
+ hashes->v[j]);
+ }
}
if (incremental)
@@ -1267,8 +1267,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
int pack_name_concat_len = 0;
int dropped_packs = 0;
int result = -1;
- const char **keep_hashes = NULL;
- size_t keep_hashes_nr = 0;
+ struct strvec keep_hashes = STRVEC_INIT;
struct chunkfile *cf;
trace2_region_enter("midx", "write_midx_internal", r);
@@ -1708,32 +1707,12 @@ static int write_midx_internal(struct write_midx_opts *opts)
if (ctx.num_multi_pack_indexes_before == UINT32_MAX)
die(_("too many multi-pack-indexes"));
- if (ctx.compact) {
- struct multi_pack_index *m;
-
- /*
- * Keep all MIDX layers excluding those in the range [from, to].
- */
- for (m = ctx.base_midx; m; m = m->base_midx)
- keep_hashes_nr++;
- for (m = ctx.m;
- m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
- m = m->base_midx)
- keep_hashes_nr++;
-
- keep_hashes_nr++; /* include the compacted layer */
- } else {
- keep_hashes_nr = ctx.num_multi_pack_indexes_before + 1;
- }
- CALLOC_ARRAY(keep_hashes, keep_hashes_nr);
-
if (ctx.incremental) {
FILE *chainf = fdopen_lock_file(&lk, "w");
struct strbuf final_midx_name = STRBUF_INIT;
struct multi_pack_index *m = ctx.base_midx;
struct multi_pack_index **layers = NULL;
size_t layers_nr = 0, layers_alloc = 0;
- size_t j = 0;
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
@@ -1761,12 +1740,12 @@ static int write_midx_internal(struct write_midx_opts *opts)
layers[layers_nr++] = mp;
}
while (layers_nr)
- keep_hashes[j++] =
- xstrdup(midx_get_checksum_hex(layers[--layers_nr]));
+ strvec_push(&keep_hashes,
+ midx_get_checksum_hex(layers[--layers_nr]));
- keep_hashes[j++] =
- xstrdup(hash_to_hex_algop(midx_hash,
- r->hash_algo));
+ strvec_push(&keep_hashes,
+ hash_to_hex_algop(midx_hash,
+ r->hash_algo));
for (mp = ctx.m;
mp && midx_hashcmp(mp, ctx.compact_to,
@@ -1776,31 +1755,29 @@ static int write_midx_internal(struct write_midx_opts *opts)
layers[layers_nr++] = mp;
}
while (layers_nr)
- keep_hashes[j++] =
- xstrdup(midx_get_checksum_hex(layers[--layers_nr]));
+ strvec_push(&keep_hashes,
+ midx_get_checksum_hex(layers[--layers_nr]));
} else {
for (; m; m = m->base_midx) {
ALLOC_GROW(layers, layers_nr + 1, layers_alloc);
layers[layers_nr++] = m;
}
while (layers_nr)
- keep_hashes[j++] =
- xstrdup(midx_get_checksum_hex(layers[--layers_nr]));
+ strvec_push(&keep_hashes,
+ midx_get_checksum_hex(layers[--layers_nr]));
- keep_hashes[j++] =
- xstrdup(hash_to_hex_algop(midx_hash,
- r->hash_algo));
+ strvec_push(&keep_hashes,
+ hash_to_hex_algop(midx_hash,
+ r->hash_algo));
}
- ASSERT(j == keep_hashes_nr);
-
free(layers);
- for (uint32_t i = 0; i < j; i++)
- fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
+ for (size_t i = 0; i < keep_hashes.nr; i++)
+ fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes.v[i]);
} else {
- keep_hashes[ctx.num_multi_pack_indexes_before] =
- xstrdup(hash_to_hex_algop(midx_hash, r->hash_algo));
+ strvec_push(&keep_hashes,
+ hash_to_hex_algop(midx_hash, r->hash_algo));
}
if (ctx.m || ctx.base_midx)
@@ -1809,8 +1786,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
if (commit_lock_file(&lk) < 0)
die_errno(_("could not write multi-pack-index"));
- clear_midx_files(opts->source, keep_hashes, keep_hashes_nr,
- ctx.incremental);
+ clear_midx_files(opts->source, &keep_hashes, ctx.incremental);
result = 0;
cleanup:
@@ -1826,11 +1802,7 @@ static int write_midx_internal(struct write_midx_opts *opts)
free(ctx.entries);
free(ctx.pack_perm);
free(ctx.pack_order);
- if (keep_hashes) {
- for (uint32_t i = 0; i < keep_hashes_nr; i++)
- free((char *)keep_hashes[i]);
- free(keep_hashes);
- }
+ strvec_clear(&keep_hashes);
strbuf_release(&midx_name);
close_midx(midx_to_free);
diff --git a/midx.c b/midx.c
index f75e3c9fa6d..bcb8c999015 100644
--- a/midx.c
+++ b/midx.c
@@ -12,6 +12,7 @@
#include "chunk-format.h"
#include "pack-bitmap.h"
#include "pack-revindex.h"
+#include "strvec.h"
#define MIDX_PACK_ERROR ((void *)(intptr_t)-1)
@@ -19,8 +20,7 @@ int midx_checksum_valid(struct multi_pack_index *m);
void clear_midx_files_ext(struct odb_source *source, const char *ext,
const char *keep_hash);
void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext,
- char **keep_hashes,
- uint32_t hashes_nr);
+ const struct strvec *keep_hashes);
int cmp_idx_or_pack_name(const char *idx_or_pack_name,
const char *idx_name);
@@ -799,22 +799,22 @@ void clear_midx_files_ext(struct odb_source *source, const char *ext,
}
void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext,
- char **keep_hashes,
- uint32_t hashes_nr)
+ const struct strvec *keep_hashes)
{
struct clear_midx_data data = {
.keep = STRSET_INIT,
.ext = ext,
};
struct strbuf buf = STRBUF_INIT;
- uint32_t i;
- for (i = 0; i < hashes_nr; i++) {
- strbuf_reset(&buf);
- strbuf_addf(&buf, "multi-pack-index-%s.%s", keep_hashes[i],
- ext);
+ if (keep_hashes) {
+ for (size_t i = 0; i < keep_hashes->nr; i++) {
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "multi-pack-index-%s.%s",
+ keep_hashes->v[i], ext);
- strset_add(&data.keep, buf.buf);
+ strset_add(&data.keep, buf.buf);
+ }
}
for_each_file_in_pack_subdir(source->path, "multi-pack-index.d",
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 03/16] midx: build `keep_hashes` array in order
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Instead of filling the keep_hashes array using reverse indexing (e.g.,
`keep_hashes[count - i - 1]`) while traversing linked lists forward,
collect linked list nodes into a temporary `layers` array and then
iterate it backwards to fill `keep_hashes` sequentially.
This makes the filling logic easier to follow, since each segment of the
array is filled with a simple forward-marching index. Moreover, this
change prepares us for a subsequent commit that will switch to using a
`strvec`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 66 ++++++++++++++++++++++++++++------------------------
1 file changed, 36 insertions(+), 30 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 9328f65a201..55c778a97cb 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1731,6 +1731,9 @@ static int write_midx_internal(struct write_midx_opts *opts)
FILE *chainf = fdopen_lock_file(&lk, "w");
struct strbuf final_midx_name = STRBUF_INIT;
struct multi_pack_index *m = ctx.base_midx;
+ struct multi_pack_index **layers = NULL;
+ size_t layers_nr = 0, layers_alloc = 0;
+ size_t j = 0;
if (!chainf) {
error_errno(_("unable to open multi-pack-index chain file"));
@@ -1751,46 +1754,49 @@ static int write_midx_internal(struct write_midx_opts *opts)
strbuf_release(&final_midx_name);
if (ctx.compact) {
- struct multi_pack_index *m;
- uint32_t num_layers_before_from = 0;
- uint32_t i;
+ struct multi_pack_index *mp;
- for (m = ctx.base_midx; m; m = m->base_midx)
- num_layers_before_from++;
-
- m = ctx.base_midx;
- for (i = 0; i < num_layers_before_from; i++) {
- uint32_t j = num_layers_before_from - i - 1;
-
- keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
- m = m->base_midx;
+ for (mp = ctx.base_midx; mp; mp = mp->base_midx) {
+ ALLOC_GROW(layers, layers_nr + 1, layers_alloc);
+ layers[layers_nr++] = mp;
}
+ while (layers_nr)
+ keep_hashes[j++] =
+ xstrdup(midx_get_checksum_hex(layers[--layers_nr]));
- keep_hashes[i] = xstrdup(hash_to_hex_algop(midx_hash,
- r->hash_algo));
+ keep_hashes[j++] =
+ xstrdup(hash_to_hex_algop(midx_hash,
+ r->hash_algo));
- i = 0;
- for (m = ctx.m;
- m && midx_hashcmp(m, ctx.compact_to, r->hash_algo);
- m = m->base_midx) {
- keep_hashes[keep_hashes_nr - i - 1] =
- xstrdup(midx_get_checksum_hex(m));
- i++;
+ for (mp = ctx.m;
+ mp && midx_hashcmp(mp, ctx.compact_to,
+ r->hash_algo);
+ mp = mp->base_midx) {
+ ALLOC_GROW(layers, layers_nr + 1, layers_alloc);
+ layers[layers_nr++] = mp;
}
+ while (layers_nr)
+ keep_hashes[j++] =
+ xstrdup(midx_get_checksum_hex(layers[--layers_nr]));
} else {
- keep_hashes[ctx.num_multi_pack_indexes_before] =
+ for (; m; m = m->base_midx) {
+ ALLOC_GROW(layers, layers_nr + 1, layers_alloc);
+ layers[layers_nr++] = m;
+ }
+ while (layers_nr)
+ keep_hashes[j++] =
+ xstrdup(midx_get_checksum_hex(layers[--layers_nr]));
+
+ keep_hashes[j++] =
xstrdup(hash_to_hex_algop(midx_hash,
r->hash_algo));
-
- for (uint32_t i = 0; i < ctx.num_multi_pack_indexes_before; i++) {
- uint32_t j = ctx.num_multi_pack_indexes_before - i - 1;
-
- keep_hashes[j] = xstrdup(midx_get_checksum_hex(m));
- m = m->base_midx;
- }
}
- for (uint32_t i = 0; i < keep_hashes_nr; i++)
+ ASSERT(j == keep_hashes_nr);
+
+ free(layers);
+
+ for (uint32_t i = 0; i < j; i++)
fprintf(get_lock_file_fp(&lk), "%s\n", keep_hashes[i]);
} else {
keep_hashes[ctx.num_multi_pack_indexes_before] =
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 02/16] midx: use `strset` for retained MIDX files
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
Both `clear_midx_files_ext()` and `clear_incremental_midx_files_ext()`
build a list of filenames to keep while pruning stale MIDX files. Today
they hand-roll an array instead of using a `strset`, thus requiring us
to pass an additional length parameter, and makes lookups linear.
Replace the bare array with a `strset` which can be passed around as a
single parameter. Though it improves lookup performance, the difference
is likely immeasurable given how small the keep_hashes array typically
is.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx.c | 57 +++++++++++++++++++++++++++------------------------------
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/midx.c b/midx.c
index 81d6ab11e6e..f75e3c9fa6d 100644
--- a/midx.c
+++ b/midx.c
@@ -758,8 +758,7 @@ int midx_checksum_valid(struct multi_pack_index *m)
}
struct clear_midx_data {
- char **keep;
- uint32_t keep_nr;
+ struct strset keep;
const char *ext;
};
@@ -767,15 +766,12 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
const char *file_name, void *_data)
{
struct clear_midx_data *data = _data;
- uint32_t i;
if (!(starts_with(file_name, "multi-pack-index-") &&
ends_with(file_name, data->ext)))
return;
- for (i = 0; i < data->keep_nr; i++) {
- if (!strcmp(data->keep[i], file_name))
- return;
- }
+ if (strset_contains(&data->keep, file_name))
+ return;
if (unlink(full_path))
die_errno(_("failed to remove %s"), full_path);
}
@@ -783,48 +779,49 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len UNUS
void clear_midx_files_ext(struct odb_source *source, const char *ext,
const char *keep_hash)
{
- struct clear_midx_data data;
- memset(&data, 0, sizeof(struct clear_midx_data));
+ struct clear_midx_data data = {
+ .keep = STRSET_INIT,
+ .ext = ext,
+ };
if (keep_hash) {
- ALLOC_ARRAY(data.keep, 1);
+ struct strbuf buf = STRBUF_INIT;
+ strbuf_addf(&buf, "multi-pack-index-%s.%s", keep_hash, ext);
- data.keep[0] = xstrfmt("multi-pack-index-%s.%s", keep_hash, ext);
- data.keep_nr = 1;
+ strset_add(&data.keep, buf.buf);
+
+ strbuf_release(&buf);
}
- data.ext = ext;
- for_each_file_in_pack_dir(source->path,
- clear_midx_file_ext,
- &data);
+ for_each_file_in_pack_dir(source->path, clear_midx_file_ext, &data);
- if (keep_hash)
- free(data.keep[0]);
- free(data.keep);
+ strset_clear(&data.keep);
}
void clear_incremental_midx_files_ext(struct odb_source *source, const char *ext,
char **keep_hashes,
uint32_t hashes_nr)
{
- struct clear_midx_data data;
+ struct clear_midx_data data = {
+ .keep = STRSET_INIT,
+ .ext = ext,
+ };
+ struct strbuf buf = STRBUF_INIT;
uint32_t i;
- memset(&data, 0, sizeof(struct clear_midx_data));
+ for (i = 0; i < hashes_nr; i++) {
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "multi-pack-index-%s.%s", keep_hashes[i],
+ ext);
- ALLOC_ARRAY(data.keep, hashes_nr);
- for (i = 0; i < hashes_nr; i++)
- data.keep[i] = xstrfmt("multi-pack-index-%s.%s", keep_hashes[i],
- ext);
- data.keep_nr = hashes_nr;
- data.ext = ext;
+ strset_add(&data.keep, buf.buf);
+ }
for_each_file_in_pack_subdir(source->path, "multi-pack-index.d",
clear_midx_file_ext, &data);
- for (i = 0; i < hashes_nr; i++)
- free(data.keep[i]);
- free(data.keep);
+ strbuf_release(&buf);
+ strset_clear(&data.keep);
}
void clear_midx_file(struct repository *r)
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 01/16] midx-write: handle noop writes when converting incremental chains
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1779206239.git.me@ttaylorr.com>
When updating a MIDX, we optimize out writes that will result in an
identical MIDX as the one we already have on disk. See b3bab9d2729
(midx-write: extract function to test whether MIDX needs updating,
2025-12-10) for more details on exactly which writes are optimized out.
If `midx_needs_update()` can't rule out any of the obvious cases (e.g.,
the checksum is invalid, we're requesting a different version, or
performing compaction which always requires an update), then we compare
the packs we're writing to the packs we already know about. If there are
an equal number of packs being written as there are in any existing
MIDX layer(s), then we compare the packs by their name.
This comparison fails when we have an incremental MIDX chain with
at least two layers, since we do not recursively peel through earlier
layers, instead treating the `->pack_names` array of the tip MIDX layer
as containing all `m->num_packs + m->num_packs_in_base` packs.
Adjust this to instead look through the MIDX layers one by one when
comparing pack names. While we're at it, fix a typo above in the same
function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 18 ++++++++++--------
t/t5334-incremental-multi-pack-index.sh | 16 ++++++++++++++++
2 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index a25cab75aba..9328f65a201 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1152,7 +1152,7 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
/*
* Ensure that we have a valid checksum before consulting the
- * exisiting MIDX in order to determine if we can avoid an
+ * existing MIDX in order to determine if we can avoid an
* update.
*
* This is necessary because the given MIDX is loaded directly
@@ -1208,14 +1208,16 @@ static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_c
BUG("same pack added twice?");
}
- for (uint32_t i = 0; i < ctx->nr; i++) {
- strbuf_reset(&buf);
- strbuf_addstr(&buf, midx->pack_names[i]);
- strbuf_strip_suffix(&buf, ".idx");
+ for (struct multi_pack_index *m = midx; m; m = m->base_midx) {
+ for (uint32_t i = 0; i < m->num_packs; i++) {
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, m->pack_names[i]);
+ strbuf_strip_suffix(&buf, ".idx");
- if (!strset_contains(&packs, buf.buf))
- goto out;
- strset_remove(&packs, buf.buf);
+ if (!strset_contains(&packs, buf.buf))
+ goto out;
+ strset_remove(&packs, buf.buf);
+ }
}
needed = false;
diff --git a/t/t5334-incremental-multi-pack-index.sh b/t/t5334-incremental-multi-pack-index.sh
index 99c7d44d8e9..c9f5b4e87aa 100755
--- a/t/t5334-incremental-multi-pack-index.sh
+++ b/t/t5334-incremental-multi-pack-index.sh
@@ -132,4 +132,20 @@ test_expect_success 'relink existing MIDX layer' '
'
+test_expect_success 'non-incremental write with existing incremental chain' '
+ git init non-incremental-write-with-existing &&
+ test_when_finished "rm -fr non-incremental-write-with-existing" &&
+
+ (
+ cd non-incremental-write-with-existing &&
+
+ git config set maintenance.auto false &&
+
+ write_midx_layer &&
+ write_midx_layer &&
+
+ git multi-pack-index write
+ )
+'
+
test_done
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply related
* [PATCH v4 00/16] repack: incremental MIDX/bitmap-based repacking
From: Taylor Blau @ 2026-05-19 15:57 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt
In-Reply-To: <cover.1774820449.git.me@ttaylorr.com>
Here is another small reroll of my series to implement the last
remaining component of the incremental MIDX/bitmap-based repacking
strategy [1].
The changes since v3 are fairly small:
- `repack_prepare_midx_command()` now calls its third parameter
"subcommand" instead of "verb".
- `clear_incremental_midx_files()` now avoids a redundant conditional
around `r->objects` and reuses a local `struct odb_source *` when
closing any open incremental MIDX handles.
As usual, a range-diff is included below for convenience. Thanks in
advance for reviewing!
[1]: https://lore.kernel.org/git/cover.1777507303.git.me@ttaylorr.com/
Taylor Blau (16):
midx-write: handle noop writes when converting incremental chains
midx: use `strset` for retained MIDX files
midx: build `keep_hashes` array in order
midx: use `strvec` for `keep_hashes`
midx: introduce `--no-write-chain-file` for incremental MIDX writes
midx: support custom `--base` for incremental MIDX writes
repack: track the ODB source via existing_packs
midx: expose `midx_layer_contains_pack()`
repack-midx: factor out `repack_prepare_midx_command()`
repack-midx: extract `repack_fill_midx_stdin_packs()`
repack-geometry: prepare for incremental MIDX repacking
builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
packfile: ensure `close_pack_revindex()` frees in-memory revindex
repack: implement incremental MIDX repacking
repack: introduce `--write-midx=incremental`
repack: allow `--write-midx=incremental` without `--geometric`
Documentation/config/repack.adoc | 18 +
Documentation/git-multi-pack-index.adoc | 32 +-
Documentation/git-repack.adoc | 44 +-
builtin/multi-pack-index.c | 48 +-
builtin/repack.c | 102 +++-
midx-write.c | 206 ++++---
midx.c | 102 ++--
midx.h | 11 +-
packfile.c | 2 +
repack-geometry.c | 48 +-
repack-midx.c | 710 +++++++++++++++++++++++-
repack.c | 58 +-
repack.h | 26 +-
t/meson.build | 1 +
t/t5334-incremental-multi-pack-index.sh | 63 +++
t/t5335-compact-multi-pack-index.sh | 113 ++++
t/t7705-repack-incremental-midx.sh | 525 ++++++++++++++++++
17 files changed, 1909 insertions(+), 200 deletions(-)
create mode 100755 t/t7705-repack-incremental-midx.sh
Range-diff against v3:
1: d6c27317c25 = 1: ead11e610c8 midx-write: handle noop writes when converting incremental chains
2: 629c8d23116 = 2: ece55bf2957 midx: use `strset` for retained MIDX files
3: e303bf6a4ac = 3: 5609d1941e6 midx: build `keep_hashes` array in order
4: 42d76c70060 = 4: 13b7c808860 midx: use `strvec` for `keep_hashes`
5: 2c80aa34fac = 5: cac3fd54bf0 midx: introduce `--no-write-chain-file` for incremental MIDX writes
6: 2a05f4b86f3 = 6: 1bbb387d6b6 midx: support custom `--base` for incremental MIDX writes
7: 92aba3d366f = 7: 4a93adb3ad3 repack: track the ODB source via existing_packs
8: d3ac65c1f11 = 8: 8d1b8b1d301 midx: expose `midx_layer_contains_pack()`
9: 1bd2f194c6f ! 9: 42111e5f75d repack-midx: factor out `repack_prepare_midx_command()`
@@ Commit message
subcommands), so extract the common portions of the command setup into a
reusable `repack_prepare_midx_command()` helper.
- The extracted helper sets `git_cmd`, pushes the `multi-pack-index`
- subcommand and verb, and handles `--progress`/`--no-progress` and
- `--bitmap` flags. The remaining arguments that are specific to the
- `write` subcommand (such as `--stdin-packs`) are left to the caller.
+ The extracted helper sets `git_cmd`, pushes `multi-pack-index` and a
+ subcommand, and handles `--progress`/`--no-progress` and `--bitmap`
+ flags. The remaining arguments that are specific to the `write`
+ subcommand (such as `--stdin-packs`) are left to the caller.
No functional changes are included in this patch.
@@ repack-midx.c: static void remove_redundant_bitmaps(struct string_list *include,
+static void repack_prepare_midx_command(struct child_process *cmd,
+ struct repack_write_midx_opts *opts,
-+ const char *verb)
++ const char *subcommand)
+{
+ cmd->git_cmd = 1;
+
-+ strvec_pushl(&cmd->args, "multi-pack-index", verb, NULL);
++ strvec_pushl(&cmd->args, "multi-pack-index", subcommand, NULL);
+
+ if (opts->show_progress)
+ strvec_push(&cmd->args, "--progress");
10: 2a87a1e4561 = 10: ed76e6efd1c repack-midx: extract `repack_fill_midx_stdin_packs()`
11: 3d32b9c88da = 11: 9665f1b3a64 repack-geometry: prepare for incremental MIDX repacking
12: 1f7a5479bb8 = 12: e0db62b9f10 builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
13: b155f25d53c = 13: c8c846b1ac1 packfile: ensure `close_pack_revindex()` frees in-memory revindex
14: ef012314930 = 14: 78f1a98c1ea repack: implement incremental MIDX repacking
15: 04cfecd5136 ! 15: 95bf8b21fff repack: introduce `--write-midx=incremental`
@@ midx.c: void clear_midx_file(struct repository *r)
+void clear_incremental_midx_files(struct repository *r,
+ const struct strvec *keep_hashes)
+{
++ struct odb_source *source = r->objects->sources;
+ struct strbuf chain = STRBUF_INIT;
+
-+ get_midx_chain_filename(r->objects->sources, &chain);
++ get_midx_chain_filename(source, &chain);
+
-+ if (r->objects) {
-+ struct odb_source *source = r->objects->sources;
-+ for (source = r->objects->sources; source; source = source->next) {
-+ struct odb_source_files *files = odb_source_files_downcast(source);
-+ if (files->packed->midx)
-+ close_midx(files->packed->midx);
-+ files->packed->midx = NULL;
-+ }
++ for (; source; source = source->next) {
++ struct odb_source_files *files = odb_source_files_downcast(source);
++ if (files->packed->midx)
++ close_midx(files->packed->midx);
++ files->packed->midx = NULL;
+ }
+
+ if (!keep_hashes && remove_path(chain.buf))
16: 1c05dfce579 = 16: 8bd0ec98dc3 repack: allow `--write-midx=incremental` without `--geometric`
base-commit: 7bcaabddcf68bd0702697da5904c3b68c52f94cf
--
2.54.0.175.g8bd0ec98dc3
^ permalink raw reply
* [PATCH v3 8/8] doc: promisor: improve acceptFromServer entry
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
The entry for the `promisor.acceptFromServer` in
"Documentation/config/promisor.adoc" has a number of issues:
- it's not clear if new remotes and URLs can be created,
- it looks like a big block of text,
- it's not easy to see all the options,
- it's not easy to see which option is the default one,
- for "knownName", it says "advertised by the client" instead of
"advertised by the server",
- it doesn't refer to the new related `acceptFromServerUrl`
option.
Let's address all these issues by rewording large parts of it
and using bullet points for the different options.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
Documentation/config/promisor.adoc | 53 ++++++++++++++++++++----------
1 file changed, 35 insertions(+), 18 deletions(-)
diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 11b5716294..cc728bb0b5 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -32,24 +32,41 @@ variable is set to "true", and the "name" and "url" fields are always
advertised regardless of this setting.
promisor.acceptFromServer::
- If set to "all", a client will accept all the promisor remotes
- a server might advertise using the "promisor-remote"
- capability. If set to "knownName" the client will accept
- promisor remotes which are already configured on the client
- and have the same name as those advertised by the client. This
- is not very secure, but could be used in a corporate setup
- where servers and clients are trusted to not switch name and
- URLs. If set to "knownUrl", the client will accept promisor
- remotes which have both the same name and the same URL
- configured on the client as the name and URL advertised by the
- server. This is more secure than "all" or "knownName", so it
- should be used if possible instead of those options. Default
- is "none", which means no promisor remote advertised by a
- server will be accepted. By accepting a promisor remote, the
- client agrees that the server might omit objects that are
- lazily fetchable from this promisor remote from its responses
- to "fetch" and "clone" requests from the client. Name and URL
- comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
+ Controls which promisor remotes advertised by a server (using the
+ "promisor-remote" protocol capability) a client will accept. By
+ accepting a promisor remote, the client agrees that the server
+ might omit objects that are lazily fetchable from this promisor
+ remote from its responses to "fetch" and "clone" requests.
++
+Note that this option does not cause new remotes to be automatically
+created in the client's configuration. It only allows remotes which
+are somehow already configured to be trusted for the current
+operation, or their fields to be updated (if `promisor.storeFields` is
+set and the remote already exists locally). To allow Git to
+automatically create and persist new remotes from server
+advertisements, use `promisor.acceptFromServerUrl`.
++
+The available options are:
++
+* `none` (default): No promisor remote advertised by a server will be
+ accepted.
++
+* `knownUrl`: The client will accept promisor remotes that are already
+ configured on the client and have both the same name and the same URL
+ as advertised by the server. This is more secure than `all` or
+ `knownName`, and should be used if possible instead of those options.
++
+* `knownName`: The client will accept promisor remotes that are already
+ configured on the client and have the same name as those advertised
+ by the server. This is not very secure, but could be used in a corporate
+ setup where servers and clients are trusted to not switch names and URLs.
++
+* `all`: The client will accept all the promisor remotes a server might
+ advertise. This is the least secure option and should only be used in
+ fully trusted environments.
++
+Name and URL comparisons are case-sensitive. See linkgit:gitprotocol-v2[5]
+for protocol details.
promisor.acceptFromServerUrl::
A glob pattern to specify which server-advertised URLs a
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 7/8] promisor-remote: auto-configure unknown remotes
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
Previous commits have introduced the `promisor.acceptFromServerUrl`
config variable to allowlist some URLs advertised by a server through
the "promisor-remote" protocol capability.
However the new `promisor.acceptFromServerUrl` mechanism, like the old
`promisor.acceptFromServer` mechanism, still requires a remote to
already exist in the client's local configuration before it can be
accepted. This places a significant manual burden on users to
pre-configure these remotes, and creates friction for administrators
who have to troubleshoot or manually provision these setups for their
teams.
To eliminate this burden, let's automatically create a new `[remote]`
section in the client's config when a server advertises an unknown
remote whose URL matches a `promisor.acceptFromServerUrl` glob pattern.
Concretely, let's add four helpers:
- sanitize_remote_name(): turn an arbitrary URL-derived string into a
valid remote name by replacing non-alphanumeric characters,
collapsing runs of '-', and prepending "promisor-auto-".
- promisor_remote_name_from_url(): normalize the URL and extract
host+port+path to build a human-readable base name, then pass it
through sanitize_remote_name().
- configure_auto_promisor_remote(): write the remote.*.url,
remote.*.promisor and remote.*.advertisedAs keys to the repo
config.
- handle_matching_allowed_url(): pick the final name (user-supplied
alias or auto-generated), handle collisions by appending "-1",
"-2", etc., then call configure_auto_promisor_remote().
Let's also add should_accept_new_remote_url() which reuses the
url_matches_accept_list() helper introduced in a previous commit to
find a matching pattern, then delegates to handle_matching_allowed_url()
to create the remote.
And then let's call should_accept_new_remote_url() from the '!item'
(unknown remote) branch of should_accept_remote(), setting
`reload_config` so that the newly-written config is picked up.
Finally let's document all that by:
- expanding the `promisor.acceptFromServerUrl` entry to describe
auto-creation, the optional "name=" prefix syntax, the
"promisor-auto-*" generation rules, and numeric-suffix collision
handling, and by
- adding a "remote.<name>.advertisedAs" entry to "remote.adoc".
Also let's extend the precedence paragraph added by a previous commit
to mention this new acceptance path: until now, the only way for
`promisor.acceptFromServerUrl` to trigger acceptance was to allow
field updates for a known remote. With this commit, it can also trigger
auto-creation of a previously-unknown remote whose advertised URL
matches the allowlist.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
Documentation/config/promisor.adoc | 39 +++--
Documentation/config/remote.adoc | 9 ++
promisor-remote.c | 201 +++++++++++++++++++++++++-
t/t5710-promisor-remote-capability.sh | 104 +++++++++++++
4 files changed, 340 insertions(+), 13 deletions(-)
diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 4a1ecb4425..11b5716294 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -54,7 +54,8 @@ promisor.acceptFromServer::
promisor.acceptFromServerUrl::
A glob pattern to specify which server-advertised URLs a
client is allowed to act on. When a URL matches, the client
- will accept the advertised remote as a promisor remote and may
+ will accept the advertised remote as a promisor remote, may
+ automatically create a new remote configuration for it and may
automatically accept field updates (such as authentication
tokens) from the server, even if `promisor.acceptFromServer`
is set to `none` (the default).
@@ -65,12 +66,13 @@ this option in _ANY_ config file read by Git.
+
When both `promisor.acceptFromServer` and `promisor.acceptFromServerUrl`
are set, `promisor.acceptFromServerUrl` is consulted first and takes
-precedence: if a matching pattern leads to acceptance (by accepting
-field updates for a known remote whose URL matches both the local
-configuration and the allowlist), the advertised remote is accepted
-regardless of the `promisor.acceptFromServer` setting. If no pattern
-in `promisor.acceptFromServerUrl` triggers acceptance, the decision
-is left to `promisor.acceptFromServer`.
+precedence: if a matching pattern leads to acceptance (either by
+auto-configuring an unknown remote or by accepting field updates for
+a known remote whose URL matches both the local configuration and the
+allowlist), the advertised remote is accepted regardless of the
+`promisor.acceptFromServer` setting. If no pattern in
+`promisor.acceptFromServerUrl` triggers acceptance, the decision is
+left to `promisor.acceptFromServer`.
+
Note however that, even when an advertised URL matches a pattern in
`promisor.acceptFromServerUrl`, an already-existing remote on the
@@ -85,9 +87,10 @@ documentation of that option.)
Be _VERY_ careful with these patterns: `*` matches any sequence of
characters within the 'host' and 'path' parts of a URL (but cannot
cross part boundaries). An overly broad pattern is a major security
-risk, as a matching URL allows a server to update fields (such as
-authentication tokens) on known remotes without further confirmation.
-To minimize security risks, follow these guidelines:
+risk, as a matching URL allows a server to auto-configure new remotes
+and to update fields (such as authentication tokens) on known remotes
+without further confirmation. To minimize security risks, follow these
+guidelines:
+
1. Start with a secure protocol scheme, like `https://` or `ssh://`.
+
@@ -121,6 +124,22 @@ ignored during matching. Note that embedding credentials in URLs is
discouraged. Passing authentication tokens via the `token` field of
the `promisor-remote` capability is strongly preferred.
+
+The glob pattern can optionally be prefixed with a remote name and an
+equals sign (e.g., `cdn=https://cdn.example.com/*`). If such a prefix
+is provided, accepted remotes will be saved under that name. If no
+such prefix is provided, a safe remote name will be automatically
+generated by sanitizing the URL and prefixing it with
+`promisor-auto-`.
++
+If a remote with the chosen name already exists but points to a
+different URL, Git will append a numeric suffix (e.g., `-1`, `-2`) to
+the name to prevent overwriting existing configurations. You should
+make sure that this doesn't happen often though, as remotes will be
+rejected if the numeric suffix increases too much. In all cases, the
+original name advertised by the server is recorded in the
+`remote.<name>.advertisedAs` configuration variable for tracing and
+debugging purposes.
++
For the security implications of accepting a promisor remote, see the
documentation of `promisor.acceptFromServer`. For details on the
protocol, see linkgit:gitprotocol-v2[5].
diff --git a/Documentation/config/remote.adoc b/Documentation/config/remote.adoc
index 91e46f66f5..6e2bbdf457 100644
--- a/Documentation/config/remote.adoc
+++ b/Documentation/config/remote.adoc
@@ -91,6 +91,15 @@ remote.<name>.promisor::
When set to true, this remote will be used to fetch promisor
objects.
+remote.<name>.advertisedAs::
+ When a promisor remote is automatically configured using
+ information advertised by a server through the
+ `promisor-remote` protocol capability (see
+ `promisor.acceptFromServerUrl`), the server's originally
+ advertised name is saved in this variable. This is for
+ information, tracing and debugging purposes. Users should not
+ typically modify or create such configuration entries.
+
remote.<name>.partialclonefilter::
The filter that will be applied when fetching from this promisor remote.
Changing or clearing this value will only affect fetches for new commits.
diff --git a/promisor-remote.c b/promisor-remote.c
index ac4f54c590..cbfe6672a3 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -813,10 +813,197 @@ static struct allowed_url *url_matches_accept_list(
return NULL;
}
-static int should_accept_remote(enum accept_promisor accept,
+/*
+ * Sanitize the buffer to make it a valid remote name coming from the
+ * server by:
+ *
+ * - replacing any non alphanumeric character with a '-'
+ * - stripping any leading '-',
+ * - condensing multiple '-' into one,
+ * - prepending "promisor-auto-",
+ * - validating the result.
+ */
+static int sanitize_remote_name(struct strbuf *buf, const char *url)
+{
+ char prev = '-';
+ for (size_t i = 0; i < buf->len; ) {
+ if (!isalnum(buf->buf[i]))
+ buf->buf[i] = '-';
+ if (prev == '-' && buf->buf[i] == '-') {
+ strbuf_remove(buf, i, 1);
+ } else {
+ prev = buf->buf[i];
+ i++;
+ }
+ }
+
+ strbuf_strip_suffix(buf, "-");
+
+ if (!buf->len) {
+ warning(_("couldn't generate a valid remote name from "
+ "advertised url '%s', ignoring this remote"), url);
+ return -1;
+ }
+
+ strbuf_insertstr(buf, 0, "promisor-auto-");
+
+ if (!valid_remote_name(buf->buf)) {
+ warning(_("generated remote name '%s' from advertised url '%s' "
+ "is invalid, ignoring this remote"), buf->buf, url);
+ return -1;
+ }
+
+ return 0;
+}
+
+static char *promisor_remote_name_from_url(const char *url)
+{
+ struct url_info url_info = { 0 };
+ char *normalized = url_normalize(url, &url_info);
+ struct strbuf buf = STRBUF_INIT;
+
+ if (!normalized) {
+ warning(_("couldn't normalize advertised url '%s', "
+ "ignoring this remote"), url);
+ return NULL;
+ }
+
+ if (url_info.host_len) {
+ strbuf_add(&buf, normalized + url_info.host_off, url_info.host_len);
+ strbuf_addch(&buf, '-');
+ }
+
+ if (url_info.port_len) {
+ strbuf_add(&buf, normalized + url_info.port_off, url_info.port_len);
+ strbuf_addch(&buf, '-');
+ }
+
+ if (url_info.path_len) {
+ strbuf_add(&buf, normalized + url_info.path_off, url_info.path_len);
+ strbuf_trim_trailing_dir_sep(&buf);
+ strbuf_strip_suffix(&buf, ".git");
+ }
+
+ free(normalized);
+
+ if (sanitize_remote_name(&buf, url)) {
+ strbuf_release(&buf);
+ return NULL;
+ }
+
+ return strbuf_detach(&buf, NULL);
+}
+
+static void configure_auto_promisor_remote(struct repository *repo,
+ const char *name,
+ const char *url,
+ const char *advertised_as,
+ bool reuse)
+{
+ char *key;
+
+ if (!reuse) {
+ fprintf(stderr, _("Auto-creating promisor remote '%s' for URL '%s'\n"),
+ name, url);
+
+ key = xstrfmt("remote.%s.url", name);
+ repo_config_set_gently(repo, key, url);
+ free(key);
+ }
+
+ /* NB: when reusing, this promotes an existing non-promisor remote */
+ key = xstrfmt("remote.%s.promisor", name);
+ repo_config_set_gently(repo, key, "true");
+ free(key);
+
+ if (advertised_as) {
+ key = xstrfmt("remote.%s.advertisedAs", name);
+ repo_config_set_gently(repo, key, advertised_as);
+ free(key);
+ }
+}
+
+#define MAX_REMOTES_WITH_SIMILAR_NAMES 20
+
+/* Return the allocated local name, or NULL on failure */
+static char *handle_matching_allowed_url(struct repository *repo,
+ char *allowed_name,
+ const char *remote_url,
+ const char *remote_name)
+{
+ char *name;
+ char *basename = allowed_name ?
+ xstrdup(allowed_name) :
+ promisor_remote_name_from_url(remote_url);
+ int i = 0;
+ bool reuse = false;
+
+ if (!basename)
+ return NULL;
+
+ name = xstrdup(basename);
+
+ while (i < MAX_REMOTES_WITH_SIMILAR_NAMES) {
+ char *url_key = xstrfmt("remote.%s.url", name);
+ const char *existing_url;
+ int exists = !repo_config_get_string_tmp(repo, url_key, &existing_url);
+
+ free(url_key);
+
+ if (!exists)
+ break; /* Free to use */
+
+ if (!strcmp(existing_url, remote_url)) {
+ reuse = true;
+ break; /* Same URL, so safe to reuse */
+ }
+
+ i++;
+ free(name);
+ name = xstrfmt("%s-%d", basename, i);
+ }
+
+ if (i < MAX_REMOTES_WITH_SIMILAR_NAMES) {
+ configure_auto_promisor_remote(repo, name,
+ remote_url, remote_name,
+ reuse);
+ } else {
+ warning(_("too many remotes accepted with name like '%s-X', "
+ "ignoring this remote"), basename);
+ FREE_AND_NULL(name);
+ }
+
+ free(basename);
+ return name;
+}
+
+static int should_accept_new_remote_url(struct repository *repo,
+ struct string_list *accept_urls,
+ struct promisor_info *advertised)
+{
+ struct allowed_url *allowed = url_matches_accept_list(accept_urls,
+ advertised->url);
+ if (allowed) {
+ char *name = handle_matching_allowed_url(repo,
+ allowed->remote_name,
+ advertised->url,
+ advertised->name);
+ if (name) {
+ free((char *)advertised->local_name);
+ advertised->local_name = name;
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static int should_accept_remote(struct repository *repo,
+ enum accept_promisor accept,
struct promisor_info *advertised,
struct string_list *accept_urls,
- struct string_list *config_info)
+ struct string_list *config_info,
+ bool *reload_config)
{
struct promisor_info *p;
struct string_list_item *item;
@@ -833,6 +1020,13 @@ static int should_accept_remote(enum accept_promisor accept,
if (!item) {
/* We don't know about that remote */
+
+ int res = should_accept_new_remote_url(repo, accept_urls, advertised);
+ if (res) {
+ *reload_config = true;
+ return res;
+ }
+
if (accept == ACCEPT_ALL)
return all_fields_match(advertised, config_info, NULL);
return 0;
@@ -1093,7 +1287,8 @@ static void filter_promisor_remote(struct repository *repo,
string_list_sort(&config_info);
}
- if (should_accept_remote(accept, advertised, &accept_urls, &config_info)) {
+ if (should_accept_remote(repo, accept, advertised, &accept_urls,
+ &config_info, &reload_config)) {
if (!store_info)
store_info = store_info_new(repo);
if (promisor_store_advertised_fields(advertised, store_info))
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 0659b2ac15..549acff23f 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -458,6 +458,107 @@ test_expect_success "clone with 'None', URL allowlisted, but client has differen
initialize_server 1 "$oid"
'
+test_expect_success "clone with URL allowlisted and no remote already configured" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+ test_when_finished "rm -f full_names" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that exactly one remote has been auto-created, identified
+ # by "remote.<name>.advertisedAs" == "lop".
+ git -C client config get --all --show-names --regexp \
+ "remote\..*\.advertisedas" >full_names &&
+ test_line_count = 1 full_names &&
+ REMOTE_NAME=$(sed "s/^remote\.\(.*\)\.advertisedas .*$/\1/" full_names) &&
+
+ # Check ".url" and ".promisor" values
+ printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" >expect &&
+ git -C client config "remote.$REMOTE_NAME.url" >actual &&
+ git -C client config "remote.$REMOTE_NAME.promisor" >>actual &&
+ test_cmp expect actual &&
+
+ # Check that the largest object is still missing on the server
+ check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with named URL allowlisted and no pre-configured remote" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="cdn=$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that a remote has been auto-created with the right "cdn" name and fields.
+ printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" "lop" >expect &&
+ git -C client config "remote.cdn.url" >actual &&
+ git -C client config "remote.cdn.promisor" >>actual &&
+ git -C client config "remote.cdn.advertisedAs" >>actual &&
+ test_cmp expect actual &&
+
+ # Check that the largest object is still missing on the server
+ check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with URL allowlisted but colliding name" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone -c remote.cdn.promisor=true \
+ -c remote.cdn.fetch="+refs/heads/*:refs/remotes/lop/*" \
+ -c remote.cdn.url="https://example.com/cdn" \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="cdn=$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that a remote has been auto-created with the right "cdn-1" name and fields.
+ printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" "lop" >expect &&
+ git -C client config "remote.cdn-1.url" >actual &&
+ git -C client config "remote.cdn-1.promisor" >>actual &&
+ git -C client config "remote.cdn-1.advertisedAs" >>actual &&
+ test_cmp expect actual &&
+
+ # Check that the original "cdn" remote was not overwritten.
+ printf "%s\n" "https://example.com/cdn" "true" >expect &&
+ git -C client config "remote.cdn.url" >actual &&
+ git -C client config "remote.cdn.promisor" >>actual &&
+ test_cmp expect actual &&
+
+ # Check that the largest object is still missing on the server
+ check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with URL allowlisted and reusable remote" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone \
+ -c remote.cdn.fetch="+refs/heads/*:refs/remotes/lop/*" \
+ -c remote.cdn.url="$TRASH_DIRECTORY_URL/lop" \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="cdn=$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that the existing "cdn" remote has been properly updated.
+ printf "%s\n" "$TRASH_DIRECTORY_URL/lop" "true" "lop" "+refs/heads/*:refs/remotes/lop/*" >expect &&
+ git -C client config "remote.cdn.url" >actual &&
+ git -C client config "remote.cdn.promisor" >>actual &&
+ git -C client config "remote.cdn.advertisedAs" >>actual &&
+ git -C client config "remote.cdn.fetch" >>actual &&
+ test_cmp expect actual &&
+
+ # Check that no new "cdn-1" remote has been created.
+ test_must_fail git -C client config "remote.cdn-1.url" &&
+
+ # Check that the largest object is still missing on the server
+ check_missing_objects server 1 "$oid"
+'
+
test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
git -C server config promisor.advertise true &&
test_when_finished "rm -rf client" &&
@@ -472,6 +573,9 @@ test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
# Check that a warning was emitted
test_grep "invalid remote name '\''bad name'\''" err &&
+ # Check that no remote was auto-created
+ test_must_fail git -C client config get --regexp "remote\..*\.advertisedas" &&
+
# Check that the largest object is not missing on the server
check_missing_objects server 0 "" &&
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 6/8] promisor-remote: trust known remotes matching acceptFromServerUrl
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
A previous commit introduced the `promisor.acceptFromServerUrl` config
variable along with the machinery to parse and validate the URL glob
patterns and optional remote name prefixes it contains. However, these
URL patterns are not yet tied into the client's acceptance logic.
When a promisor remote is already configured locally, its fields (like
authentication tokens) may occasionally need to be refreshed by the
server. If `promisor.acceptFromServer` is set to the secure default
("None"), these updates are rejected, potentially causing future
fetches to fail.
To enable such targeted updates for trusted URLs, let's use the URL
patterns from `promisor.acceptFromServerUrl` as an additional URL
based allowlist.
Concretely, let's check the advertised URLs against the URL glob
patterns by introducing a new small helper function called
url_matches_accept_list(), which iterates over the glob patterns and
returns the first matching allowed_url entry (or NULL).
The URL matching is done component by component: scheme and port are
compared exactly, the host and path are matched with wildmatch().
Before matching, the advertised URL is passed through url_normalize()
so that case variations in the scheme/host, percent-encoding tricks,
and ".." path segments cannot bypass the allowlist.
The username and password components of the URL are intentionally
ignored during matching to allow servers to rotate them, though using
the 'token' field of the capability is preferred over embedding
credentials in the URL.
Let's then use this helper in should_accept_remote() so that, a known
remote whose URL matches the allowlist is accepted.
To prepare for this new logic, let's also:
- Add an 'accept_urls' parameter to should_accept_remote().
- Replace the BUG() guard in the ACCEPT_KNOWN_URL case with an
explicit 'if (accept == ACCEPT_KNOWN_URL) return' and a new
BUG() guard in the ACCEPT_NONE case.
- Call accept_from_server_url() from filter_promisor_remote()
and relax its early return so that the function is entered when
`accept_urls` has entries even if `accept == ACCEPT_NONE`.
With this, many organizations may only need something like:
git config set --global \
promisor.acceptFromServerUrl "https://my-org.com/*"
to accept only their own remotes. And if they need to accept additional
remotes in some specific repos, they can also set:
git config set promisor.acceptFromServer knownUrl
and configure the additional remote manually only in the repos where
they are needed.
Let's then properly document `promisor.acceptFromServerUrl` in
"promisor.adoc" as an additive security allowlist for known remotes,
including the URL normalization behavior and the component-wise
matching, and let's mention it in "gitprotocol-v2.adoc".
Also let's clarify in the documentation how
`promisor.acceptFromServerUrl` interacts with
`promisor.acceptFromServer`:
- Precedence: when both options are set,
`promisor.acceptFromServerUrl` is consulted first. If a matching
pattern leads to acceptance, the remote is accepted regardless of
`promisor.acceptFromServer`. Otherwise the decision is left to
`promisor.acceptFromServer`.
- URL-mismatch guard: even when the advertised URL matches the
allowlist, an already-existing client-side remote whose configured
URL differs from the advertised one is not accepted through
`promisor.acceptFromServerUrl`. `promisor.acceptFromServer=all` and
`=knownName` keep their pre-existing, looser semantics.
The precedence paragraph is intentionally scoped here to known remotes
only (field updates). A following commit that introduces auto-creation
of unknown remotes will extend it to cover that case as well.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
Documentation/config/promisor.adoc | 74 +++++++++++++++++++
Documentation/gitprotocol-v2.adoc | 9 ++-
promisor-remote.c | 102 +++++++++++++++++++++++---
t/t5710-promisor-remote-capability.sh | 71 ++++++++++++++++++
4 files changed, 242 insertions(+), 14 deletions(-)
diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index b0fa43b839..4a1ecb4425 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -51,6 +51,80 @@ promisor.acceptFromServer::
to "fetch" and "clone" requests from the client. Name and URL
comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
+promisor.acceptFromServerUrl::
+ A glob pattern to specify which server-advertised URLs a
+ client is allowed to act on. When a URL matches, the client
+ will accept the advertised remote as a promisor remote and may
+ automatically accept field updates (such as authentication
+ tokens) from the server, even if `promisor.acceptFromServer`
+ is set to `none` (the default).
++
+This option can appear multiple times in config files. An advertised
+URL will be accepted if it matches _ANY_ glob pattern specified by
+this option in _ANY_ config file read by Git.
++
+When both `promisor.acceptFromServer` and `promisor.acceptFromServerUrl`
+are set, `promisor.acceptFromServerUrl` is consulted first and takes
+precedence: if a matching pattern leads to acceptance (by accepting
+field updates for a known remote whose URL matches both the local
+configuration and the allowlist), the advertised remote is accepted
+regardless of the `promisor.acceptFromServer` setting. If no pattern
+in `promisor.acceptFromServerUrl` triggers acceptance, the decision
+is left to `promisor.acceptFromServer`.
++
+Note however that, even when an advertised URL matches a pattern in
+`promisor.acceptFromServerUrl`, an already-existing remote on the
+client whose name matches the advertised name but whose configured URL
+differs from the advertised one will _NOT_ be accepted through
+`promisor.acceptFromServerUrl`. This prevents a server from silently
+re-pointing an existing client-side remote at a different URL. (Such a
+remote may still be accepted through `promisor.acceptFromServer=all`
+or `=knownName`, which have their own, looser semantics; see the
+documentation of that option.)
++
+Be _VERY_ careful with these patterns: `*` matches any sequence of
+characters within the 'host' and 'path' parts of a URL (but cannot
+cross part boundaries). An overly broad pattern is a major security
+risk, as a matching URL allows a server to update fields (such as
+authentication tokens) on known remotes without further confirmation.
+To minimize security risks, follow these guidelines:
++
+1. Start with a secure protocol scheme, like `https://` or `ssh://`.
++
+2. Only allow domain names or paths where you control and trust _ALL_
+ the content. Be especially careful with shared hosting platforms
+ like `github.com` or `gitlab.com`. A broad pattern like
+ `https://gitlab.com/*` is dangerous because it trusts every
+ repository on the entire platform. Always restrict such patterns to
+ your specific organization or namespace (e.g.,
+ `https://gitlab.com/your-org/*`).
++
+3. Never use globs at the end of domain names. For example,
+ `https://cdn.your-org.com/*` might be safe, but
+ `https://cdn.your-org.com*/*` is a major security risk because
+ the latter matches `https://cdn.your-org.com.hacker.net/repo`.
++
+4. Be careful using globs at the beginning of domain names. While the
+ code ensures a `*` in the host cannot cross into the path, a
+ pattern like `https://*.example.com/*` will still match any
+ subdomain. This is extremely dangerous on shared hosting platforms
+ (e.g., `https://*.github.io/*` trusts every user's site on the
+ entire platform).
++
+Before matching, both the advertised URL and the pattern are
+normalized: the scheme and host are lowercased, percent-encoded
+characters are decoded where possible, and path segments like `..`
+are resolved. The port must also match exactly (e.g.,
+`https://example.com:8080/*` will not match a URL advertised on
+port 9999). The username and password components of the URL are
+ignored during matching. Note that embedding credentials in URLs is
+discouraged. Passing authentication tokens via the `token` field of
+the `promisor-remote` capability is strongly preferred.
++
+For the security implications of accepting a promisor remote, see the
+documentation of `promisor.acceptFromServer`. For details on the
+protocol, see linkgit:gitprotocol-v2[5].
+
promisor.checkFields::
A comma or space separated list of additional remote related
field names. A client checks if the values of these fields
diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
index befa697d21..2beb70595f 100644
--- a/Documentation/gitprotocol-v2.adoc
+++ b/Documentation/gitprotocol-v2.adoc
@@ -866,10 +866,11 @@ the server advertised, the client shouldn't advertise the
On the server side, the "promisor.advertise" and "promisor.sendFields"
configuration options can be used to control what it advertises. On
-the client side, the "promisor.acceptFromServer" configuration option
-can be used to control what it accepts, and the "promisor.storeFields"
-option, to control what it stores. See the documentation of these
-configuration options in linkgit:git-config[1] for more information.
+the client side, the "promisor.acceptFromServer" and
+"promisor.acceptFromServerUrl" configuration options can be used to
+control what it accepts, and the "promisor.storeFields" option, to
+control what it stores. See the documentation of these configuration
+options in linkgit:git-config[1] for more information.
Note that in the future it would be nice if the "promisor-remote"
protocol capability could be used by the server, when responding to
diff --git a/promisor-remote.c b/promisor-remote.c
index 3f3924f587..ac4f54c590 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -14,6 +14,7 @@
#include "url.h"
#include "urlmatch.h"
#include "version.h"
+#include "wildmatch.h"
struct promisor_remote_config {
struct promisor_remote *promisors;
@@ -742,8 +743,79 @@ static void load_accept_from_server_url(struct repository *repo,
}
}
+static bool match_pattern_url(const char *pat, size_t pat_len,
+ const char *url, size_t url_len)
+{
+ char *p_str = xstrndup(pat, pat_len);
+ char *u_str = xstrndup(url, url_len);
+ bool res = !wildmatch(p_str, u_str, 0);
+
+ free(p_str);
+ free(u_str);
+
+ return res;
+}
+
+static bool match_one_url(const struct url_info *pi, const struct url_info *ui)
+{
+ const char *pat = pi->url;
+ const char *url = ui->url;
+
+ /*
+ * Schemes must match exactly. They are case-folded by
+ * url_normalize(), so strncmp() suffices.
+ */
+ if (pi->scheme_len != ui->scheme_len || strncmp(pat, url, pi->scheme_len))
+ return false;
+
+ /*
+ * Ports must match exactly. url_normalize() strips default
+ * ports (like 443 for https), so length and content
+ * comparisons are sufficient.
+ */
+ if (pi->port_len != ui->port_len ||
+ strncmp(pat + pi->port_off, url + ui->port_off, pi->port_len))
+ return false;
+
+ /*
+ * Match host and path separately to prevent a '*' in the host
+ * portion of the pattern from matching across the '/'
+ * boundary into the path.
+ */
+
+ return match_pattern_url(pat + pi->host_off, pi->host_len,
+ url + ui->host_off, ui->host_len) &&
+ match_pattern_url(pat + pi->path_off, pi->path_len,
+ url + ui->path_off, ui->path_len);
+}
+
+static struct allowed_url *url_matches_accept_list(
+ struct string_list *accept_urls, const char *url)
+{
+ struct string_list_item *item;
+ struct url_info url_info;
+
+ url_info.url = url_normalize(url, &url_info);
+
+ if (!url_info.url)
+ return NULL;
+
+ for_each_string_list_item(item, accept_urls) {
+ struct allowed_url *allowed = item->util;
+
+ if (match_one_url(&allowed->pattern_info, &url_info)) {
+ free(url_info.url);
+ return allowed;
+ }
+ }
+
+ free(url_info.url);
+ return NULL;
+}
+
static int should_accept_remote(enum accept_promisor accept,
struct promisor_info *advertised,
+ struct string_list *accept_urls,
struct string_list *config_info)
{
struct promisor_info *p;
@@ -756,23 +828,27 @@ static int should_accept_remote(enum accept_promisor accept,
"this remote should have been rejected earlier",
remote_name);
- if (accept == ACCEPT_ALL)
- return all_fields_match(advertised, config_info, NULL);
-
/* Get config info for that promisor remote */
item = string_list_lookup(config_info, remote_name);
- if (!item)
+ if (!item) {
/* We don't know about that remote */
+ if (accept == ACCEPT_ALL)
+ return all_fields_match(advertised, config_info, NULL);
return 0;
+ }
p = item->util;
- if (accept == ACCEPT_KNOWN_NAME)
+ /* Known remote in the allowlist? */
+ if (!strcmp(p->url, remote_url) && url_matches_accept_list(accept_urls, remote_url))
return all_fields_match(advertised, config_info, p);
- if (accept != ACCEPT_KNOWN_URL)
- BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+ if (accept == ACCEPT_ALL)
+ return all_fields_match(advertised, config_info, NULL);
+
+ if (accept == ACCEPT_KNOWN_NAME)
+ return all_fields_match(advertised, config_info, p);
if (strcmp(p->url, remote_url)) {
warning(_("known remote named '%s' but with URL '%s' instead of '%s', "
@@ -781,7 +857,13 @@ static int should_accept_remote(enum accept_promisor accept,
return 0;
}
- return all_fields_match(advertised, config_info, p);
+ if (accept == ACCEPT_KNOWN_URL)
+ return all_fields_match(advertised, config_info, p);
+
+ if (accept != ACCEPT_NONE)
+ BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
+
+ return 0;
}
static int skip_field_name_prefix(const char *elem, const char *field_name, const char **value)
@@ -991,7 +1073,7 @@ static void filter_promisor_remote(struct repository *repo,
/* Load and validate the acceptFromServerUrl config */
load_accept_from_server_url(repo, &accept_urls);
- if (accept == ACCEPT_NONE)
+ if (accept == ACCEPT_NONE && !accept_urls.nr)
return;
/* Parse remote info received */
@@ -1011,7 +1093,7 @@ static void filter_promisor_remote(struct repository *repo,
string_list_sort(&config_info);
}
- if (should_accept_remote(accept, advertised, &config_info)) {
+ if (should_accept_remote(accept, advertised, &accept_urls, &config_info)) {
if (!store_info)
store_info = store_info_new(repo);
if (promisor_store_advertised_fields(advertised, store_info))
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 3b39505380..0659b2ac15 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -387,6 +387,77 @@ test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
check_missing_objects server 1 "$oid"
'
+test_expect_success "clone with 'None' but URL allowlisted" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+ -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+ -c remote.lop.url="$TRASH_DIRECTORY_URL/lop" \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that the largest object is still missing on the server
+ check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'None' but URL not in allowlist" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+ -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+ -c remote.lop.url="$TRASH_DIRECTORY_URL/lop" \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="https://example.com/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that the largest object is not missing on the server
+ check_missing_objects server 0 "" &&
+
+ # Reinitialize server so that the largest object is missing again
+ initialize_server 1 "$oid"
+'
+
+test_expect_success "clone with 'None' but URL allowlisted in one pattern out of two" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+ -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+ -c remote.lop.url="$TRASH_DIRECTORY_URL/lop" \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="https://example.com/*" \
+ -c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that the largest object is still missing on the server
+ check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'None', URL allowlisted, but client has different URL" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ # The client configures "lop" with a different URL (serverTwo) than
+ # what the server advertises (lop). Even though the advertised URL
+ # matches the allowlist, the remote is rejected because the
+ # configured URL does not match the advertised one.
+ GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+ -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+ -c remote.lop.url="$TRASH_DIRECTORY_URL/serverTwo" \
+ -c promisor.acceptfromserver=None \
+ -c promisor.acceptFromServerUrl="$ENCODED_TRASH_DIRECTORY_URL/*" \
+ --no-local --filter="blob:limit=5k" server client &&
+
+ # Check that the largest object is not missing on the server
+ check_missing_objects server 0 "" &&
+
+ # Reinitialize server so that the largest object is missing again
+ initialize_server 1 "$oid"
+'
+
test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
git -C server config promisor.advertise true &&
test_when_finished "rm -rf client" &&
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 5/8] promisor-remote: introduce promisor.acceptFromServerUrl
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
The "promisor-remote" protocol capability allows servers to advertise
promisor remotes, but doesn't allow these remotes to be automatically
configured on the client.
Let's introduce a new `promisor.acceptFromServerUrl` config variable
which contains a glob pattern, so that advertised remotes with a URL
matching that pattern will be automatically configured.
The glob pattern can optionally be prefixed with a remote name which
will be used as the name of the new local remote.
For now though, let's only introduce the functions to read and validate
the glob patterns and the optional prefixes.
Checking if the URLs of the advertised remotes match the glob patterns
and taking the appropriate action is left for a following commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
promisor-remote.c | 90 +++++++++++++++++++++++++++
t/t5710-promisor-remote-capability.sh | 21 +++++++
2 files changed, 111 insertions(+)
diff --git a/promisor-remote.c b/promisor-remote.c
index 7699e259eb..3f3924f587 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -12,6 +12,7 @@
#include "packfile.h"
#include "environment.h"
#include "url.h"
+#include "urlmatch.h"
#include "version.h"
struct promisor_remote_config {
@@ -657,6 +658,90 @@ static bool has_control_char(const char *s)
return false;
}
+struct allowed_url {
+ char *remote_name;
+ char *url_pattern;
+ struct url_info pattern_info;
+};
+
+static void allowed_url_free(void *util, const char *str UNUSED)
+{
+ struct allowed_url *allowed = util;
+
+ if (!allowed)
+ return;
+
+ /* Depending on prefix, free either remote_name or url_pattern */
+ free(allowed->remote_name ? allowed->remote_name : allowed->url_pattern);
+ free(allowed->pattern_info.url);
+ free(allowed);
+}
+
+static struct allowed_url *valid_accept_url(const char *url)
+{
+ char *dup, *p;
+ struct allowed_url *allowed;
+
+ if (!url)
+ return NULL;
+
+ dup = xstrdup(url);
+ p = strchr(dup, '=');
+ if (p) {
+ *p = '\0';
+ if (!valid_remote_name(dup)) {
+ warning(_("invalid remote name '%s' before '=' sign "
+ "in '%s' from promisor.acceptFromServerUrl config"),
+ dup, url);
+ free(dup);
+ return NULL;
+ }
+ p++;
+ } else {
+ p = dup;
+ }
+
+ if (has_control_char(p)) {
+ warning(_("invalid url pattern '%s' "
+ "in '%s' from promisor.acceptFromServerUrl config"), p, url);
+ free(dup);
+ return NULL;
+ }
+
+ allowed = xmalloc(sizeof(*allowed));
+ allowed->remote_name = (p == dup) ? NULL : dup;
+ allowed->url_pattern = p;
+ allowed->pattern_info.url = url_normalize_pattern(p, &allowed->pattern_info);
+ if (!allowed->pattern_info.url) {
+ warning(_("invalid url pattern '%s' "
+ "in '%s' from promisor.acceptFromServerUrl config"), p, url);
+ free(dup);
+ free(allowed);
+ return NULL;
+ }
+
+ return allowed;
+}
+
+static void load_accept_from_server_url(struct repository *repo,
+ struct string_list *accept_urls)
+{
+ const struct string_list *config_urls;
+
+ if (!repo_config_get_string_multi(repo, "promisor.acceptfromserverurl", &config_urls)) {
+ struct string_list_item *item;
+
+ for_each_string_list_item(item, config_urls) {
+ struct allowed_url *allowed = valid_accept_url(item->string);
+ if (allowed) {
+ struct string_list_item *new;
+ new = string_list_append(accept_urls, item->string);
+ new->util = allowed;
+ }
+ }
+ }
+}
+
static int should_accept_remote(enum accept_promisor accept,
struct promisor_info *advertised,
struct string_list *config_info)
@@ -901,6 +986,10 @@ static void filter_promisor_remote(struct repository *repo,
struct string_list_item *item;
bool reload_config = false;
enum accept_promisor accept = accept_from_server(repo);
+ struct string_list accept_urls = STRING_LIST_INIT_DUP;
+
+ /* Load and validate the acceptFromServerUrl config */
+ load_accept_from_server_url(repo, &accept_urls);
if (accept == ACCEPT_NONE)
return;
@@ -934,6 +1023,7 @@ static void filter_promisor_remote(struct repository *repo,
}
}
+ string_list_clear_func(&accept_urls, allowed_url_free);
promisor_info_list_clear(&config_info);
string_list_clear(&remote_info, 0);
store_info_free(store_info);
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index bf1cc54605..3b39505380 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -387,6 +387,27 @@ test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
check_missing_objects server 1 "$oid"
'
+test_expect_success "clone with invalid promisor.acceptFromServerUrl" '
+ git -C server config promisor.advertise true &&
+ test_when_finished "rm -rf client" &&
+
+ # As "bad name" contains a space, which is not a valid remote name,
+ # the pattern should be rejected with a warning and no remote created.
+ GIT_NO_LAZY_FETCH=0 git clone \
+ -c promisor.acceptfromserver=None \
+ -c "promisor.acceptFromServerUrl=bad name=https://example.com/*" \
+ --no-local --filter="blob:limit=5k" server client 2>err &&
+
+ # Check that a warning was emitted
+ test_grep "invalid remote name '\''bad name'\''" err &&
+
+ # Check that the largest object is not missing on the server
+ check_missing_objects server 0 "" &&
+
+ # Reinitialize server so that the largest object is missing again
+ initialize_server 1 "$oid"
+'
+
test_expect_success "clone with promisor.sendFields" '
git -C server config promisor.advertise true &&
test_when_finished "rm -rf client" &&
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 4/8] promisor-remote: add 'local_name' to 'struct promisor_info'
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
In a following commit, we will store promisor remote information under
a remote name different than the one the server advertised.
To prepare for this change, let's add a new 'char *local_name' member
to 'struct promisor_info', and let's update the related functions.
While at it, let's also add a small promisor_info_internal_name()
helper that returns `local_name` when set, `name` otherwise, and let's
use this small helper in promisor_store_advertised_fields() and in the
post-loop of filter_promisor_remote() so that lookups against the local
repo configuration use the right name.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
promisor-remote.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/promisor-remote.c b/promisor-remote.c
index 38fa050542..7699e259eb 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -434,13 +434,14 @@ static struct string_list *fields_stored(void)
* Struct for promisor remotes involved in the "promisor-remote"
* protocol capability.
*
- * Except for "name", each <member> in this struct and its <value>
- * should correspond (either on the client side or on the server side)
- * to a "remote.<name>.<member>" config variable set to <value> where
- * "<name>" is a promisor remote name.
+ * Except for "name" and "local_name", each <member> in this struct
+ * and its <value> should correspond (either on the client side or on
+ * the server side) to a "remote.<name>.<member>" config variable set
+ * to <value> where "<name>" is a promisor remote name.
*/
struct promisor_info {
- const char *name;
+ const char *name; /* name the server advertised */
+ const char *local_name; /* name used locally (may be auto-generated) */
const char *url;
const char *filter;
const char *token;
@@ -449,6 +450,7 @@ struct promisor_info {
static void promisor_info_free(struct promisor_info *p)
{
free((char *)p->name);
+ free((char *)p->local_name);
free((char *)p->url);
free((char *)p->filter);
free((char *)p->token);
@@ -462,6 +464,11 @@ static void promisor_info_list_clear(struct string_list *list)
string_list_clear(list, 0);
}
+static const char *promisor_info_internal_name(struct promisor_info *p)
+{
+ return p->local_name ? p->local_name : p->name;
+}
+
static void set_one_field(struct promisor_info *p,
const char *field, const char *value)
{
@@ -829,7 +836,7 @@ static bool promisor_store_advertised_fields(struct promisor_info *advertised,
{
struct promisor_info *p;
struct string_list_item *item;
- const char *remote_name = advertised->name;
+ const char *remote_name = promisor_info_internal_name(advertised);
bool reload_config = false;
if (!(store_info->store_filter || store_info->store_token))
@@ -937,7 +944,8 @@ static void filter_promisor_remote(struct repository *repo,
/* Apply accepted remotes to the stable repo state */
for_each_string_list_item(item, accepted_remotes) {
struct promisor_info *info = item->util;
- struct promisor_remote *r = repo_promisor_remote_find(repo, info->name);
+ const char *local = promisor_info_internal_name(info);
+ struct promisor_remote *r = repo_promisor_remote_find(repo, local);
if (r) {
r->accepted = 1;
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 3/8] urlmatch: add url_normalize_pattern() helper
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
In a following commit, we will need to normalize a URL glob pattern
(which may contain '*' in the host portion) and extract its component
offsets (host, path, etc.) for separate matching. Let's export a
dedicated helper function url_normalize_pattern() for that purpose.
It works like url_normalize(), but passes allow_globs=true to the
internal url_normalize_1(), so that '*' characters in the host are
accepted rather than rejected.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
urlmatch.c | 5 +++++
urlmatch.h | 12 ++++++++++++
2 files changed, 17 insertions(+)
diff --git a/urlmatch.c b/urlmatch.c
index 989bc7eb8b..7e734e2660 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -440,6 +440,11 @@ char *url_normalize(const char *url, struct url_info *out_info)
return url_normalize_1(url, out_info, false);
}
+char *url_normalize_pattern(const char *url, struct url_info *out_info)
+{
+ return url_normalize_1(url, out_info, true);
+}
+
static size_t url_match_prefix(const char *url,
const char *url_prefix,
size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea13..32c5067f9b 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -36,6 +36,18 @@ struct url_info {
char *url_normalize(const char *, struct url_info *);
+/*
+ * Like url_normalize(), but also allows '*' glob characters in the host
+ * portion. Use this when normalizing URL patterns from user configuration.
+ *
+ * Note that '*' is a valid path character per RFC 3986 (as a sub-delim),
+ * so glob patterns using '*' in the path are also accepted.
+ *
+ * Returns a newly allocated normalized string and fills out_info if
+ * non-NULL, or NULL if the pattern is invalid.
+ */
+char *url_normalize_pattern(const char *url, struct url_info *out_info);
+
struct urlmatch_item {
size_t hostmatch_len;
size_t pathmatch_len;
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 2/8] urlmatch: change 'allow_globs' arg to bool
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
The last argument of url_normalize_1() is `char allow_globs` but it is
used as a boolean, not as a char.
Let's convert it to a `bool`, and while at it convert the two calls to
url_normalize_1() so they pass 'true' or 'false' instead of '1' or '0'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
urlmatch.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/urlmatch.c b/urlmatch.c
index eea8300489..989bc7eb8b 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -111,7 +111,7 @@ static int match_host(const struct url_info *url_info,
return (!url_len && !pat_len);
}
-static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
+static char *url_normalize_1(const char *url, struct url_info *out_info, bool allow_globs)
{
/*
* Normalize NUL-terminated url using the following rules:
@@ -437,7 +437,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
char *url_normalize(const char *url, struct url_info *out_info)
{
- return url_normalize_1(url, out_info, 0);
+ return url_normalize_1(url, out_info, false);
}
static size_t url_match_prefix(const char *url,
@@ -577,7 +577,7 @@ int urlmatch_config_entry(const char *var, const char *value,
struct url_info norm_info;
config_url = xmemdupz(key, dot - key);
- norm_url = url_normalize_1(config_url, &norm_info, 1);
+ norm_url = url_normalize_1(config_url, &norm_info, true);
if (norm_url)
retval = match_urls(url, &norm_info, &matched);
else if (collect->fallback_match_fn)
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
* [PATCH v3 1/8] t5710: simplify 'mkdir X' followed by 'git -C X init'
From: Christian Couder @ 2026-05-19 15:38 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Karthik Nayak,
Elijah Newren, Toon Claes, Christian Couder, Christian Couder
In-Reply-To: <20260519153808.494105-1-christian.couder@gmail.com>
It's simpler and more efficient to just use `git init client` instead
of `mkdir client && git -C client init`.
So let's replace the latter with the former.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
t/t5710-promisor-remote-capability.sh | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index b404ad9f0a..bf1cc54605 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -177,8 +177,7 @@ test_expect_success "init + fetch with promisor.advertise set to 'true'" '
git -C server config promisor.advertise true &&
test_when_finished "rm -rf client" &&
- mkdir client &&
- git -C client init &&
+ git init client &&
git -C client config remote.lop.promisor true &&
git -C client config remote.lop.fetch "+refs/heads/*:refs/remotes/lop/*" &&
git -C client config remote.lop.url "$TRASH_DIRECTORY_URL/lop" &&
@@ -231,8 +230,7 @@ test_expect_success "init + fetch two promisors but only one advertised" '
# Create a promisor that will be configured but not be used
git init --bare unused_lop &&
- mkdir client &&
- git -C client init &&
+ git init client &&
git -C client config remote.unused_lop.promisor true &&
git -C client config remote.unused_lop.fetch "+refs/heads/*:refs/remotes/unused_lop/*" &&
git -C client config remote.unused_lop.url "$TRASH_DIRECTORY_URL/unused_lop" &&
--
2.54.0.134.gbbe8e27878.dirty
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox