git.vger.kernel.org archive mirror
* [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy
@ 2025-10-16  7:26 Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
                   ` (10 more replies)
  0 siblings, 11 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

Hi,

by default, git-maintenance(1) uses git-gc(1) to perform repository
housekeeping. This tool has a couple of shortcomings, most importantly
that it regularly does all-into-one repacks. This doesn't really work
all that well in the context of monorepos, where you really want to
avoid repacking all objects regularly.

An alternative maintenance strategy is the "incremental" strategy, but
this strategy has two downsides:

  - Strategies in general only apply to scheduled maintenance. So if you
    run git-maintenance(1) manually, you still end up with git-gc(1).

  - The strategy is designed to not ever delete any data, but a full
    replacement for git-gc(1) needs to also prune reflogs, rerere caches
    and vanished worktrees.

This patch series aims to fix both of these issues.

First, the series introduces a new "geometric" maintenance task, which
makes use of geometric repacking as exposed by git-repack(1) in the
general case. In the case where a geometric repack would end up merging
all packfiles into one, we instead do an all-into-one repack with cruft
packs so that we can still phase out objects over time.

Second, the series extends maintenance strategies to also cover normal
maintenance. If the user has configured the "geometric" strategy, we'll
thus use it for both manual and scheduled maintenance. For backwards
compatibility, the "incremental" strategy is changed so that it uses
git-gc(1) for manual maintenance and the other tasks for scheduled
maintenance.

The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
with other topics by preemptively including "repository.h", 2025-09-29)
merged into it.

Thanks!

Patrick

---
Patrick Steinhardt (8):
      builtin/gc: remove global `repack` variable
      builtin/gc: make `too_many_loose_objects()` reusable without GC config
      builtin/maintenance: introduce "geometric-repack" task
      builtin/maintenance: don't silently ignore invalid strategy
      builtin/maintenance: run maintenance tasks depending on type
      builtin/maintenance: extend "maintenance.strategy" to manual maintenance
      builtin/maintenance: make "gc" strategy accessible
      builtin/maintenance: introduce "geometric" strategy

 Documentation/config/maintenance.adoc |  44 +++++-
 builtin/gc.c                          | 271 +++++++++++++++++++++++++++-------
 t/t7900-maintenance.sh                | 212 ++++++++++++++++++++++++++
 3 files changed, 469 insertions(+), 58 deletions(-)


---
base-commit: 0bb2c786c2349dd6700727153c13d81cbfb41710
change-id: 20251015-pks-maintenance-geometric-strategy-580c58581b01



* [PATCH 1/8] builtin/gc: remove global `repack` variable
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16 20:07   ` Justin Tobler
  2025-10-17 20:58   ` Taylor Blau
  2025-10-16  7:26 ` [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.

Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_option()` in a subsequent commit.

The refactoring itself is straightforward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.

For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.

As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.
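
For illustration only (this is not part of the patch), the callback
pattern used for `keep_one_pack()` can be sketched in isolation. The
`struct argvec` type and the `for_each_name()` iterator below are
hypothetical stand-ins for git's `struct strvec` and
`for_each_string_list()`; only the idea of handing the target vector to
the callback through its `void *data` parameter is taken from the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's `struct strvec`; fixed capacity for brevity. */
struct argvec {
	char items[8][64];
	size_t nr;
};

static void argvec_pushf(struct argvec *args, const char *prefix,
			 const char *value)
{
	assert(args->nr < 8);
	snprintf(args->items[args->nr], sizeof(args->items[0]), "%s%s",
		 prefix, value);
	args->nr++;
}

/* Callback in the style of keep_one_pack(): the vector arrives via `data`. */
static int keep_one(const char *pack, void *data)
{
	struct argvec *args = data;
	argvec_pushf(args, "--keep-pack=", pack);
	return 0;
}

/* Hypothetical iterator, loosely modelled on for_each_string_list(). */
static int for_each_name(const char **names, size_t nr,
			 int (*fn)(const char *, void *), void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(names[i], data);
		if (ret)
			return ret;
	}
	return 0;
}
```

Because the callback no longer touches a global, each caller decides
which vector receives the arguments, which is what allows a throwaway
vector to be used for the maintenance condition.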

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 74 ++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e19e13d9788..e9772eb3a30 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -55,7 +55,6 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static timestamp_t gc_log_expire_time;
-static struct strvec repack = STRVEC_INIT;
 static struct tempfile *pidfile;
 static struct lock_file log_lock;
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -618,48 +617,50 @@ static uint64_t estimate_repack_memory(struct gc_config *cfg,
 	return os_cache + heap;
 }
 
-static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
+static int keep_one_pack(struct string_list_item *item, void *data)
 {
-	strvec_pushf(&repack, "--keep-pack=%s", basename(item->string));
+	struct strvec *args = data;
+	strvec_pushf(args, "--keep-pack=%s", basename(item->string));
 	return 0;
 }
 
 static void add_repack_all_option(struct gc_config *cfg,
-				  struct string_list *keep_pack)
+				  struct string_list *keep_pack,
+				  struct strvec *args)
 {
 	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
 		&& !(cfg->cruft_packs && cfg->repack_expire_to))
-		strvec_push(&repack, "-a");
+		strvec_push(args, "-a");
 	else if (cfg->cruft_packs) {
-		strvec_push(&repack, "--cruft");
+		strvec_push(args, "--cruft");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--cruft-expiration=%s", cfg->prune_expire);
+			strvec_pushf(args, "--cruft-expiration=%s", cfg->prune_expire);
 		if (cfg->max_cruft_size)
-			strvec_pushf(&repack, "--max-cruft-size=%lu",
+			strvec_pushf(args, "--max-cruft-size=%lu",
 				     cfg->max_cruft_size);
 		if (cfg->repack_expire_to)
-			strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
+			strvec_pushf(args, "--expire-to=%s", cfg->repack_expire_to);
 	} else {
-		strvec_push(&repack, "-A");
+		strvec_push(args, "-A");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--unpack-unreachable=%s", cfg->prune_expire);
+			strvec_pushf(args, "--unpack-unreachable=%s", cfg->prune_expire);
 	}
 
 	if (keep_pack)
-		for_each_string_list(keep_pack, keep_one_pack, NULL);
+		for_each_string_list(keep_pack, keep_one_pack, args);
 
 	if (cfg->repack_filter && *cfg->repack_filter)
-		strvec_pushf(&repack, "--filter=%s", cfg->repack_filter);
+		strvec_pushf(args, "--filter=%s", cfg->repack_filter);
 	if (cfg->repack_filter_to && *cfg->repack_filter_to)
-		strvec_pushf(&repack, "--filter-to=%s", cfg->repack_filter_to);
+		strvec_pushf(args, "--filter-to=%s", cfg->repack_filter_to);
 }
 
-static void add_repack_incremental_option(void)
+static void add_repack_incremental_option(struct strvec *args)
 {
-	strvec_push(&repack, "--no-write-bitmap-index");
+	strvec_push(args, "--no-write-bitmap-index");
 }
 
-static int need_to_gc(struct gc_config *cfg)
+static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 {
 	/*
 	 * Setting gc.auto to 0 or negative can disable the
@@ -700,10 +701,10 @@ static int need_to_gc(struct gc_config *cfg)
 				string_list_clear(&keep_pack, 0);
 		}
 
-		add_repack_all_option(cfg, &keep_pack);
+		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
 	} else if (too_many_loose_objects(cfg))
-		add_repack_incremental_option();
+		add_repack_incremental_option(repack_args);
 	else
 		return 0;
 
@@ -852,6 +853,7 @@ int cmd_gc(int argc,
 	int keep_largest_pack = -1;
 	int skip_foreground_tasks = 0;
 	timestamp_t dummy;
+	struct strvec repack_args = STRVEC_INIT;
 	struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT;
 	struct gc_config cfg = GC_CONFIG_INIT;
 	const char *prune_expire_sentinel = "sentinel";
@@ -891,7 +893,7 @@ int cmd_gc(int argc,
 	show_usage_with_options_if_asked(argc, argv,
 					 builtin_gc_usage, builtin_gc_options);
 
-	strvec_pushl(&repack, "repack", "-d", "-l", NULL);
+	strvec_pushl(&repack_args, "repack", "-d", "-l", NULL);
 
 	gc_config(&cfg);
 
@@ -914,14 +916,14 @@ int cmd_gc(int argc,
 		die(_("failed to parse prune expiry value %s"), cfg.prune_expire);
 
 	if (aggressive) {
-		strvec_push(&repack, "-f");
+		strvec_push(&repack_args, "-f");
 		if (cfg.aggressive_depth > 0)
-			strvec_pushf(&repack, "--depth=%d", cfg.aggressive_depth);
+			strvec_pushf(&repack_args, "--depth=%d", cfg.aggressive_depth);
 		if (cfg.aggressive_window > 0)
-			strvec_pushf(&repack, "--window=%d", cfg.aggressive_window);
+			strvec_pushf(&repack_args, "--window=%d", cfg.aggressive_window);
 	}
 	if (opts.quiet)
-		strvec_push(&repack, "-q");
+		strvec_push(&repack_args, "-q");
 
 	if (opts.auto_flag) {
 		if (cfg.detach_auto && opts.detach < 0)
@@ -930,7 +932,7 @@ int cmd_gc(int argc,
 		/*
 		 * Auto-gc should be least intrusive as possible.
 		 */
-		if (!need_to_gc(&cfg)) {
+		if (!need_to_gc(&cfg, &repack_args)) {
 			ret = 0;
 			goto out;
 		}
@@ -952,7 +954,7 @@ int cmd_gc(int argc,
 			find_base_packs(&keep_pack, cfg.big_pack_threshold);
 		}
 
-		add_repack_all_option(&cfg, &keep_pack);
+		add_repack_all_option(&cfg, &keep_pack, &repack_args);
 		string_list_clear(&keep_pack, 0);
 	}
 
@@ -1014,9 +1016,9 @@ int cmd_gc(int argc,
 
 		repack_cmd.git_cmd = 1;
 		repack_cmd.close_object_store = 1;
-		strvec_pushv(&repack_cmd.args, repack.v);
+		strvec_pushv(&repack_cmd.args, repack_args.v);
 		if (run_command(&repack_cmd))
-			die(FAILED_RUN, repack.v[0]);
+			die(FAILED_RUN, repack_args.v[0]);
 
 		if (cfg.prune_expire) {
 			struct child_process prune_cmd = CHILD_PROCESS_INIT;
@@ -1067,6 +1069,7 @@ int cmd_gc(int argc,
 
 out:
 	maintenance_run_opts_release(&opts);
+	strvec_clear(&repack_args);
 	gc_config_release(&cfg);
 	return 0;
 }
@@ -1269,6 +1272,19 @@ static int maintenance_task_gc_background(struct maintenance_run_opts *opts,
 	return run_command(&child);
 }
 
+static int gc_condition(struct gc_config *cfg)
+{
+	/*
+	 * Note that it's fine to drop the repack arguments here, as we execute
+	 * git-gc(1) as a separate child process anyway. So it knows to compute
+	 * these arguments again.
+	 */
+	struct strvec repack_args = STRVEC_INIT;
+	int ret = need_to_gc(cfg, &repack_args);
+	strvec_clear(&repack_args);
+	return ret;
+}
+
 static int prune_packed(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1596,7 +1612,7 @@ static const struct maintenance_task tasks[] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
 		.background = maintenance_task_gc_background,
-		.auto_condition = need_to_gc,
+		.auto_condition = gc_condition,
 	},
 	[TASK_COMMIT_GRAPH] = {
 		.name = "commit-graph",

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16 20:59   ` Junio C Hamano
  2025-10-16  7:26 ` [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.

This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.

Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.
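
As a rough sketch of how the limit is consumed (not part of the patch):
the diff shows the limit being divided by 256 and rounded up, and the
tests in the next patch mention that the estimate is extrapolated from
a single "objects/17/" shard. The helper name `shard_exceeds_limit()`
is hypothetical, and the strict "greater than" comparison is an
assumption inferred from those tests rather than copied from the
function body.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Returns 1 when the number of loose objects found in one fan-out
 * directory suggests that the repository as a whole holds more than
 * `limit` loose objects, and 0 otherwise.
 */
static int shard_exceeds_limit(int objects_in_shard, int limit)
{
	int per_shard_threshold;

	assert(limit > 0);
	/* 256 fan-out directories, so one shard holds roughly 1/256th. */
	per_shard_threshold = DIV_ROUND_UP(limit, 256);
	return objects_in_shard > per_shard_threshold;
}
```

With two objects in the sampled shard, a limit of 256 yields a per-shard
threshold of 1 and thus triggers, whereas a limit of 257 yields a
threshold of 2 and does not.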

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e9772eb3a30..026d3a1d714 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -447,7 +447,7 @@ static int rerere_gc_condition(struct gc_config *cfg UNUSED)
 	return should_gc;
 }
 
-static int too_many_loose_objects(struct gc_config *cfg)
+static int too_many_loose_objects(int limit)
 {
 	/*
 	 * Quickly check if a "gc" is needed, by estimating how
@@ -469,7 +469,7 @@ static int too_many_loose_objects(struct gc_config *cfg)
 	if (!dir)
 		return 0;
 
-	auto_threshold = DIV_ROUND_UP(cfg->gc_auto_threshold, 256);
+	auto_threshold = DIV_ROUND_UP(limit, 256);
 	while ((ent = readdir(dir)) != NULL) {
 		if (strspn(ent->d_name, "0123456789abcdef") != hexsz_loose ||
 		    ent->d_name[hexsz_loose] != '\0')
@@ -703,7 +703,7 @@ static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 
 		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
-	} else if (too_many_loose_objects(cfg))
+	} else if (too_many_loose_objects(cfg->gc_auto_threshold))
 		add_repack_incremental_option(repack_args);
 	else
 		return 0;
@@ -1057,7 +1057,7 @@ int cmd_gc(int argc,
 					     !opts.quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0,
 					     NULL);
 
-	if (opts.auto_flag && too_many_loose_objects(&cfg))
+	if (opts.auto_flag && too_many_loose_objects(cfg.gc_auto_threshold))
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16 20:51   ` Justin Tobler
  2025-10-17 22:28   ` Taylor Blau
  2025-10-16  7:26 ` [PATCH 4/8] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.

There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.

To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has passed.
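
To make the decision a bit more tangible, here is a simplified model of
it (not part of the patch). It assumes that packs are weighted by their
object count, sorted in ascending order, and that a split factor of 2 is
used as in the diff. The function `geometric_split()` is a hypothetical
approximation of what `pack_geometry_split()` computes and may differ
from the real implementation in detail; `wants_geometric_repack()`
mirrors the `geometry.split < geometry.pack_nr` check.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns how many packs, counted from the lightest, would have to be
 * merged into one new pack so that the remaining packs form a geometric
 * progression with the given factor.
 */
static size_t geometric_split(const unsigned long *weights, size_t nr,
			      unsigned long factor)
{
	unsigned long total = 0;
	size_t i, split;

	assert(factor > 0);
	if (!nr)
		return 0;

	/* Walk down from the heaviest pack until the progression breaks. */
	for (i = nr - 1; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;

	/* Both packs of the offending pair need to be rolled up. */
	split = i ? i + 1 : 0;

	/* The merged pack may itself break the progression further up. */
	for (i = 0; i < split; i++)
		total += weights[i];
	for (i = split; i < nr; i++) {
		if (weights[i] >= factor * total)
			break;
		total += weights[i];
		split++;
	}
	return split;
}

/* A geometric repack is only used when at least one pack stays intact. */
static int wants_geometric_repack(size_t split, size_t nr)
{
	return split < nr;
}
```

In this model two packs with three objects each give a split equal to
the number of packs, so the task falls back to an all-into-one repack,
while packs of one and six objects are already in progression and lead
to a plain geometric repack.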

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  11 +++
 builtin/gc.c                          | 102 +++++++++++++++++++++++++
 t/t7900-maintenance.sh                | 137 ++++++++++++++++++++++++++++++++++
 3 files changed, 250 insertions(+)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 2f719342183..26dc5de423f 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -75,6 +75,17 @@ maintenance.incremental-repack.auto::
 	number of pack-files not in the multi-pack-index is at least the value
 	of `maintenance.incremental-repack.auto`. The default value is 10.
 
+maintenance.geometric-repack.auto::
+	This integer config option controls how often the `geometric-repack`
+	task should be run as part of `git maintenance run --auto`. If zero,
+	then the `geometric-repack` task will not run with the `--auto`
+	option. A negative value will force the task to run every time.
+	Otherwise, a positive value implies the command should run either when
+	there are packfiles that need to be merged together to retain the
+	geometric progression, or when there are at least this many loose
+	objects that would be written into a new packfile. The default value is
+	100.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 026d3a1d714..2c9ecd464d2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -34,6 +34,7 @@
 #include "pack-objects.h"
 #include "path.h"
 #include "reflog.h"
+#include "repack.h"
 #include "rerere.h"
 #include "blob.h"
 #include "tree.h"
@@ -254,6 +255,7 @@ enum maintenance_task_label {
 	TASK_PREFETCH,
 	TASK_LOOSE_OBJECTS,
 	TASK_INCREMENTAL_REPACK,
+	TASK_GEOMETRIC_REPACK,
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
@@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
 	return 0;
 }
 
+static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
+					     struct gc_config *cfg)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	struct child_process child = CHILD_PROCESS_INIT;
+	int ret;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	child.git_cmd = 1;
+
+	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
+	if (geometry.split < geometry.pack_nr)
+		strvec_push(&child.args, "--geometric=2");
+	else
+		add_repack_all_option(cfg, NULL, &child.args);
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	if (the_repository->settings.core_multi_pack_index)
+		strvec_push(&child.args, "--write-midx");
+
+	if (run_command(&child)) {
+		ret = error(_("failed to perform geometric repack"));
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
+static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	int auto_value = 100;
+	int ret;
+
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
+			    &auto_value);
+	if (!auto_value)
+		return 0;
+	if (auto_value < 0)
+		return 1;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	/*
+	 * When we'd merge at least two packs with one another we always
+	 * perform the repack.
+	 */
+	if (geometry.split) {
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * Otherwise, we estimate the number of loose objects to determine
+	 * whether we want to create a new packfile or not.
+	 */
+	if (too_many_loose_objects(auto_value)) {
+		ret = 1;
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
 typedef int (*maintenance_task_fn)(struct maintenance_run_opts *opts,
 				   struct gc_config *cfg);
 typedef int (*maintenance_auto_fn)(struct gc_config *cfg);
@@ -1608,6 +1705,11 @@ static const struct maintenance_task tasks[] = {
 		.background = maintenance_task_incremental_repack,
 		.auto_condition = incremental_repack_auto_condition,
 	},
+	[TASK_GEOMETRIC_REPACK] = {
+		.name = "geometric-repack",
+		.background = maintenance_task_geometric_repack,
+		.auto_condition = geometric_repack_auto_condition,
+	},
 	[TASK_GC] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index ddd273d8dc2..83a373fe94b 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -465,6 +465,143 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
 	)
 '
 
+run_and_verify_geometric_pack () {
+	EXPECTED_PACKS="$1" &&
+
+	# Verify that we perform a geometric repack.
+	rm -f "trace2.txt" &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git maintenance run --task=geometric-repack 2>/dev/null &&
+	test_subcommand git repack -d -l --geometric=2 --quiet --write-midx <trace2.txt &&
+
+	# Verify that the number of packfiles matches our expectation.
+	ls -l .git/objects/pack/*.pack >packfiles &&
+	test_line_count = "$EXPECTED_PACKS" packfiles &&
+
+	# And verify that there are no loose objects anymore.
+	cat >expect <<-\EOF &&
+	info
+	pack
+	EOF
+	ls .git/objects >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'geometric repacking task' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+		test_commit initial &&
+
+		# The initial repack causes an all-into-one repack.
+		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
+
+		# Repacking should now cause a no-op geometric repack because
+		# no packfiles need to be combined.
+		ls -l .git/objects/pack >before &&
+		run_and_verify_geometric_pack 1 &&
+		ls -l .git/objects/pack >after &&
+		test_cmp before after &&
+
+		# This incremental change creates a new packfile that only
+		# soaks up loose objects. The packfiles are not getting merged
+		# at this point.
+		test_commit loose &&
+		run_and_verify_geometric_pack 2 &&
+
+		# Both packfiles have 3 objects, so the next run would cause us
+		# to merge both packfiles together. This should be turned into
+		# an all-into-one-repack.
+		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <all-into-one-repack.txt &&
+
+		# The geometric repack soaks up unreachable objects.
+		echo blob-1 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 2 &&
+
+		# A second unreachable object should be written into another packfile.
+		echo blob-2 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+
+		# And these two small packs should now be merged via the
+		# geometric repack. The large packfile should remain intact.
+		run_and_verify_geometric_pack 2 &&
+
+		# If we now add two more objects and repack twice we should
+		# then see another all-into-one repack. This time around
+		# though, as we have unreachable objects, we should also see a
+		# cruft pack.
+		echo blob-3 | git hash-object -w --stdin -t blob &&
+		echo blob-4 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+		GIT_TRACE2_EVENT="$(pwd)/cruft-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <cruft-repack.txt &&
+		ls .git/objects/pack/*.pack >packs &&
+		test_line_count = 2 packs &&
+		ls .git/objects/pack/*.mtimes >cruft &&
+		test_line_count = 1 cruft
+	)
+'
+
+test_geometric_repack_needed () {
+	NEEDED="$1"
+	AUTO_LIMIT="$2" &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git ${AUTO_LIMIT:+-c maintenance.geometric-repack.auto=$AUTO_LIMIT} maintenance run --auto --task=geometric-repack &&
+	case "$NEEDED" in
+	true)
+		test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	false)
+		! test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	*)
+		BUG "invalid parameter: $NEEDED";;
+	esac
+}
+
+test_expect_success 'geometric repacking with --auto' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		# An empty repository does not need repacking, except when
+		# explicitly told to do it.
+		test_geometric_repack_needed false &&
+		test_geometric_repack_needed false 0 &&
+		test_geometric_repack_needed false 1 &&
+		test_geometric_repack_needed true -1 &&
+
+		test_oid_init &&
+
+		# Loose objects cause a repack when crossing the limit. Note
+		# that the number of objects gets extrapolated by having a look
+		# at the "objects/17/" shard.
+		test_commit "$(test_oid blob17_1)" &&
+		test_geometric_repack_needed false &&
+		test_commit "$(test_oid blob17_2)" &&
+		test_geometric_repack_needed false 257 &&
+		test_geometric_repack_needed true 256 &&
+
+		# Force another repack.
+		test_commit first &&
+		test_commit second &&
+		test_geometric_repack_needed true -1 &&
+
+		# We now have two packfiles that would be merged together. As
+		# such, the repack should always happen unless the user has
+		# disabled the auto task.
+		test_geometric_repack_needed false 0 &&
+		test_geometric_repack_needed true 9000
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 4/8] builtin/maintenance: don't silently ignore invalid strategy
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2025-10-16  7:26 ` [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 5/8] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.

Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c           | 17 +++++++++++------
 t/t7900-maintenance.sh |  5 +++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 2c9ecd464d2..e358e8d13b4 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1848,6 +1848,13 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static struct maintenance_strategy parse_maintenance_strategy(const char *name)
+{
+	if (!strcasecmp(name, "incremental"))
+		return incremental_strategy;
+	die(_("unknown maintenance strategy: '%s'"), name);
+}
+
 static void initialize_task_config(struct maintenance_run_opts *opts,
 				   const struct string_list *selected_tasks)
 {
@@ -1883,12 +1890,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 * override specific aspects of our strategy.
 	 */
 	if (opts->schedule) {
-		strategy = none_strategy;
-
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str)) {
-			if (!strcasecmp(config_str, "incremental"))
-				strategy = incremental_strategy;
-		}
+		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+			strategy = parse_maintenance_strategy(config_str);
+		else
+			strategy = none_strategy;
 	} else {
 		strategy = default_strategy;
 	}
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 83a373fe94b..45334f7ad3a 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -1230,6 +1230,11 @@ test_expect_success 'fails when running outside of a repository' '
 	nongit test_must_fail git maintenance unregister
 '
 
+test_expect_success 'fails when configured to use an invalid strategy' '
+	test_must_fail git -c maintenance.strategy=invalid maintenance run --schedule=hourly 2>err &&
+	test_grep "unknown maintenance strategy: .invalid." err
+'
+
 test_expect_success 'register and unregister bare repo' '
 	test_when_finished "git config --global --unset-all maintenance.repo || :" &&
 	test_might_fail git config --global --unset-all maintenance.repo &&

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 5/8] builtin/maintenance: run maintenance tasks depending on type
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2025-10-16  7:26 ` [PATCH 4/8] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 6/8] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

We basically have three different ways to execute repository
maintenance:

  1. Manual maintenance via `git maintenance run`.

  2. Automatic maintenance via `git maintenance run --auto`.

  3. Scheduled maintenance via `git maintenance run --schedule=`.

At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.

Introduce a new maintenance type that discerns between manual (1 & 2)
and scheduled (3) maintenance. Convert the `enabled` field into a bitset
so that it becomes possible to specify exactly which tasks should run in
a specific context.

The types picked for existing strategies match the status quo:

  - The default strategy is only ever executed as part of a manual
    maintenance run. It is not possible to use it for scheduled
    maintenance.

  - The incremental strategy is only ever executed as part of a
    scheduled maintenance run. It is not possible to use it for manual
    maintenance.

The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
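
For illustration only, and not part of the patch: a minimal standalone
sketch of the bitset check. The enum mirrors the one added in the diff
below, while `task_runs()` and `apply_enabled_override()` are
hypothetical helpers that model the two changed lines in
`initialize_task_config()`. The "-1 means unset" convention is an
assumption of this sketch.

```c
#include <assert.h>

enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
};

/*
 * Hypothetical helper: model "maintenance.<task>.enabled" overriding
 * the strategy. `enabled` is -1 when the config is unset, otherwise
 * the boolean value that was configured.
 */
static unsigned apply_enabled_override(unsigned task_type, int enabled,
				       enum maintenance_type run_type)
{
	assert(enabled >= -1 && enabled <= 1);
	if (enabled < 0)
		return task_type; /* keep what the strategy says */
	/* An explicit setting only affects the current kind of run. */
	return enabled ? (unsigned)run_type : 0;
}

/* Hypothetical helper: does the task take part in this kind of run? */
static int task_runs(unsigned task_type, enum maintenance_type run_type)
{
	return (task_type & run_type) != 0;
}
```

With this model, a task that the strategy marks as scheduled-only is
skipped in a manual run unless the user explicitly enables it, in which
case it is forced on for the run type at hand.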

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e358e8d13b4..4f70650e7ac 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1820,30 +1820,39 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts,
 	return result;
 }
 
+enum maintenance_type {
+	/* As invoked via `git maintenance run --schedule=`. */
+	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
+	/* As invoked via `git maintenance run` and with `--auto`. */
+	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
+};
+
 struct maintenance_strategy {
 	struct {
-		int enabled;
+		unsigned type;
 		enum schedule_priority schedule;
 	} tasks[TASK__COUNT];
 };
 
 static const struct maintenance_strategy none_strategy = { 0 };
+
 static const struct maintenance_strategy default_strategy = {
 	.tasks = {
-		[TASK_GC].enabled = 1,
+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
 	},
 };
+
 static const struct maintenance_strategy incremental_strategy = {
 	.tasks = {
-		[TASK_COMMIT_GRAPH].enabled = 1,
+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
-		[TASK_PREFETCH].enabled = 1,
+		[TASK_PREFETCH].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_PREFETCH].schedule = SCHEDULE_HOURLY,
-		[TASK_INCREMENTAL_REPACK].enabled = 1,
+		[TASK_INCREMENTAL_REPACK].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY,
-		[TASK_LOOSE_OBJECTS].enabled = 1,
+		[TASK_LOOSE_OBJECTS].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
-		[TASK_PACK_REFS].enabled = 1,
+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
 	},
 };
@@ -1860,6 +1869,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 {
 	struct strbuf config_name = STRBUF_INIT;
 	struct maintenance_strategy strategy;
+	enum maintenance_type type;
 	const char *config_str;
 
 	/*
@@ -1894,8 +1904,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 			strategy = parse_maintenance_strategy(config_str);
 		else
 			strategy = none_strategy;
+		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
+		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
 	for (size_t i = 0; i < TASK__COUNT; i++) {
@@ -1905,8 +1917,8 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 		if (!repo_config_get_bool(the_repository, config_name.buf, &config_value))
-			strategy.tasks[i].enabled = config_value;
-		if (!strategy.tasks[i].enabled)
+			strategy.tasks[i].type = config_value ? type : 0;
+		if (!(strategy.tasks[i].type & type))
 			continue;
 
 		if (opts->schedule) {

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 6/8] builtin/maintenance: extend "maintenance.strategy" to manual maintenance
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2025-10-16  7:26 ` [PATCH 5/8] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 7/8] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap: scheduled and manual
maintenance use _different_ strategies, as the latter continues to use
git-gc(1) by default. On the one hand, this makes the strategies far
less useful than they could be. But even more importantly, the two
strategies might clash with one another, with one of them performing
maintenance in such a way that it discards the benefits of the other.

So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.

Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy, we would have ignored
that strategy for manual maintenance beforehand. Instead, manual
maintenance would have continued to use git-gc(1) by default.

But luckily, we can match that behaviour by:

  - Keeping all current tasks of the incremental strategy as
    `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
    run during manual maintenance.

  - Configuring the "gc" task so that it is invoked during manual
    maintenance.

Like this, the user shouldn't observe any difference in behaviour.
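
For illustration only, and not part of the patch: a small standalone
model of the resulting "incremental" strategy. The task list and the
`TYPE_*` names are simplified stand-ins for the real definitions in
builtin/gc.c, and `count_selected()` is a hypothetical helper. Schedule
priorities and `--auto` conditions are deliberately ignored here.

```c
#include <assert.h>
#include <stddef.h>

enum { TYPE_SCHEDULED = 1 << 0, TYPE_MANUAL = 1 << 1 };

enum task {
	TASK_PREFETCH,
	TASK_COMMIT_GRAPH,
	TASK_LOOSE_OBJECTS,
	TASK_INCREMENTAL_REPACK,
	TASK_PACK_REFS,
	TASK_GC,
	TASK__COUNT
};

/* The existing tasks stay scheduled-only; "gc" covers manual runs. */
static const unsigned incremental[TASK__COUNT] = {
	[TASK_PREFETCH] = TYPE_SCHEDULED,
	[TASK_COMMIT_GRAPH] = TYPE_SCHEDULED,
	[TASK_LOOSE_OBJECTS] = TYPE_SCHEDULED,
	[TASK_INCREMENTAL_REPACK] = TYPE_SCHEDULED,
	[TASK_PACK_REFS] = TYPE_SCHEDULED,
	[TASK_GC] = TYPE_MANUAL,
};

/* Hypothetical helper: count the tasks selected for one kind of run. */
static size_t count_selected(const unsigned *strategy, unsigned run_type)
{
	size_t n = 0;

	assert(run_type == TYPE_SCHEDULED || run_type == TYPE_MANUAL);
	for (size_t i = 0; i < TASK__COUNT; i++)
		if (strategy[i] & run_type)
			n++;
	return n;
}
```

In this model a manual run selects exactly one task, "gc", which is
what the default strategy would have selected before. A scheduled run
selects the five tasks it always did.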

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc | 22 ++++++++++++-------
 builtin/gc.c                          | 24 ++++++++++++++++-----
 t/t7900-maintenance.sh                | 40 +++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 26dc5de423f..dc6fd9b7fda 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -16,19 +16,25 @@ detach.
 
 maintenance.strategy::
 	This string config option provides a way to specify one of a few
-	recommended schedules for background maintenance. This only affects
-	which tasks are run during `git maintenance run --schedule=X`
-	commands, provided no `--task=<task>` arguments are provided.
-	Further, if a `maintenance.<task>.schedule` config value is set,
-	then that value is used instead of the one provided by
-	`maintenance.strategy`. The possible strategy strings are:
+	recommended strategies for repository maintenance. This affects
+	which tasks are run during `git maintenance run`, provided no
+	`--task=<task>` arguments are provided. This setting impacts manual
+	maintenance, auto-maintenance as well as scheduled maintenance. The
+	tasks that run may be different depending on the maintenance type.
 +
-* `none`: This default setting implies no tasks are run at any schedule.
+The maintenance strategy can be further tweaked by setting
+`maintenance.<task>.enabled` and `maintenance.<task>.schedule`. If set, these
+values are used instead of the defaults provided by `maintenance.strategy`.
++
+The possible strategies are:
++
+* `none`: This strategy implies no tasks are run at all. This is the default
+  strategy for scheduled maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
   `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
-  task weekly.
+  task weekly. Manual repository maintenance uses the `gc` task.
 
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
diff --git a/builtin/gc.c b/builtin/gc.c
index 4f70650e7ac..971d557d370 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1854,6 +1854,19 @@ static const struct maintenance_strategy incremental_strategy = {
 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
 		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
+
+		/*
+		 * Historically, the "incremental" strategy was only available
+		 * in the context of scheduled maintenance when set up via
+		 * "maintenance.strategy". We have later expanded that config
+		 * to also cover manual maintenance.
+		 *
+		 * To retain backwards compatibility with the previous status
+		 * quo we thus run git-gc(1) in case manual maintenance was
+		 * requested. This is the same as the default strategy, which
+		 * would have been in use beforehand.
+		 */
+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
 	},
 };
 
@@ -1897,19 +1910,20 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 *   - Unscheduled maintenance uses our default strategy.
 	 *
 	 * Both of these are affected by the gitconfig though, which may
-	 * override specific aspects of our strategy.
+	 * override specific aspects of our strategy. Furthermore, both
+	 * strategies can be overridden by setting "maintenance.strategy".
 	 */
 	if (opts->schedule) {
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
-			strategy = parse_maintenance_strategy(config_str);
-		else
-			strategy = none_strategy;
+		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
+	if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+		strategy = parse_maintenance_strategy(config_str);
+
 	for (size_t i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 45334f7ad3a..439f1bfba0c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -853,6 +853,46 @@ test_expect_success 'maintenance.strategy inheritance' '
 		<modified-daily.txt
 '
 
+test_strategy () {
+	STRATEGY="$1"
+	shift
+
+	cat >expect &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -c maintenance.strategy=$STRATEGY maintenance run --quiet "$@" &&
+	sed -n 's/{"event":"child_start","sid":"[^/"]*",.*,"argv":\["\(.*\)\"]}/\1/p' <trace2.txt |
+		sed 's/","/ /g' >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'maintenance.strategy is respected' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+
+		test_must_fail git -c maintenance.strategy=unknown maintenance run 2>err &&
+		test_grep "unknown maintenance strategy: .unknown." err &&
+
+		test_strategy incremental <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy incremental --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git prune-packed --quiet
+		git multi-pack-index write --no-progress
+		git multi-pack-index expire --no-progress
+		git multi-pack-index repack --no-progress --batch-size=1
+		git commit-graph write --split --reachable --no-progress
+		EOF
+	)
+'
+
 test_expect_success 'register and unregister' '
 	test_when_finished git config --global --unset-all maintenance.repo &&
 

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 7/8] builtin/maintenance: make "gc" strategy accessible
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2025-10-16  7:26 ` [PATCH 6/8] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-16  7:26 ` [PATCH 8/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:

  - It is impossible to use the default "gc" strategy for a specific
    repository when the strategy was globally set to a different strategy.

  - It is not possible to use git-gc(1) for scheduled maintenance.

Address these issues by making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.
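
For illustration only, and not part of the patch: the test below
expects a `--schedule=weekly` run to execute the "gc" task even though
the strategy schedules it daily. A minimal sketch of that rule follows,
under the assumption that a coarser run also covers all finer-grained
tasks while an hourly run does not pick up daily ones. The `freq` enum
and `due_in_run()` are hypothetical, and the numeric values are not
those of `enum schedule_priority`.

```c
#include <assert.h>

/* Hypothetical ordering: finer schedules get smaller values. */
enum freq {
	FREQ_NONE = 0,
	FREQ_HOURLY,
	FREQ_DAILY,
	FREQ_WEEKLY,
};

/*
 * Hypothetical helper: a task is due when the run is at least as
 * coarse as the schedule of the task. Tasks without any schedule are
 * never due in a scheduled run.
 */
static int due_in_run(enum freq task, enum freq run)
{
	assert(run != FREQ_NONE);
	if (task == FREQ_NONE)
		return 0;
	return task <= run;
}
```

Under this assumption, a daily "gc" task is due in daily and weekly
runs, but not in hourly ones.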

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  2 ++
 builtin/gc.c                          |  9 ++++++---
 t/t7900-maintenance.sh                | 14 +++++++++++++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index dc6fd9b7fda..648b6db47c6 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -30,6 +30,8 @@ The possible strategies are:
 +
 * `none`: This strategy implies no tasks are run at all. This is the default
   strategy for scheduled maintenance.
+* `gc`: This strategy runs the `gc` task. This is the default strategy for
+  manual maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 971d557d370..3673f3db630 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1836,9 +1836,10 @@ struct maintenance_strategy {
 
 static const struct maintenance_strategy none_strategy = { 0 };
 
-static const struct maintenance_strategy default_strategy = {
+static const struct maintenance_strategy gc_strategy = {
 	.tasks = {
-		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
+		[TASK_GC].schedule = SCHEDULE_DAILY,
 	},
 };
 
@@ -1874,6 +1875,8 @@ static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
+	if (!strcasecmp(name, "gc"))
+		return gc_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
@@ -1917,7 +1920,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
-		strategy = default_strategy;
+		strategy = gc_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 439f1bfba0c..1acd701830e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -882,7 +882,7 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy incremental --schedule=weekly <<-\EOF
+		test_strategy incremental --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git prune-packed --quiet
 		git multi-pack-index write --no-progress
@@ -890,6 +890,18 @@ test_expect_success 'maintenance.strategy is respected' '
 		git multi-pack-index repack --no-progress --batch-size=1
 		git commit-graph write --split --reachable --no-progress
 		EOF
+
+		test_strategy gc <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy gc --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
 	)
 '
 

-- 
2.51.0.869.ge66316f041.dirty



* [PATCH 8/8] builtin/maintenance: introduce "geometric" strategy
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2025-10-16  7:26 ` [PATCH 7/8] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
@ 2025-10-16  7:26 ` Patrick Steinhardt
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-16  7:26 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
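
For illustration only, and not part of the patch: a simplified model of
how the split between "packs to roll up" and "packs to keep" might be
computed, and how that feeds the all-into-one decision. This is a
sketch based on my understanding of geometric repacking, not the code
in git-repack(1). `geometric_split()` and `wants_all_into_one()` are
hypothetical names, the weights are object counts sorted in ascending
order, and overflow handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many packs at the front of `weights` (ascending object
 * counts) would be rolled up into a single new pack so that the
 * remaining packs form a geometric progression with `factor`.
 */
static size_t geometric_split(const uint64_t *weights, size_t n,
			      uint64_t factor)
{
	size_t i, split;
	uint64_t total = 0;

	if (!n)
		return 0;

	/* Walk down from the largest pack while the progression holds. */
	for (i = n - 1; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;

	/* The larger pack of the failing pair is rolled up as well. */
	split = i ? i + 1 : 0;

	for (i = 0; i < split; i++)
		total += weights[i];

	/* Absorb more packs until the new pack fits the progression. */
	while (split < n && weights[split] < factor * total) {
		total += weights[split];
		split++;
	}

	return split;
}

/* Model of the decision: nothing would be left over, so repack all. */
static int wants_all_into_one(const uint64_t *weights, size_t n,
			      uint64_t factor)
{
	assert(factor > 0);
	return geometric_split(weights, n, factor) >= n;
}
```

With a factor of 2, packs of 10, 10, 10 and 1000 objects would merge
only the three small packs, whereas packs of 100, 100 and 150 objects
would all be merged, which is the case where the task switches to an
all-into-one repack with cruft packs.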

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  9 +++++++++
 builtin/gc.c                          | 19 +++++++++++++++++++
 t/t7900-maintenance.sh                | 20 +++++++++++++++++++-
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 648b6db47c6..5ab88c2b328 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -32,6 +32,15 @@ The possible strategies are:
   strategy for scheduled maintenance.
 * `gc`: This strategy runs the `gc` task. This is the default strategy for
   manual maintenance.
+* `geometric`: This strategy performs geometric repacking of packfiles and
+  keeps auxiliary data structures up-to-date. The strategy expires data in the
+  reflog and removes worktrees that cannot be located anymore. When the
+  geometric repacking strategy would decide to do an all-into-one repack, then
+  the strategy generates a cruft pack for all unreachable objects. Objects that
+  are already part of a cruft pack will be expired.
++
+This repacking strategy is a full replacement for the `gc` strategy and is
+recommended for large repositories.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 3673f3db630..bf603de8a2f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1871,12 +1871,31 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static const struct maintenance_strategy geometric_strategy = {
+	.tasks = {
+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
+		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
+		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
+		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
+		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
+	},
+};
+
 static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
 	if (!strcasecmp(name, "gc"))
 		return gc_strategy;
+	if (!strcasecmp(name, "geometric"))
+		return geometric_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 1acd701830e..bb61b4d7f44 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -897,11 +897,29 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy gc --schedule=weekly <<-\EOF
+		test_strategy gc --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git reflog expire --all
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
+
+		test_strategy geometric <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
+
+		test_strategy geometric --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
 	)
 '
 

-- 
2.51.0.869.ge66316f041.dirty



* Re: [PATCH 1/8] builtin/gc: remove global `repack` variable
  2025-10-16  7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
@ 2025-10-16 20:07   ` Justin Tobler
  2025-10-17 20:58   ` Taylor Blau
  1 sibling, 0 replies; 69+ messages in thread
From: Justin Tobler @ 2025-10-16 20:07 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee, Taylor Blau

On 25/10/16 09:26AM, Patrick Steinhardt wrote:
> The global `repack` variable is used to store all command line arguments
> that we eventually want to pass to git-repack(1). It is being appended
> to from multiple different functions, which makes it hard to follow the
> logic. Besides being hard to follow, it also makes it unnecessarily hard
> to reuse this infrastructure in new code.
> 
> Refactor the code so that we store this variable on the stack and pass
> a pointer to it around as needed. This is done so that we can reuse
> `add_repack_all_options()` in a subsequent commit.
> 
> The refactoring itself is straight-forward. One function that deserves
> attention though is `need_to_gc()`: this function determines whether or
> not we need to execute garbage collection for `git gc --auto`, but also
> for `git maintenance run --auto`. But besides figuring out whether we
> have to perform GC, the function also sets up the `repack` arguments.
> 
> For `git gc --auto` it's trivial to adapt, as we already have the
> on-stack variable at our fingertips. But for the maintenance condition
> it's less obvious what to do.
> 
> As it turns out, we can just use another temporary variable there that
> we then immediately discard. If we need to perform GC we execute a child
> git-gc(1) process to repack objects for us, and that process will have
> to recompute the arguments anyway.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
[snip]
> @@ -1269,6 +1272,19 @@ static int maintenance_task_gc_background(struct maintenance_run_opts *opts,
>  	return run_command(&child);
>  }
>  
> +static int gc_condition(struct gc_config *cfg)
> +{
> +	/*
> +	 * Note that it's fine to drop the repack arguments here, as we execute
> +	 * git-gc(1) as a separate child process anyway. So it knows to compute
> +	 * these arguments again.
> +	 */

Also we don't expect any arguments to be configured ahead of time so we
are good.

> +	struct strvec repack_args = STRVEC_INIT;
> +	int ret = need_to_gc(cfg, &repack_args);
> +	strvec_clear(&repack_args);
> +	return ret;
> +}
> +
>  static int prune_packed(struct maintenance_run_opts *opts)
>  {
>  	struct child_process child = CHILD_PROCESS_INIT;
> @@ -1596,7 +1612,7 @@ static const struct maintenance_task tasks[] = {
>  		.name = "gc",
>  		.foreground = maintenance_task_gc_foreground,
>  		.background = maintenance_task_gc_background,
> -		.auto_condition = need_to_gc,
> +		.auto_condition = gc_condition,

Now that the `need_to_gc()` function signature has changed, we use a
wrapper function that provides the repack args. In this case, only the
args that get set during `need_to_gc()` are required, and the args are
not needed afterwards, so it is safe to discard.

This patch looks good.

-Justin


* Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-16  7:26 ` [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-16 20:51   ` Justin Tobler
  2025-10-17  6:13     ` Patrick Steinhardt
  2025-10-17 22:28   ` Taylor Blau
  1 sibling, 1 reply; 69+ messages in thread
From: Justin Tobler @ 2025-10-16 20:51 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee, Taylor Blau

On 25/10/16 09:26AM, Patrick Steinhardt wrote:
> Introduce a new "geometric-repack" task. This task uses our geometric
> repack infrastructure as provided by git-repack(1) itself, which is a
> strategy that especially hosting providers tend to use to amortize the
> costs of repacking objects.
> 
> There is one issue though with geometric repacks, namely that they
> unconditionally pack all loose objects, regardless of whether or not
> they are reachable. This is done because it means that we can completely
> skip the reachability step, which significantly speeds up the operation.
> But it has the big downside that we are unable to expire objects over
> time.
> 
> To address this issue we thus use a split strategy in this new task:
> whenever a geometric repack would merge together all packs, we instead
> do an all-into-one repack. By default, these all-into-one repacks have
> cruft packs enabled, so unreachable objects would now be written into
> their own pack. Consequently, they won't be soaked up during geometric
> repacking anymore and can be expired with the next full repack, assuming
> that their expiry date has surpassed.

So normal geometric repacks don't ever check for unreachable objects,
even if all the packs are being merged together. With this new strategy
though, when a geometric repack would normally merge together all packs,
we instead do an all-into-one repack which does check for unreachable
objects.

Does checking for unreachable objects in this case slow down the repack
significantly?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
[snip]
> @@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
>  	return 0;
>  }
>  
> +static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
> +					     struct gc_config *cfg)
> +{
> +	struct pack_geometry geometry = {
> +		.split_factor = 2,
> +	};
> +	struct pack_objects_args po_args = {
> +		.local = 1,
> +	};
> +	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
> +	struct string_list kept_packs = STRING_LIST_INIT_DUP;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	int ret;
> +
> +	existing_packs.repo = the_repository;
> +	existing_packs_collect(&existing_packs, &kept_packs);
> +	pack_geometry_init(&geometry, &existing_packs, &po_args);
> +	pack_geometry_split(&geometry);
> +
> +	child.git_cmd = 1;
> +
> +	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
> +	if (geometry.split < geometry.pack_nr)
> +		strvec_push(&child.args, "--geometric=2");
> +	else
> +		add_repack_all_option(cfg, NULL, &child.args);

Here we do the full repack when all the packs are to be merged. Makes
sense.

[snip]
> @@ -1608,6 +1705,11 @@ static const struct maintenance_task tasks[] = {
>  		.background = maintenance_task_incremental_repack,
>  		.auto_condition = incremental_repack_auto_condition,
>  	},
> +	[TASK_GEOMETRIC_REPACK] = {
> +		.name = "geometric-repack",
> +		.background = maintenance_task_geometric_repack,
> +		.auto_condition = geometric_repack_auto_condition,
> +	},

Here we configure the new maintenance task. Nice :)

The rest of this patch looks good.

-Justin


* Re: [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config
  2025-10-16  7:26 ` [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
@ 2025-10-16 20:59   ` Junio C Hamano
  0 siblings, 0 replies; 69+ messages in thread
From: Junio C Hamano @ 2025-10-16 20:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> To decide whether or not a repository needs to be repacked we estimate
> the number of loose objects. If the number exceeds a certain threshold
> we perform the repack, otherwise we don't.
>
> This is done via `too_many_loose_objects()`, which takes as parameter
> the `struct gc_config`. This configuration is only used to determine the
> threshold. In a subsequent commit we'll add another caller of this
> function that wants to pass a different limit than the one stored in
> that structure.
>
> Refactor the function accordingly so that we only take the limit as
> parameter instead of the whole structure.

Trivially correct and makes perfect sense.


* Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-16 20:51   ` Justin Tobler
@ 2025-10-17  6:13     ` Patrick Steinhardt
  0 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-17  6:13 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, Derrick Stolee, Taylor Blau

On Thu, Oct 16, 2025 at 03:51:17PM -0500, Justin Tobler wrote:
> On 25/10/16 09:26AM, Patrick Steinhardt wrote:
> > Introduce a new "geometric-repack" task. This task uses our geometric
> > repack infrastructure as provided by git-repack(1) itself, which is a
> > strategy that especially hosting providers tend to use to amortize the
> > costs of repacking objects.
> > 
> > There is one issue though with geometric repacks, namely that they
> > unconditionally pack all loose objects, regardless of whether or not
> > they are reachable. This is done because it means that we can completely
> > skip the reachability step, which significantly speeds up the operation.
> > But it has the big downside that we are unable to expire objects over
> > time.
> > 
> > To address this issue we thus use a split strategy in this new task:
> > whenever a geometric repack would merge together all packs, we instead
> > do an all-into-one repack. By default, these all-into-one repacks have
> > cruft packs enabled, so unreachable objects would now be written into
> > their own pack. Consequently, they won't be soaked up during geometric
> > repacking anymore and can be expired with the next full repack, assuming
> > that their expiry date has surpassed.
> 
> So normal geometric repacks don't ever check for unreachable objects,
> even if all the packs are being merged together. With this new strategy
> though, when a geometric repack would normally merge together all packs,
> we instead do an all-into-one repack which does check for unreachable
> objects.
> 
> Does checking for unreachable objects in this case slow down the repack
> significantly?

It'll certainly add some overhead, but I didn't quantify it. My gut
feeling is that the all-into-one repack is going to be slow by nature
anyway, as we have to rewrite all objects. Doing the reachability check
on top is of course going to slow it down even further, but the relative
impact is going to be smaller.

In any case, we have to perform a reachability check at one point in
time, otherwise we won't ever be able to prune unreachable objects. I
guess doing this at the point where we merge all packs into one is a
reasonable tradeoff.

I think the more interesting question is whether we should maybe do this
all-into-one repack more often, so that we can prune more regularly.
With the proposed strategy you'd need to add a significant portion of
new objects before we'd ever prune them, because otherwise we won't do
the all-into-one repack.

I think for an initial version this is going to be fine, but we might
want to iterate on this eventually and add a time-based component to
the heuristics.

Patrick

* Re: [PATCH 1/8] builtin/gc: remove global `repack` variable
  2025-10-16  7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
  2025-10-16 20:07   ` Justin Tobler
@ 2025-10-17 20:58   ` Taylor Blau
  1 sibling, 0 replies; 69+ messages in thread
From: Taylor Blau @ 2025-10-17 20:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Thu, Oct 16, 2025 at 09:26:32AM +0200, Patrick Steinhardt wrote:
> @@ -1269,6 +1272,19 @@ static int maintenance_task_gc_background(struct maintenance_run_opts *opts,
>  	return run_command(&child);
>  }
>
> +static int gc_condition(struct gc_config *cfg)
> +{
> +	/*
> +	 * Note that it's fine to drop the repack arguments here, as we execute
> +	 * git-gc(1) as a separate child process anyway. So it knows to compute
> +	 * these arguments again.
> +	 */
> +	struct strvec repack_args = STRVEC_INIT;
> +	int ret = need_to_gc(cfg, &repack_args);
> +	strvec_clear(&repack_args);
> +	return ret;
> +}
> +

Thanks for calling this one out in the patch message. I think had I not
read that I would have been confused why we were putting contents into
the strvec here just to throw it away, but the explanation you wrote
above makes it clear :-).

The rest of the patch looks great to me.

Thanks,
Taylor

* Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-16  7:26 ` [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
  2025-10-16 20:51   ` Justin Tobler
@ 2025-10-17 22:28   ` Taylor Blau
  2025-10-21 13:00     ` Patrick Steinhardt
  1 sibling, 1 reply; 69+ messages in thread
From: Taylor Blau @ 2025-10-17 22:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Thu, Oct 16, 2025 at 09:26:34AM +0200, Patrick Steinhardt wrote:
> Introduce a new "geometric-repack" task. This task uses our geometric
> repack infrastructure as provided by git-repack(1) itself, which is a
> strategy that especially hosting providers tend to use to amortize the
> costs of repacking objects.
>
> There is one issue though with geometric repacks, namely that they
> unconditionally pack all loose objects, regardless of whether or not
> they are reachable. This is done because it means that we can completely
> skip the reachability step, which significantly speeds up the operation.
> But it has the big downside that we are unable to expire objects over
> time.
>
> To address this issue we thus use a split strategy in this new task:
> whenever a geometric repack would merge together all packs, we instead
> do an all-into-one repack. By default, these all-into-one repacks have
> cruft packs enabled, so unreachable objects would now be written into
> their own pack. Consequently, they won't be soaked up during geometric
> repacking anymore and can be expired with the next full repack, assuming
> that their expiry date has surpassed.

Well put. Geometric repacking today is really only about what objects appear
in the packfiles, not whether those objects are reachable or not. That's
partially by design: geometric repack operations are meant to be as
lightweight and quick as possible, so performing a potentially expensive
reachability traversal defeats the purpose.

This mirrors what GitHub does today with their own repository
maintenance implementation. There is some number of geometric repack
operations interspersed between full repacks which collapse the
geometric progression and move unreachable objects out into cruft packs.

So I think that what you did here makes a ton of sense to me. Ultimately
I think there is a middle ground for geometric repacking (well outside
of the scope for this series ;-), don't worry) that could make it do a
little bit of both.

If 'git pack-objects --stdin-packs' (what ultimately implements the
portion of geometric repacking that combines packs together) knew the
current state of a repository's references, it could mark the objects in
the packs to be combined as either reachable or unreachable. Then in a
specialized mode, you could exclude any objects which are unreachable
from the resulting pack, and take a separate pass to write out a cruft
pack containing those objects before ultimately deleting the combined
packs.

I think that is all possible to do, and I think there is a way we could
do it relatively quickly without harming the performance of geometric
repacking. When traversing and marking objects, we can stop as soon as
we see an object that is not contained in the packs that we're
combining.

So I don't think we have to do a whole-repository walk, which would
indeed defeat the purpose of geometric repacking. The above procedure
would cause us to write out small cruft packs, but we could use the
--combine-cruft-below-size option of 'git repack' to prevent too many
small cruft packs from accumulating together.

Anyway, none of that has anything to do with what you wrote here ;-).
It was mostly an excuse for me to write down some of these thoughts that
I've had in my head and alluded to briefly a couple of weeks ago at Git
Merge. Expect some actual patches in this direction from me in the not
too distant future :-).

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/config/maintenance.adoc |  11 +++
>  builtin/gc.c                          | 102 +++++++++++++++++++++++++
>  t/t7900-maintenance.sh                | 137 ++++++++++++++++++++++++++++++++++
>  3 files changed, 250 insertions(+)
>
> diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
> index 2f719342183..26dc5de423f 100644
> --- a/Documentation/config/maintenance.adoc
> +++ b/Documentation/config/maintenance.adoc
> @@ -75,6 +75,17 @@ maintenance.incremental-repack.auto::
>  	number of pack-files not in the multi-pack-index is at least the value
>  	of `maintenance.incremental-repack.auto`. The default value is 10.
>
> +maintenance.geometric-repack.auto::
> +	This integer config option controls how often the `geometric-repack`
> +	task should be run as part of `git maintenance run --auto`. If zero,
> +	then the `geometric-repack` task will not run with the `--auto`
> +	option. A negative value will force the task to run every time.
> +	Otherwise, a positive value implies the command should run either when
> +	there are packfiles that need to be merged together to retain the
> +	geometric progression, or when there are at least this many loose
> +	objects that would be written into a new packfile. The default value is
> +	100.
> +

OK. To make sure I understand: this limit is the minimum number of loose
objects that would cause the geometric-repack task to run, unless there are
pack(s) which would be combined as a result of running a geometric
repack, in which case we run it regardless.

Is that right?

>  maintenance.reflog-expire.auto::
>  	This integer config option controls how often the `reflog-expire` task
>  	should be run as part of `git maintenance run --auto`. If zero, then
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 026d3a1d714..2c9ecd464d2 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -34,6 +34,7 @@
>  #include "pack-objects.h"
>  #include "path.h"
>  #include "reflog.h"
> +#include "repack.h"

Hey, neat ;-).

> @@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
>  	return 0;
>  }
>
> +static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
> +					     struct gc_config *cfg)
> +{
> +	struct pack_geometry geometry = {
> +		.split_factor = 2,

I wonder if this should be configurable somewhere. It might not be a bad
idea to introduce a 'repack.geometricSplitFactor' configuration
variable, defaulting to two, and use that here. It would also be nice to
be able to run 'git repack --geometric -d' and have it fallback to that
split factor, since using "2" is so common that it's frustrating when I
forget to type it out explicitly ;-).

> +	};
> +	struct pack_objects_args po_args = {
> +		.local = 1,
> +	};
> +	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
> +	struct string_list kept_packs = STRING_LIST_INIT_DUP;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	int ret;
> +
> +	existing_packs.repo = the_repository;
> +	existing_packs_collect(&existing_packs, &kept_packs);
> +	pack_geometry_init(&geometry, &existing_packs, &po_args);
> +	pack_geometry_split(&geometry);
> +
> +	child.git_cmd = 1;
> +
> +	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
> +	if (geometry.split < geometry.pack_nr)
> +		strvec_push(&child.args, "--geometric=2");
> +	else
> +		add_repack_all_option(cfg, NULL, &child.args);

Makes sense; if a geometric repack would merge all packs, we do an
all-into-one repack, otherwise we do a geometric one. Looks like the function
geometric_repack_auto_condition() below controls whether or not we even
take this path, which makes sense relative to the documentation you
wrote above.

> +static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
> +{
> +	struct pack_geometry geometry = {
> +		.split_factor = 2,
> +	};
> +	struct pack_objects_args po_args = {
> +		.local = 1,
> +	};
> +	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
> +	struct string_list kept_packs = STRING_LIST_INIT_DUP;
> +	int auto_value = 100;
> +	int ret;
> +
> +	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
> +			    &auto_value);
> +	if (!auto_value)
> +		return 0;
> +	if (auto_value < 0)
> +		return 1;
> +
> +	existing_packs.repo = the_repository;
> +	existing_packs_collect(&existing_packs, &kept_packs);
> +	pack_geometry_init(&geometry, &existing_packs, &po_args);
> +	pack_geometry_split(&geometry);
> +
> +	/*
> +	 * When we'd merge at least two packs with one another we always
> +	 * perform the repack.
> +	 */
> +	if (geometry.split) {
> +		ret = 1;
> +		goto out;
> +	}

Hmm. I wish that we could somehow pass this information to the function
above so that we don't have to re-discover the fact that there are packs
to combine. I'm not familiar enough with the maintenance code to know
how difficult that would be to do, but it looks like at least the
gc_config pointer is shared between the auto condition and the task
itself.

That's kind of gross to tack on there, but I could see a compelling
argument for passing around an extra void pointer between the two that
would allow us to propagate this kind of data between the auto condition
and the task itself. The computation isn't super expensive, so I don't think
repeating it is a show-stopper, at least from a performance perspective, but
it does seem like a good opportunity to DRY things up a bit.

> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index ddd273d8dc2..83a373fe94b 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -465,6 +465,143 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
>  	)
>  '
>
> +run_and_verify_geometric_pack () {
> +	EXPECTED_PACKS="$1" &&
> +
> +	# Verify that we perform a geometric repack.
> +	rm -f "trace2.txt" &&
> +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> +		git maintenance run --task=geometric-repack 2>/dev/null &&
> +	test_subcommand git repack -d -l --geometric=2 --quiet --write-midx <trace2.txt &&

Makes sense. I do think the test_subcommand thing is a little fragile
here, but verifying that the resulting pack structure forms a geometric
progression feels like overkill for this test, so I think what you wrote
here makes sense.

As an aside, would you mind wrapping these lines instead of putting the
command-line invocation all together on a single line?

> +
> +	# Verify that the number of packfiles matches our expectation.
> +	ls -l .git/objects/pack/*.pack >packfiles &&
> +	test_line_count = "$EXPECTED_PACKS" packfiles &&
> +
> +	# And verify that there are no loose objects anymore.
> +	cat >expect <<-\EOF &&
> +	info
> +	pack
> +	EOF
> +	ls .git/objects >actual &&

I wonder if there is an easier way to check for loose objects here that
doesn't require you to know that the "info" and "pack" directories
exist. Perhaps something like:

test_stdout_line_count = 0 find .git/objects/?? -type f

, or even

    find .git/objects/?? -type f >loose.objs &&
    test_must_be_empty loose.objs

> +test_expect_success 'geometric repacking task' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git config set maintenance.auto false &&
> +		test_commit initial &&
> +
> +		# The initial repack causes an all-into-one repack.
> +		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
> +			git maintenance run --task=geometric-repack 2>/dev/null &&
> +		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
> +
> +		# Repacking should now cause a no-op geometric repack because
> +		# no packfiles need to be combined.
> +		ls -l .git/objects/pack >before &&
> +		run_and_verify_geometric_pack 1 &&
> +		ls -l .git/objects/pack >after &&
> +		test_cmp before after &&
> +
> +		# This incremental change creates a new packfile that only
> +		# soaks up loose objects. The packfiles are not getting merged
> +		# at this point.
> +		test_commit loose &&
> +		run_and_verify_geometric_pack 2 &&

I wonder if you want to harden this test a little bit to ensure that
there is only one new pack being created here, and we're not seeing
e.g., the removal of the existing pack and creation of two new packs.

I dunno, that may be overkill for this test, and I certainly don't feel
strongly about it.

> +
> +		# Both packfiles have 3 objects, so the next run would cause us
> +		# to merge both packfiles together. This should be turned into

Perhaps s/both/all/ ? What you wrote is not wrong, of course, but I
think "all" more clearly communicates that we are only doing an
all-into-one because the geometric repack would have combined everything
together anyway.

The rest of the changes look good to me.

Thanks,
Taylor

* Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-17 22:28   ` Taylor Blau
@ 2025-10-21 13:00     ` Patrick Steinhardt
  2025-10-23 19:19       ` Taylor Blau
  0 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 13:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee

On Fri, Oct 17, 2025 at 06:28:04PM -0400, Taylor Blau wrote:
> On Thu, Oct 16, 2025 at 09:26:34AM +0200, Patrick Steinhardt wrote:
> > Introduce a new "geometric-repack" task. This task uses our geometric
> > repack infrastructure as provided by git-repack(1) itself, which is a
> > strategy that especially hosting providers tend to use to amortize the
> > costs of repacking objects.
> >
> > There is one issue though with geometric repacks, namely that they
> > unconditionally pack all loose objects, regardless of whether or not
> > they are reachable. This is done because it means that we can completely
> > skip the reachability step, which significantly speeds up the operation.
> > But it has the big downside that we are unable to expire objects over
> > time.
> >
> > To address this issue we thus use a split strategy in this new task:
> > whenever a geometric repack would merge together all packs, we instead
> > do an all-into-one repack. By default, these all-into-one repacks have
> > cruft packs enabled, so unreachable objects would now be written into
> > their own pack. Consequently, they won't be soaked up during geometric
> > repacking anymore and can be expired with the next full repack, assuming
> > that their expiry date has surpassed.
> 
> Well put. Geometric repacking today is really only about what objects appear
> in the packfiles, not whether those objects are reachable or not. That's
> partially by design: geometric repack operations are meant to be as
> lightweight and quick as possible, so performing a potentially expensive
> reachability traversal defeats the purpose.
> 
> This mirrors what GitHub does today with their own repository
> maintenance implementation. There is some number of geometric repack
> operations interspersed between full repacks which collapse the
> geometric progression and move unreachable objects out into cruft packs.

Yeah, we also do the same at GitLab.

> So I think that what you did here makes a ton of sense to me. Ultimately
> I think there is a middle ground for geometric repacking (well outside
> of the scope for this series ;-), don't worry) that could make it do a
> little bit of both.
> 
> If 'git pack-objects --stdin-packs' (what ultimately implements the
> portion of geometric repacking that combines packs together) knew the
> current state of a repository's references, it could mark the objects in
> the packs to be combined as either reachable or unreachable. Then in a
> specialized mode, you could exclude any objects which are unreachable
> from the resulting pack, and take a separate pass to write out a cruft
> pack containing those objects before ultimately deleting the combined
> packs.
> 
> I think that is all possible to do, and I think there is a way we could
> do it relatively quickly without harming the performance of geometric
> repacking. When traversing and marking objects, we can stop as soon as
> we see an object that is not contained in the packs that we're
> combining.
> 
> So I don't think we have to do a whole-repository walk, which would
> indeed defeat the purpose of geometric repacking. The above procedure
> would cause us to write out small cruft packs, but we could use the
> --combine-cruft-below-size option of 'git repack' to prevent too many
> small cruft packs from accumulating together.
> 
> Anyway, none of that has anything to do with what you wrote here ;-).
> It was mostly an excuse for me to write down some of these thoughts that
> I've had in my head and alluded to briefly a couple of weeks ago at Git
> Merge. Expect some actual patches in this direction from me in the not
> too distant future :-).

Looking forward to them :)

> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  Documentation/config/maintenance.adoc |  11 +++
> >  builtin/gc.c                          | 102 +++++++++++++++++++++++++
> >  t/t7900-maintenance.sh                | 137 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 250 insertions(+)
> >
> > diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
> > index 2f719342183..26dc5de423f 100644
> > --- a/Documentation/config/maintenance.adoc
> > +++ b/Documentation/config/maintenance.adoc
> > @@ -75,6 +75,17 @@ maintenance.incremental-repack.auto::
> >  	number of pack-files not in the multi-pack-index is at least the value
> >  	of `maintenance.incremental-repack.auto`. The default value is 10.
> >
> > +maintenance.geometric-repack.auto::
> > +	This integer config option controls how often the `geometric-repack`
> > +	task should be run as part of `git maintenance run --auto`. If zero,
> > +	then the `geometric-repack` task will not run with the `--auto`
> > +	option. A negative value will force the task to run every time.
> > +	Otherwise, a positive value implies the command should run either when
> > +	there are packfiles that need to be merged together to retain the
> > +	geometric progression, or when there are at least this many loose
> > +	objects that would be written into a new packfile. The default value is
> > +	100.
> > +
> 
> OK. To make sure I understand: this limit is the minimum number of loose
> objects that would cause the geometric-repack task to run, unless there are
> pack(s) which would be combined as a result of running a geometric
> repack, in which case we run it regardless.
> 
> Is that right?

Yeah, exactly. I was initially thinking to only frame this in the
context of "how many packs would be merged"? But the problem with that
is that we wouldn't ever generate _new_ packs if the repository only
ever grows loose objects, and consequently we also wouldn't ever merge
any of them.
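
The semantics confirmed above can be summarised in a small sketch. It
assumes the pack split and a loose-object figure have already been
computed; `geometric_repack_wanted()` and its parameters are
hypothetical, and the real condition works with an estimate rather than
an exact loose-object count.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the documented behaviour of
 * `maintenance.geometric-repack.auto`:
 *   - zero:     never run with --auto
 *   - negative: always run
 *   - positive: run if packs need merging, or if enough loose objects
 *               have accumulated
 */
static int geometric_repack_wanted(int auto_value, size_t split,
				   size_t loose_objects)
{
	if (!auto_value)
		return 0;
	if (auto_value < 0)
		return 1;

	/* Packs would be merged to restore the geometric progression. */
	if (split)
		return 1;

	/* Otherwise only run once enough loose objects have piled up. */
	return loose_objects >= (size_t)auto_value;
}
```

The last check is what keeps a repository that only ever grows loose
objects from being skipped forever.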

> >  maintenance.reflog-expire.auto::
> >  	This integer config option controls how often the `reflog-expire` task
> >  	should be run as part of `git maintenance run --auto`. If zero, then
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index 026d3a1d714..2c9ecd464d2 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -34,6 +34,7 @@
> >  #include "pack-objects.h"
> >  #include "path.h"
> >  #include "reflog.h"
> > +#include "repack.h"
> 
> Hey, neat ;-).

Yup, your refactorings helped a bunch :)

> > @@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
> >  	return 0;
> >  }
> >
> > +static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
> > +					     struct gc_config *cfg)
> > +{
> > +	struct pack_geometry geometry = {
> > +		.split_factor = 2,
> 
> I wonder if this should be configurable somewhere. It might not be a bad
> idea to introduce a 'repack.geometricSplitFactor' configuration
> variable, defaulting to two, and use that here. It would also be nice to
> be able to run 'git repack --geometric -d' and have it fallback to that
> split factor, since using "2" is so common that it's frustrating when I
> forget to type it out explicitly ;-).

I was also pondering over this. I think the way to do so would be to
introduce "maintenance.geometric-repack.splitFactor", as that follows
all the other maintenance configuration we have there, as well.

I decided to not do it yet as I wanted to keep the scope of this patch
series contained. But honestly, it's an easy-enough change to make, so
let me introduce another patch to do this.
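
For context on what the split factor controls, below is a simplified
sketch of how a split point can be derived from pack sizes. It is
modelled on my understanding of the geometric repacking logic in
git-repack(1), not copied from it: `compute_split()` is a hypothetical
name, `weights` stands in for per-pack object counts sorted in ascending
order, and overflow checks are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many of the smallest packs would have to be combined so
 * that the remaining packs form a geometric progression, i.e. each pack
 * holds at least `factor` times as many objects as the next-smaller one.
 *
 * `weights` must be sorted in ascending order. Overflow is not handled.
 */
static size_t compute_split(const uint64_t *weights, size_t nr,
			    uint64_t factor)
{
	uint64_t total = 0;
	size_t split, i;

	if (!nr)
		return 0;

	/* Walk from the largest pack down until the progression breaks. */
	for (i = nr - 1; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;

	/* The larger pack of the offending pair gets rolled up as well. */
	split = i ? i + 1 : 0;

	/* Size of the pack that would result from combining [0, split). */
	for (i = 0; i < split; i++)
		total += weights[i];

	/* Absorb further packs that are too small next to the combined one. */
	for (i = split; i < nr; i++) {
		if (weights[i] >= factor * total)
			break;
		total += weights[i];
		split++;
	}

	return split;
}
```

With two packs of three objects each and a factor of 2, the sketch
returns 2, meaning every pack would be merged. A larger factor makes the
progression stricter, so more packs get rolled up per run.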

> > +	};
> > +	struct pack_objects_args po_args = {
> > +		.local = 1,
> > +	};
> > +	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
> > +	struct string_list kept_packs = STRING_LIST_INIT_DUP;
> > +	struct child_process child = CHILD_PROCESS_INIT;
> > +	int ret;
> > +
> > +	existing_packs.repo = the_repository;
> > +	existing_packs_collect(&existing_packs, &kept_packs);
> > +	pack_geometry_init(&geometry, &existing_packs, &po_args);
> > +	pack_geometry_split(&geometry);
> > +
> > +	child.git_cmd = 1;
> > +
> > +	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
> > +	if (geometry.split < geometry.pack_nr)
> > +		strvec_push(&child.args, "--geometric=2");
> > +	else
> > +		add_repack_all_option(cfg, NULL, &child.args);
> 
> Makes sense; if a geometric repack would merge all packs, we do an
> all-into-one repack, otherwise we do a geometric one. Looks like the function
> geometric_repack_auto_condition() below controls whether or not we even
> take this path, which makes sense relative to the documentation you
> wrote above.

It does, but only in case we do `git maintenance run --auto`. If you run
`git maintenance run` without the flag we unconditionally execute this
code here. But that's fine: if the repository is already well-optimized
we don't end up doing anything.

> > +static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
> > +{
> > +	struct pack_geometry geometry = {
> > +		.split_factor = 2,
> > +	};
> > +	struct pack_objects_args po_args = {
> > +		.local = 1,
> > +	};
> > +	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
> > +	struct string_list kept_packs = STRING_LIST_INIT_DUP;
> > +	int auto_value = 100;
> > +	int ret;
> > +
> > +	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
> > +			    &auto_value);
> > +	if (!auto_value)
> > +		return 0;
> > +	if (auto_value < 0)
> > +		return 1;
> > +
> > +	existing_packs.repo = the_repository;
> > +	existing_packs_collect(&existing_packs, &kept_packs);
> > +	pack_geometry_init(&geometry, &existing_packs, &po_args);
> > +	pack_geometry_split(&geometry);
> > +
> > +	/*
> > +	 * When we'd merge at least two packs with one another we always
> > +	 * perform the repack.
> > +	 */
> > +	if (geometry.split) {
> > +		ret = 1;
> > +		goto out;
> > +	}
> 
> Hmm. I wish that we could somehow pass this information to the function
> above so that we don't have to re-discover the fact that there are packs
> to combine. I'm not familiar enough with the maintenance code to know
> how difficult that would be to do, but it looks like at least the
> gc_config pointer is shared between the auto condition and the task
> itself.
> 
> That's kind of gross to tack on there, but I could see a compelling
> argument for passing around an extra void pointer between the two that
> would allow us to propagate this kind of data between the auto condition
> and the task itself. The computation isn't super expensive, so I don't think
> repeating it is a show-stopper, at least from a performance perspective, but
> it does seem like a good opportunity to DRY things up a bit.

The problem is that the auto-condition is not evaluated when running
without the "--auto" flag. We of course can conditionally compute the
split in case we figure that the auto-condition didn't run, but it does
get somewhat dirty.

So I'd propose to defer such a change into the future in case we notice
that this indeed is a problem. Is that fine with you?
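
To make the trade-off concrete, here is one shape such a hand-off could
take. Everything in it is hypothetical: neither `struct geometry_cache`
nor these function signatures exist in Git, and `compute_split_stub()`
merely stands in for collecting the packs and computing the split.

```c
#include <stddef.h>

/* Hypothetical state shared between the auto condition and the task. */
struct geometry_cache {
	int valid;             /* set once `split` has been computed */
	size_t split;
	unsigned computations; /* only here to show how often we compute */
};

/* Stand-in for collecting packs and computing the split. */
static size_t compute_split_stub(struct geometry_cache *cache)
{
	cache->computations++;
	return 2; /* arbitrary example value */
}

/* Compute the split lazily so that either caller may come first. */
static size_t cached_split(struct geometry_cache *cache)
{
	if (!cache->valid) {
		cache->split = compute_split_stub(cache);
		cache->valid = 1;
	}
	return cache->split;
}

/* Only evaluated for `git maintenance run --auto`. */
static int auto_condition(struct geometry_cache *cache)
{
	return cached_split(cache) != 0;
}

/* Runs with or without --auto, so it cannot assume a filled cache. */
static size_t task(struct geometry_cache *cache)
{
	return cached_split(cache);
}
```

The lazy `cached_split()` covers the case described above where the auto
condition never ran, at the cost of threading one more pointer through
both callbacks.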

> > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> > index ddd273d8dc2..83a373fe94b 100755
> > --- a/t/t7900-maintenance.sh
> > +++ b/t/t7900-maintenance.sh
> > @@ -465,6 +465,143 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
> >  	)
> >  '
> >
> > +run_and_verify_geometric_pack () {
> > +	EXPECTED_PACKS="$1" &&
> > +
> > +	# Verify that we perform a geometric repack.
> > +	rm -f "trace2.txt" &&
> > +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> > +		git maintenance run --task=geometric-repack 2>/dev/null &&
> > +	test_subcommand git repack -d -l --geometric=2 --quiet --write-midx <trace2.txt &&
> 
> Makes sense. I do think the test_subcommand thing is a little fragile
> here, but verifying that the resulting pack structure forms a geometric
> progression feels like overkill for this test, so I think what you wrote
> here makes sense.

Oh, yeah, it's fragile and somewhat gross indeed. Couldn't really find a
nicer way to do it though :/

> As an aside, would you mind wrapping these lines instead of putting the
> command-line invocation all together on a single line?

Sure, can do.

> > +
> > +	# Verify that the number of packfiles matches our expectation.
> > +	ls -l .git/objects/pack/*.pack >packfiles &&
> > +	test_line_count = "$EXPECTED_PACKS" packfiles &&
> > +
> > +	# And verify that there are no loose objects anymore.
> > +	cat >expect <<-\EOF &&
> > +	info
> > +	pack
> > +	EOF
> > +	ls .git/objects >actual &&
> 
> I wonder if there is an easier way to check for loose objects here that
> doesn't require you to know that the "info" and "pack" directories
> exist. Perhaps something like:
> 
> test_stdout_line_count = 0 find .git/objects/?? -type f
> 
> , or even
> 
>     find .git/objects/?? -type f >loose.objs &&
>     test_must_be_empty loose.objs

This doesn't work though in case there is not even a single sharding
directory:

    find: '.git/objects/??': No such file or directory

I didn't really have any other idea for now to do this.

> > +test_expect_success 'geometric repacking task' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		git config set maintenance.auto false &&
> > +		test_commit initial &&
> > +
> > +		# The initial repack causes an all-into-one repack.
> > +		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
> > +			git maintenance run --task=geometric-repack 2>/dev/null &&
> > +		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
> > +
> > +		# Repacking should now cause a no-op geometric repack because
> > +		# no packfiles need to be combined.
> > +		ls -l .git/objects/pack >before &&
> > +		run_and_verify_geometric_pack 1 &&
> > +		ls -l .git/objects/pack >after &&
> > +		test_cmp before after &&
> > +
> > +		# This incremental change creates a new packfile that only
> > +		# soaks up loose objects. The packfiles are not getting merged
> > +		# at this point.
> > +		test_commit loose &&
> > +		run_and_verify_geometric_pack 2 &&
> 
> I wonder if you want to harden this test a little bit to ensure that
> there is only one new pack being created here, and we're not seeing
> e.g., the removal of the existing pack and creation of two new packs.
> 
> I dunno, that may be overkill for this test, and I certainly don't feel
> strongly about it.

Yeah, I had the same thought when writing this, but it quickly got ugly.
I then decided to not do this and instead only verify that the structure
loosely looks like we expect, and that the expected command actually
ran.

In the end we can rely on t7703 to verify the internals of how exactly
the geometric repack works, whereas here we only verify that the
strategy works as expected.

> > +
> > +		# Both packfiles have 3 objects, so the next run would cause us
> > +		# to merge both packfiles together. This should be turned into
> 
> Perhaps s/both/all/ ? What you wrote is not wrong, of course, but I
> think "all" more clearly communicates that we are only doing an
> all-into-one because the geometric repack would have combined everything
> together anyway.

Sure, happy to reword.

Thanks for your review!

Patrick


* [PATCH v2 0/9] builtin/maintenance: introduce "geometric" strategy
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2025-10-16  7:26 ` [PATCH 8/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
@ 2025-10-21 14:13 ` Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 1/9] builtin/gc: remove global `repack` variable Patrick Steinhardt
                     ` (9 more replies)
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
  10 siblings, 10 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

Hi,

by default, git-maintenance(1) uses git-gc(1) to perform repository
housekeeping. This tool has a couple of shortcomings, most importantly
that it regularly does all-into-one repacks. This doesn't really work
all that well in the context of monorepos, where you really want to
avoid repacking all objects regularly.

An alternative maintenance strategy is the "incremental" strategy, but
this strategy has two downsides:

  - Strategies in general only apply to scheduled maintenance. So if you
    run git-maintenance(1) manually, you still end up with git-gc(1).

  - The strategy is designed to not ever delete any data, but a full
    replacement for git-gc(1) needs to also prune reflogs, rerere caches
    and vanished worktrees.

This patch series aims to fix both of these issues.

First, the series introduces a new "geometric" maintenance task, which
makes use of geometric repacking as exposed by git-repack(1) in the
general case. In the case where a geometric repack ends up merging all
packfiles into one we instead do an all-into-one repack with cruft packs
so that we can still phase out objects over time.

Second, the series extends maintenance strategies to also cover normal
maintenance. If the user has configured the "geometric" strategy, we'll
thus use it for both manual and scheduled maintenance. For backwards
compatibility, the "incremental" strategy is changed so that it uses
git-gc(1) for manual maintenance and the other tasks for scheduled
maintenance.

The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
with other topics by preemptively including "repository.h", 2025-09-29)
merged into it.

Changes in v2:
  - Make the geometric factor configurable via
    "maintenance.geometric-repack.splitFactor".
  - Wrap some overly long lines in our tests.
  - Link to v1: https://lore.kernel.org/r/20251016-pks-maintenance-geometric-strategy-v1-0-18943d474203@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (9):
      builtin/gc: remove global `repack` variable
      builtin/gc: make `too_many_loose_objects()` reusable without GC config
      builtin/maintenance: introduce "geometric-repack" task
      builtin/maintenance: make the geometric factor configurable
      builtin/maintenance: don't silently ignore invalid strategy
      builtin/maintenance: run maintenance tasks depending on type
      builtin/maintenance: extend "maintenance.strategy" to manual maintenance
      builtin/maintenance: make "gc" strategy accessible
      builtin/maintenance: introduce "geometric" strategy

 Documentation/config/maintenance.adoc |  49 +++++-
 builtin/gc.c                          | 278 ++++++++++++++++++++++++++++------
 t/t7900-maintenance.sh                | 246 ++++++++++++++++++++++++++++++
 3 files changed, 515 insertions(+), 58 deletions(-)

Range-diff versus v1:

 1:  d16d9ac4f01 =  1:  f14cf90529d builtin/gc: remove global `repack` variable
 2:  2beb7edfdc1 =  2:  64fde2d3fb0 builtin/gc: make `too_many_loose_objects()` reusable without GC config
 3:  e4bcc347e76 !  3:  9ba24540238 builtin/maintenance: introduce "geometric-repack" task
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +	rm -f "trace2.txt" &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git maintenance run --task=geometric-repack 2>/dev/null &&
    -+	test_subcommand git repack -d -l --geometric=2 --quiet --write-midx <trace2.txt &&
    ++	test_subcommand git repack -d -l --geometric=2 \
    ++		--quiet --write-midx <trace2.txt &&
     +
     +	# Verify that the number of packfiles matches our expectation.
     +	ls -l .git/objects/pack/*.pack >packfiles &&
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		run_and_verify_geometric_pack 2 &&
     +
     +		# Both packfiles have 3 objects, so the next run would cause us
    -+		# to merge both packfiles together. This should be turned into
    ++		# to merge all packfiles together. This should be turned into
     +		# an all-into-one-repack.
     +		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +
     +test_geometric_repack_needed () {
     +	NEEDED="$1"
    -+	AUTO_LIMIT="$2" &&
    ++	GEOMETRIC_CONFIG="$2" &&
     +	rm -f trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    -+		git ${AUTO_LIMIT:+-c maintenance.geometric-repack.auto=$AUTO_LIMIT} maintenance run --auto --task=geometric-repack &&
    ++		git ${GEOMETRIC_CONFIG:+-c maintenance.geometric-repack.$GEOMETRIC_CONFIG} \
    ++		maintenance run --auto --task=geometric-repack 2>/dev/null &&
     +	case "$NEEDED" in
     +	true)
     +		test_grep "\[\"git\",\"repack\"," trace2.txt;;
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		# An empty repository does not need repacking, except when
     +		# explicitly told to do it.
     +		test_geometric_repack_needed false &&
    -+		test_geometric_repack_needed false 0 &&
    -+		test_geometric_repack_needed false 1 &&
    -+		test_geometric_repack_needed true -1 &&
    ++		test_geometric_repack_needed false auto=0 &&
    ++		test_geometric_repack_needed false auto=1 &&
    ++		test_geometric_repack_needed true auto=-1 &&
     +
     +		test_oid_init &&
     +
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		test_commit "$(test_oid blob17_1)" &&
     +		test_geometric_repack_needed false &&
     +		test_commit "$(test_oid blob17_2)" &&
    -+		test_geometric_repack_needed false 257 &&
    -+		test_geometric_repack_needed true 256 &&
    ++		test_geometric_repack_needed false auto=257 &&
    ++		test_geometric_repack_needed true auto=256 &&
     +
     +		# Force another repack.
     +		test_commit first &&
     +		test_commit second &&
    -+		test_geometric_repack_needed true -1 &&
    ++		test_geometric_repack_needed true auto=-1 &&
     +
     +		# We now have two packfiles that would be merged together. As
     +		# such, the repack should always happen unless the user has
     +		# disabled the auto task.
    -+		test_geometric_repack_needed false 0 &&
    -+		test_geometric_repack_needed true 9000
    ++		test_geometric_repack_needed false auto=0 &&
    ++		test_geometric_repack_needed true auto=9000
     +	)
     +'
     +
 -:  ----------- >  4:  d1b805004b9 builtin/maintenance: make the geometric factor configurable
 4:  c9a6e576299 =  5:  d6fa70640c2 builtin/maintenance: don't silently ignore invalid strategy
 5:  3c82a91f152 =  6:  37f7793dab9 builtin/maintenance: run maintenance tasks depending on type
 6:  78502ad6868 =  7:  4b15eac845c builtin/maintenance: extend "maintenance.strategy" to manual maintenance
 7:  59a5450c44f =  8:  eb75881b6ae builtin/maintenance: make "gc" strategy accessible
 8:  51065b109fa =  9:  5c011e7a7e2 builtin/maintenance: introduce "geometric" strategy

---
base-commit: 0bb2c786c2349dd6700727153c13d81cbfb41710
change-id: 20251015-pks-maintenance-geometric-strategy-580c58581b01



* [PATCH v2 1/9] builtin/gc: remove global `repack` variable
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 2/9] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.

Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_options()` in a subsequent commit.

The refactoring itself is straightforward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.

For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.

As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
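For illustration, the pattern described above (threading the argument
list through the callback's `void *data` parameter instead of appending
to a global) can be sketched in isolation. This is a minimal sketch, not
code from the patch: `struct arg_list`, `arg_list_push()`,
`arg_list_clear()`, `for_each_item()` and `keep_one()` are hypothetical
stand-ins for Git's `struct strvec`, `for_each_string_list()` and
`keep_one_pack()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's `struct strvec`; not the real API. */
struct arg_list {
	char **v;
	size_t nr, alloc;
};

/* Append a copy of `s`, growing the array as needed. */
static void arg_list_push(struct arg_list *args, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy;

	if (args->nr == args->alloc) {
		args->alloc = args->alloc ? 2 * args->alloc : 4;
		args->v = realloc(args->v, args->alloc * sizeof(*args->v));
		if (!args->v)
			abort();
	}
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	args->v[args->nr++] = copy;
}

/* Free all entries and reset the list to its empty state. */
static void arg_list_clear(struct arg_list *args)
{
	for (size_t i = 0; i < args->nr; i++)
		free(args->v[i]);
	free(args->v);
	args->v = NULL;
	args->nr = args->alloc = 0;
}

typedef int (*each_item_fn)(const char *item, void *data);

/* Call `fn` for every item, handing the caller's `data` through. */
static int for_each_item(const char **items, size_t nr,
			 each_item_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(items[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * The callback no longer touches a global: the list it appends to is
 * whatever the caller passed in as `data`. Names longer than the
 * buffer are truncated in this sketch.
 */
static int keep_one(const char *item, void *data)
{
	struct arg_list *args = data;
	char buf[256];

	snprintf(buf, sizeof(buf), "--keep-pack=%s", item);
	arg_list_push(args, buf);
	return 0;
}
```

A caller keeps a zero-initialized `struct arg_list` on its stack, passes
`&args` as the `data` argument of `for_each_item()`, and releases it
with `arg_list_clear()` once done. A caller that only cares about the
return value can build a throwaway list and clear it right away.
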
 builtin/gc.c | 74 ++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e19e13d9788..e9772eb3a30 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -55,7 +55,6 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static timestamp_t gc_log_expire_time;
-static struct strvec repack = STRVEC_INIT;
 static struct tempfile *pidfile;
 static struct lock_file log_lock;
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -618,48 +617,50 @@ static uint64_t estimate_repack_memory(struct gc_config *cfg,
 	return os_cache + heap;
 }
 
-static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
+static int keep_one_pack(struct string_list_item *item, void *data)
 {
-	strvec_pushf(&repack, "--keep-pack=%s", basename(item->string));
+	struct strvec *args = data;
+	strvec_pushf(args, "--keep-pack=%s", basename(item->string));
 	return 0;
 }
 
 static void add_repack_all_option(struct gc_config *cfg,
-				  struct string_list *keep_pack)
+				  struct string_list *keep_pack,
+				  struct strvec *args)
 {
 	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
 		&& !(cfg->cruft_packs && cfg->repack_expire_to))
-		strvec_push(&repack, "-a");
+		strvec_push(args, "-a");
 	else if (cfg->cruft_packs) {
-		strvec_push(&repack, "--cruft");
+		strvec_push(args, "--cruft");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--cruft-expiration=%s", cfg->prune_expire);
+			strvec_pushf(args, "--cruft-expiration=%s", cfg->prune_expire);
 		if (cfg->max_cruft_size)
-			strvec_pushf(&repack, "--max-cruft-size=%lu",
+			strvec_pushf(args, "--max-cruft-size=%lu",
 				     cfg->max_cruft_size);
 		if (cfg->repack_expire_to)
-			strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
+			strvec_pushf(args, "--expire-to=%s", cfg->repack_expire_to);
 	} else {
-		strvec_push(&repack, "-A");
+		strvec_push(args, "-A");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--unpack-unreachable=%s", cfg->prune_expire);
+			strvec_pushf(args, "--unpack-unreachable=%s", cfg->prune_expire);
 	}
 
 	if (keep_pack)
-		for_each_string_list(keep_pack, keep_one_pack, NULL);
+		for_each_string_list(keep_pack, keep_one_pack, args);
 
 	if (cfg->repack_filter && *cfg->repack_filter)
-		strvec_pushf(&repack, "--filter=%s", cfg->repack_filter);
+		strvec_pushf(args, "--filter=%s", cfg->repack_filter);
 	if (cfg->repack_filter_to && *cfg->repack_filter_to)
-		strvec_pushf(&repack, "--filter-to=%s", cfg->repack_filter_to);
+		strvec_pushf(args, "--filter-to=%s", cfg->repack_filter_to);
 }
 
-static void add_repack_incremental_option(void)
+static void add_repack_incremental_option(struct strvec *args)
 {
-	strvec_push(&repack, "--no-write-bitmap-index");
+	strvec_push(args, "--no-write-bitmap-index");
 }
 
-static int need_to_gc(struct gc_config *cfg)
+static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 {
 	/*
 	 * Setting gc.auto to 0 or negative can disable the
@@ -700,10 +701,10 @@ static int need_to_gc(struct gc_config *cfg)
 				string_list_clear(&keep_pack, 0);
 		}
 
-		add_repack_all_option(cfg, &keep_pack);
+		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
 	} else if (too_many_loose_objects(cfg))
-		add_repack_incremental_option();
+		add_repack_incremental_option(repack_args);
 	else
 		return 0;
 
@@ -852,6 +853,7 @@ int cmd_gc(int argc,
 	int keep_largest_pack = -1;
 	int skip_foreground_tasks = 0;
 	timestamp_t dummy;
+	struct strvec repack_args = STRVEC_INIT;
 	struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT;
 	struct gc_config cfg = GC_CONFIG_INIT;
 	const char *prune_expire_sentinel = "sentinel";
@@ -891,7 +893,7 @@ int cmd_gc(int argc,
 	show_usage_with_options_if_asked(argc, argv,
 					 builtin_gc_usage, builtin_gc_options);
 
-	strvec_pushl(&repack, "repack", "-d", "-l", NULL);
+	strvec_pushl(&repack_args, "repack", "-d", "-l", NULL);
 
 	gc_config(&cfg);
 
@@ -914,14 +916,14 @@ int cmd_gc(int argc,
 		die(_("failed to parse prune expiry value %s"), cfg.prune_expire);
 
 	if (aggressive) {
-		strvec_push(&repack, "-f");
+		strvec_push(&repack_args, "-f");
 		if (cfg.aggressive_depth > 0)
-			strvec_pushf(&repack, "--depth=%d", cfg.aggressive_depth);
+			strvec_pushf(&repack_args, "--depth=%d", cfg.aggressive_depth);
 		if (cfg.aggressive_window > 0)
-			strvec_pushf(&repack, "--window=%d", cfg.aggressive_window);
+			strvec_pushf(&repack_args, "--window=%d", cfg.aggressive_window);
 	}
 	if (opts.quiet)
-		strvec_push(&repack, "-q");
+		strvec_push(&repack_args, "-q");
 
 	if (opts.auto_flag) {
 		if (cfg.detach_auto && opts.detach < 0)
@@ -930,7 +932,7 @@ int cmd_gc(int argc,
 		/*
 		 * Auto-gc should be least intrusive as possible.
 		 */
-		if (!need_to_gc(&cfg)) {
+		if (!need_to_gc(&cfg, &repack_args)) {
 			ret = 0;
 			goto out;
 		}
@@ -952,7 +954,7 @@ int cmd_gc(int argc,
 			find_base_packs(&keep_pack, cfg.big_pack_threshold);
 		}
 
-		add_repack_all_option(&cfg, &keep_pack);
+		add_repack_all_option(&cfg, &keep_pack, &repack_args);
 		string_list_clear(&keep_pack, 0);
 	}
 
@@ -1014,9 +1016,9 @@ int cmd_gc(int argc,
 
 		repack_cmd.git_cmd = 1;
 		repack_cmd.close_object_store = 1;
-		strvec_pushv(&repack_cmd.args, repack.v);
+		strvec_pushv(&repack_cmd.args, repack_args.v);
 		if (run_command(&repack_cmd))
-			die(FAILED_RUN, repack.v[0]);
+			die(FAILED_RUN, repack_args.v[0]);
 
 		if (cfg.prune_expire) {
 			struct child_process prune_cmd = CHILD_PROCESS_INIT;
@@ -1067,6 +1069,7 @@ int cmd_gc(int argc,
 
 out:
 	maintenance_run_opts_release(&opts);
+	strvec_clear(&repack_args);
 	gc_config_release(&cfg);
 	return 0;
 }
@@ -1269,6 +1272,19 @@ static int maintenance_task_gc_background(struct maintenance_run_opts *opts,
 	return run_command(&child);
 }
 
+static int gc_condition(struct gc_config *cfg)
+{
+	/*
+	 * Note that it's fine to drop the repack arguments here, as we execute
+	 * git-gc(1) as a separate child process anyway. So it knows to compute
+	 * these arguments again.
+	 */
+	struct strvec repack_args = STRVEC_INIT;
+	int ret = need_to_gc(cfg, &repack_args);
+	strvec_clear(&repack_args);
+	return ret;
+}
+
 static int prune_packed(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1596,7 +1612,7 @@ static const struct maintenance_task tasks[] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
 		.background = maintenance_task_gc_background,
-		.auto_condition = need_to_gc,
+		.auto_condition = gc_condition,
 	},
 	[TASK_COMMIT_GRAPH] = {
 		.name = "commit-graph",

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 2/9] builtin/gc: make `too_many_loose_objects()` reusable without GC config
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 1/9] builtin/gc: remove global `repack` variable Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.

This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.

Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
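For illustration, the way the limit turns into a per-shard threshold
can be sketched as follows. This is a simplified sketch and not the
function itself: `exceeds_loose_limit()` is a hypothetical name, the
count of objects in a single shard is passed in rather than read from
disk, and the strictly-greater comparison is an assumption (it is not
visible in the hunks of this patch, but it agrees with the
`auto=257`/`auto=256` expectations in the tests of the next patch).

```c
/* Same rounding helper as the macro used in the diff. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Loose objects are spread over 256 shard directories, so the overall
 * limit is scaled down to a threshold for a single shard.
 *
 * Assumption: the count must strictly exceed that threshold.
 */
static int exceeds_loose_limit(int objects_in_shard, int limit)
{
	int per_shard_threshold = DIV_ROUND_UP(limit, 256);
	return objects_in_shard > per_shard_threshold;
}
```

With two objects in the inspected shard, a limit of 257 rounds up to a
threshold of 2 and is not exceeded, whereas a limit of 256 yields a
threshold of 1 and is.
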
 builtin/gc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e9772eb3a30..026d3a1d714 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -447,7 +447,7 @@ static int rerere_gc_condition(struct gc_config *cfg UNUSED)
 	return should_gc;
 }
 
-static int too_many_loose_objects(struct gc_config *cfg)
+static int too_many_loose_objects(int limit)
 {
 	/*
 	 * Quickly check if a "gc" is needed, by estimating how
@@ -469,7 +469,7 @@ static int too_many_loose_objects(struct gc_config *cfg)
 	if (!dir)
 		return 0;
 
-	auto_threshold = DIV_ROUND_UP(cfg->gc_auto_threshold, 256);
+	auto_threshold = DIV_ROUND_UP(limit, 256);
 	while ((ent = readdir(dir)) != NULL) {
 		if (strspn(ent->d_name, "0123456789abcdef") != hexsz_loose ||
 		    ent->d_name[hexsz_loose] != '\0')
@@ -703,7 +703,7 @@ static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 
 		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
-	} else if (too_many_loose_objects(cfg))
+	} else if (too_many_loose_objects(cfg->gc_auto_threshold))
 		add_repack_incremental_option(repack_args);
 	else
 		return 0;
@@ -1057,7 +1057,7 @@ int cmd_gc(int argc,
 					     !opts.quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0,
 					     NULL);
 
-	if (opts.auto_flag && too_many_loose_objects(&cfg))
+	if (opts.auto_flag && too_many_loose_objects(cfg.gc_auto_threshold))
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 1/9] builtin/gc: remove global `repack` variable Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 2/9] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-23 19:29     ` Taylor Blau
  2025-10-21 14:13   ` [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.

There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.

To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has passed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
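For illustration, the decision between a geometric and an all-into-one
repack can be sketched as below. This is a simplified sketch based on
one reading of how the split is computed, not the actual
`pack_geometry_split()` code: `geometric_split()` and
`wants_geometric_repack()` are hypothetical names, packs are reduced to
plain object counts sorted in ascending order, and multiplication
overflow is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: `counts` holds the object count of each pack in
 * ascending order. Returns how many of the smallest packs would be
 * rolled up into one new pack.
 */
static size_t geometric_split(const uint64_t *counts, size_t nr,
			      uint64_t factor)
{
	size_t i, split;
	uint64_t total = 0;

	if (!nr)
		return 0;

	/* Walk down from the largest pack while the progression holds. */
	for (i = nr - 1; i > 0; i--)
		if (counts[i] < factor * counts[i - 1])
			break;

	/* The larger pack of the violating pair must be rolled up, too. */
	split = i ? i + 1 : 0;

	for (i = 0; i < split; i++)
		total += counts[i];

	/* The combined pack may in turn outgrow its larger neighbours. */
	while (split < nr && counts[split] < factor * total)
		total += counts[split++];

	return split;
}

/* Mirrors the `geometry.split < geometry.pack_nr` check in the patch. */
static int wants_geometric_repack(const uint64_t *counts, size_t nr,
				  uint64_t factor)
{
	return geometric_split(counts, nr, factor) < nr;
}
```

With a factor of 2, packs of 3 and 3 objects would all be merged, so the
task falls back to an all-into-one repack. Packs of 1, 1 and 6 objects
only merge the two small ones and stay geometric. Packs of 2, 2 and 6
objects again roll up everything. This lines up with the sequence
exercised by the 'geometric repacking task' test.
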
 Documentation/config/maintenance.adoc |  11 +++
 builtin/gc.c                          | 102 +++++++++++++++++++++++++
 t/t7900-maintenance.sh                | 139 ++++++++++++++++++++++++++++++++++
 3 files changed, 252 insertions(+)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 2f719342183..26dc5de423f 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -75,6 +75,17 @@ maintenance.incremental-repack.auto::
 	number of pack-files not in the multi-pack-index is at least the value
 	of `maintenance.incremental-repack.auto`. The default value is 10.
 
+maintenance.geometric-repack.auto::
+	This integer config option controls how often the `geometric-repack`
+	task should be run as part of `git maintenance run --auto`. If zero,
+	then the `geometric-repack` task will not run with the `--auto`
+	option. A negative value will force the task to run every time.
+	Otherwise, a positive value implies the command should run either when
+	there are packfiles that need to be merged together to retain the
+	geometric progression, or when there are at least this many loose
+	objects that would be written into a new packfile. The default value is
+	100.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 026d3a1d714..2c9ecd464d2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -34,6 +34,7 @@
 #include "pack-objects.h"
 #include "path.h"
 #include "reflog.h"
+#include "repack.h"
 #include "rerere.h"
 #include "blob.h"
 #include "tree.h"
@@ -254,6 +255,7 @@ enum maintenance_task_label {
 	TASK_PREFETCH,
 	TASK_LOOSE_OBJECTS,
 	TASK_INCREMENTAL_REPACK,
+	TASK_GEOMETRIC_REPACK,
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
@@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
 	return 0;
 }
 
+static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
+					     struct gc_config *cfg)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	struct child_process child = CHILD_PROCESS_INIT;
+	int ret;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	child.git_cmd = 1;
+
+	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
+	if (geometry.split < geometry.pack_nr)
+		strvec_push(&child.args, "--geometric=2");
+	else
+		add_repack_all_option(cfg, NULL, &child.args);
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	if (the_repository->settings.core_multi_pack_index)
+		strvec_push(&child.args, "--write-midx");
+
+	if (run_command(&child)) {
+		ret = error(_("failed to perform geometric repack"));
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
+static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	int auto_value = 100;
+	int ret;
+
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
+			    &auto_value);
+	if (!auto_value)
+		return 0;
+	if (auto_value < 0)
+		return 1;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	/*
+	 * When we'd merge at least two packs with one another we always
+	 * perform the repack.
+	 */
+	if (geometry.split) {
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * Otherwise, we estimate the number of loose objects to determine
+	 * whether we want to create a new packfile or not.
+	 */
+	if (too_many_loose_objects(auto_value)) {
+		ret = 1;
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
 typedef int (*maintenance_task_fn)(struct maintenance_run_opts *opts,
 				   struct gc_config *cfg);
 typedef int (*maintenance_auto_fn)(struct gc_config *cfg);
@@ -1608,6 +1705,11 @@ static const struct maintenance_task tasks[] = {
 		.background = maintenance_task_incremental_repack,
 		.auto_condition = incremental_repack_auto_condition,
 	},
+	[TASK_GEOMETRIC_REPACK] = {
+		.name = "geometric-repack",
+		.background = maintenance_task_geometric_repack,
+		.auto_condition = geometric_repack_auto_condition,
+	},
 	[TASK_GC] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index ddd273d8dc2..60029a65a35 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -465,6 +465,145 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
 	)
 '
 
+run_and_verify_geometric_pack () {
+	EXPECTED_PACKS="$1" &&
+
+	# Verify that we perform a geometric repack.
+	rm -f "trace2.txt" &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git maintenance run --task=geometric-repack 2>/dev/null &&
+	test_subcommand git repack -d -l --geometric=2 \
+		--quiet --write-midx <trace2.txt &&
+
+	# Verify that the number of packfiles matches our expectation.
+	ls -l .git/objects/pack/*.pack >packfiles &&
+	test_line_count = "$EXPECTED_PACKS" packfiles &&
+
+	# And verify that there are no loose objects anymore.
+	cat >expect <<-\EOF &&
+	info
+	pack
+	EOF
+	ls .git/objects >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'geometric repacking task' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+		test_commit initial &&
+
+		# The initial repack causes an all-into-one repack.
+		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
+
+		# Repacking should now cause a no-op geometric repack because
+		# no packfiles need to be combined.
+		ls -l .git/objects/pack >before &&
+		run_and_verify_geometric_pack 1 &&
+		ls -l .git/objects/pack >after &&
+		test_cmp before after &&
+
+		# This incremental change creates a new packfile that only
+		# soaks up loose objects. The packfiles are not getting merged
+		# at this point.
+		test_commit loose &&
+		run_and_verify_geometric_pack 2 &&
+
+		# Both packfiles have 3 objects, so the next run would cause us
+		# to merge all packfiles together. This should be turned into
+		# an all-into-one-repack.
+		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <all-into-one-repack.txt &&
+
+		# The geometric repack soaks up unreachable objects.
+		echo blob-1 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 2 &&
+
+		# A second unreachable object should be written into another packfile.
+		echo blob-2 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+
+		# And these two small packs should now be merged via the
+		# geometric repack. The large packfile should remain intact.
+		run_and_verify_geometric_pack 2 &&
+
+		# If we now add two more objects and repack twice we should
+		# then see another all-into-one repack. This time around
+		# though, as we have unreachable objects, we should also see a
+		# cruft pack.
+		echo blob-3 | git hash-object -w --stdin -t blob &&
+		echo blob-4 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+		GIT_TRACE2_EVENT="$(pwd)/cruft-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <cruft-repack.txt &&
+		ls .git/objects/pack/*.pack >packs &&
+		test_line_count = 2 packs &&
+		ls .git/objects/pack/*.mtimes >cruft &&
+		test_line_count = 1 cruft
+	)
+'
+
+test_geometric_repack_needed () {
+	NEEDED="$1"
+	GEOMETRIC_CONFIG="$2" &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git ${GEOMETRIC_CONFIG:+-c maintenance.geometric-repack.$GEOMETRIC_CONFIG} \
+		maintenance run --auto --task=geometric-repack 2>/dev/null &&
+	case "$NEEDED" in
+	true)
+		test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	false)
+		! test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	*)
+		BUG "invalid parameter: $NEEDED";;
+	esac
+}
+
+test_expect_success 'geometric repacking with --auto' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		# An empty repository does not need repacking, except when
+		# explicitly told to do it.
+		test_geometric_repack_needed false &&
+		test_geometric_repack_needed false auto=0 &&
+		test_geometric_repack_needed false auto=1 &&
+		test_geometric_repack_needed true auto=-1 &&
+
+		test_oid_init &&
+
+		# Loose objects cause a repack when crossing the limit. Note
+		# that the number of objects gets extrapolated by having a look
+		# at the "objects/17/" shard.
+		test_commit "$(test_oid blob17_1)" &&
+		test_geometric_repack_needed false &&
+		test_commit "$(test_oid blob17_2)" &&
+		test_geometric_repack_needed false auto=257 &&
+		test_geometric_repack_needed true auto=256 &&
+
+		# Force another repack.
+		test_commit first &&
+		test_commit second &&
+		test_geometric_repack_needed true auto=-1 &&
+
+		# We now have two packfiles that would be merged together. As
+		# such, the repack should always happen unless the user has
+		# disabled the auto task.
+		test_geometric_repack_needed false auto=0 &&
+		test_geometric_repack_needed true auto=9000
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-23 19:33     ` Taylor Blau
  2025-10-21 14:13   ` [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.

Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.
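
To illustrate what the factor means, here is a minimal sketch in shell.
It only models the progression check described above. The helper name
`progression_intact` is made up for this example and is not part of
Git, and the real logic in git-repack(1) does more work than this. The
pack sizes are the ones used by the test added in this patch.

```shell
# progression_intact FACTOR COUNT...
#
# Hypothetical helper, not part of Git. COUNTs are per-pack object
# counts sorted in ascending order. Succeeds when every pack holds at
# least FACTOR times as many objects as the next-smaller one.
progression_intact () {
	factor=$1
	shift
	prev=
	for count in "$@"
	do
		if test -n "$prev" && test "$count" -lt $((factor * prev))
		then
			return 1
		fi
		prev=$count
	done
	return 0
}

# Packs with 1, 2 and 9 objects, as in the test added by this patch.
progression_intact 2 1 2 9 && echo "factor 2: progression intact"
progression_intact 3 1 2 9 || echo "factor 3: progression broken"
```

With a factor of 2 the packs already form a valid sequence, whereas a
factor of 3 makes the two smallest packs candidates for a merge.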

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  5 +++++
 builtin/gc.c                          |  9 ++++++++-
 t/t7900-maintenance.sh                | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 26dc5de423f..45fdafc2c63 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -86,6 +86,11 @@ maintenance.geometric-repack.auto::
 	objects that would be written into a new packfile. The default value is
 	100.
 
+maintenance.geometric-repack.splitFactor::
+	This integer config option controls the factor used for the geometric
+	sequence. See the `--geometric=` option in linkgit:git-repack[1] for
+	more details. Defaults to `2`.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 2c9ecd464d2..fb1a82e0304 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1582,6 +1582,9 @@ static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
 	struct child_process child = CHILD_PROCESS_INIT;
 	int ret;
 
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.splitFactor",
+			    &geometry.split_factor);
+
 	existing_packs.repo = the_repository;
 	existing_packs_collect(&existing_packs, &kept_packs);
 	pack_geometry_init(&geometry, &existing_packs, &po_args);
@@ -1591,7 +1594,8 @@ static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
 
 	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
 	if (geometry.split < geometry.pack_nr)
-		strvec_push(&child.args, "--geometric=2");
+		strvec_pushf(&child.args, "--geometric=%d",
+			     geometry.split_factor);
 	else
 		add_repack_all_option(cfg, NULL, &child.args);
 	if (opts->quiet)
@@ -1632,6 +1636,9 @@ static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
 	if (auto_value < 0)
 		return 1;
 
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.splitFactor",
+			    &geometry.split_factor);
+
 	existing_packs.repo = the_repository;
 	existing_packs_collect(&existing_packs, &kept_packs);
 	pack_geometry_init(&geometry, &existing_packs, &po_args);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 60029a65a35..8f332e7fbbe 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -604,6 +604,38 @@ test_expect_success 'geometric repacking with --auto' '
 	)
 '
 
+test_expect_success 'geometric repacking honors configured split factor' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		# Create three different packs with 9, 2 and 1 object, respectively.
+		# This is done so that only a subset of packs would be merged
+		# together so that we can verify that `git repack` receives the
+		# correct geometric factor.
+		for i in $(test_seq 9)
+		do
+			echo first-$i | git hash-object -w --stdin -t blob || return 1
+		done &&
+		git repack --geometric=2 -d &&
+
+		for i in $(test_seq 2)
+		do
+			echo second-$i | git hash-object -w --stdin -t blob || return 1
+		done &&
+		git repack --geometric=2 -d &&
+
+		echo third | git hash-object -w --stdin -t blob &&
+		git repack --geometric=2 -d &&
+
+		test_geometric_repack_needed false splitFactor=2 &&
+		test_geometric_repack_needed true splitFactor=3 &&
+		test_subcommand git repack -d -l --geometric=3 --quiet --write-midx <trace2.txt
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-23 21:31     ` Taylor Blau
  2025-10-21 14:13   ` [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.

Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c           | 17 +++++++++++------
 t/t7900-maintenance.sh |  5 +++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index fb1a82e030..726d944d3b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1855,6 +1855,13 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static struct maintenance_strategy parse_maintenance_strategy(const char *name)
+{
+	if (!strcasecmp(name, "incremental"))
+		return incremental_strategy;
+	die(_("unknown maintenance strategy: '%s'"), name);
+}
+
 static void initialize_task_config(struct maintenance_run_opts *opts,
 				   const struct string_list *selected_tasks)
 {
@@ -1890,12 +1897,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 * override specific aspects of our strategy.
 	 */
 	if (opts->schedule) {
-		strategy = none_strategy;
-
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str)) {
-			if (!strcasecmp(config_str, "incremental"))
-				strategy = incremental_strategy;
-		}
+		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+			strategy = parse_maintenance_strategy(config_str);
+		else
+			strategy = none_strategy;
 	} else {
 		strategy = default_strategy;
 	}
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 8f332e7fbb..69fb6e9ee2 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -1264,6 +1264,11 @@ test_expect_success 'fails when running outside of a repository' '
 	nongit test_must_fail git maintenance unregister
 '
 
+test_expect_success 'fails when configured to use an invalid strategy' '
+	test_must_fail git -c maintenance.strategy=invalid maintenance run --schedule=hourly 2>err &&
+	test_grep "unknown maintenance strategy: .invalid." err
+'
+
 test_expect_success 'register and unregister bare repo' '
 	test_when_finished "git config --global --unset-all maintenance.repo || :" &&
 	test_might_fail git config --global --unset-all maintenance.repo &&

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-23 21:34     ` Taylor Blau
  2025-10-21 14:13   ` [PATCH v2 7/9] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

We basically have three different ways to execute repository
maintenance:

  1. Manual maintenance via `git maintenance run`.

  2. Automatic maintenance via `git maintenance run --auto`.

  3. Scheduled maintenance via `git maintenance run --schedule=`.

At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.

Introduce a new maintenance type that discerns between manual (1 & 2) and
scheduled (3) maintenance. Convert the `enabled` field into a bitset so
that it becomes possible to specify which tasks exactly should run in a
specific context.

The types picked for existing strategies match the status quo:

  - The default strategy is only ever executed as part of a manual
    maintenance run. It is not possible to use it for scheduled
    maintenance.

  - The incremental strategy is only ever executed as part of a
    scheduled maintenance run. It is not possible to use it for manual
    maintenance.

The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
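
The bitset check can be sketched in shell as follows. This is only an
illustration of the idea: `task_runs` is a hypothetical helper, and the
two constants merely mirror the values of `enum maintenance_type`
introduced by this patch.

```shell
# Mirrors the two bits of `enum maintenance_type` from this patch.
MAINTENANCE_TYPE_SCHEDULED=$((1 << 0))
MAINTENANCE_TYPE_MANUAL=$((1 << 1))

# task_runs TASK_TYPES RUN_TYPE
#
# Hypothetical helper: succeeds when the bitset TASK_TYPES contains the
# bit for the current RUN_TYPE, like `strategy.tasks[i].type & type`.
task_runs () {
	test $(($1 & $2)) -ne 0
}

# The default strategy enables "gc" for manual maintenance only.
gc_types=$MAINTENANCE_TYPE_MANUAL
task_runs "$gc_types" "$MAINTENANCE_TYPE_MANUAL" &&
	echo "gc: runs manually"
task_runs "$gc_types" "$MAINTENANCE_TYPE_SCHEDULED" ||
	echo "gc: skipped when scheduled"

# A task may opt into both contexts by OR-ing the bits together.
both=$(($MAINTENANCE_TYPE_SCHEDULED | $MAINTENANCE_TYPE_MANUAL))
task_runs "$both" "$MAINTENANCE_TYPE_SCHEDULED" &&
	echo "both: runs when scheduled"
```

A task whose bitset is zero never runs, which matches what setting
`maintenance.<task>.enabled` to false does in the patch.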

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 726d944d3b..eff4e4886a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1827,30 +1827,39 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts,
 	return result;
 }
 
+enum maintenance_type {
+	/* As invoked via `git maintenance run --schedule=`. */
+	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
+	/* As invoked via `git maintenance run` and with `--auto`. */
+	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
+};
+
 struct maintenance_strategy {
 	struct {
-		int enabled;
+		unsigned type;
 		enum schedule_priority schedule;
 	} tasks[TASK__COUNT];
 };
 
 static const struct maintenance_strategy none_strategy = { 0 };
+
 static const struct maintenance_strategy default_strategy = {
 	.tasks = {
-		[TASK_GC].enabled = 1,
+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
 	},
 };
+
 static const struct maintenance_strategy incremental_strategy = {
 	.tasks = {
-		[TASK_COMMIT_GRAPH].enabled = 1,
+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
-		[TASK_PREFETCH].enabled = 1,
+		[TASK_PREFETCH].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_PREFETCH].schedule = SCHEDULE_HOURLY,
-		[TASK_INCREMENTAL_REPACK].enabled = 1,
+		[TASK_INCREMENTAL_REPACK].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY,
-		[TASK_LOOSE_OBJECTS].enabled = 1,
+		[TASK_LOOSE_OBJECTS].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
-		[TASK_PACK_REFS].enabled = 1,
+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
 	},
 };
@@ -1867,6 +1876,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 {
 	struct strbuf config_name = STRBUF_INIT;
 	struct maintenance_strategy strategy;
+	enum maintenance_type type;
 	const char *config_str;
 
 	/*
@@ -1901,8 +1911,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 			strategy = parse_maintenance_strategy(config_str);
 		else
 			strategy = none_strategy;
+		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
+		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
 	for (size_t i = 0; i < TASK__COUNT; i++) {
@@ -1912,8 +1924,8 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 		if (!repo_config_get_bool(the_repository, config_name.buf, &config_value))
-			strategy.tasks[i].enabled = config_value;
-		if (!strategy.tasks[i].enabled)
+			strategy.tasks[i].type = config_value ? type : 0;
+		if (!(strategy.tasks[i].type & type))
 			continue;
 
 		if (opts->schedule) {

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 7/9] builtin/maintenance: extend "maintenance.strategy" to manual maintenance
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 8/9] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.

So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.

Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy for manual maintenance beforehand. Instead, manual
maintenance would have continued to use git-gc(1) by default.

But luckily, we can match that behaviour by:

  - Keeping all current tasks of the incremental strategy as
    `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
    run during manual maintenance.

  - Configuring the "gc" task so that it is invoked during manual
    maintenance.

Like this, the user shouldn't observe any difference in behaviour.
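
The resulting selection order can be modelled with a short shell
sketch. `pick_strategy` is a hypothetical helper that only returns a
strategy name. It assumes the state as of this patch, where
"incremental" is the only strategy that can be configured and "default"
stands in for the built-in `default_strategy`.

```shell
# pick_strategy RUN_TYPE [CONFIGURED]
#
# Hypothetical model of the selection order in initialize_task_config()
# after this patch: start from the built-in default for the run type,
# then let a configured "maintenance.strategy" override it.
pick_strategy () {
	case "$1" in
	scheduled) strategy=none ;;
	manual)    strategy=default ;;
	*)         echo "invalid run type: $1" >&2; return 1 ;;
	esac

	if test $# -ge 2
	then
		# Strategy names are matched case-insensitively.
		case "$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')" in
		incremental) strategy=incremental ;;
		*) echo "unknown maintenance strategy: '$2'" >&2; return 1 ;;
		esac
	fi
	echo "$strategy"
}

pick_strategy manual                  # prints "default"
pick_strategy scheduled               # prints "none"
pick_strategy manual incremental      # prints "incremental"
```

Even though the last call selects "incremental" for a manual run, only
the "gc" task of that strategy carries the manual bit, so git-gc(1)
still ends up being the task that runs.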

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc | 22 ++++++++++++-------
 builtin/gc.c                          | 24 ++++++++++++++++-----
 t/t7900-maintenance.sh                | 40 +++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 45fdafc2c6..b7e90a71a3 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -16,19 +16,25 @@ detach.
 
 maintenance.strategy::
 	This string config option provides a way to specify one of a few
-	recommended schedules for background maintenance. This only affects
-	which tasks are run during `git maintenance run --schedule=X`
-	commands, provided no `--task=<task>` arguments are provided.
-	Further, if a `maintenance.<task>.schedule` config value is set,
-	then that value is used instead of the one provided by
-	`maintenance.strategy`. The possible strategy strings are:
+	recommended strategies for repository maintenance. This affects
+	which tasks are run during `git maintenance run`, provided no
+	`--task=<task>` arguments are provided. This setting impacts manual
+	maintenance, auto-maintenance as well as scheduled maintenance. The
+	tasks that run may be different depending on the maintenance type.
 +
-* `none`: This default setting implies no tasks are run at any schedule.
+The maintenance strategy can be further tweaked by setting
+`maintenance.<task>.enabled` and `maintenance.<task>.schedule`. If set, these
+values are used instead of the defaults provided by `maintenance.strategy`.
++
+The possible strategies are:
++
+* `none`: This strategy implies no tasks are run at all. This is the default
+  strategy for scheduled maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
   `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
-  task weekly.
+  task weekly. Manual repository maintenance uses the `gc` task.
 
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
diff --git a/builtin/gc.c b/builtin/gc.c
index eff4e4886a..9c05905b9a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1861,6 +1861,19 @@ static const struct maintenance_strategy incremental_strategy = {
 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
 		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
+
+		/*
+		 * Historically, the "incremental" strategy was only available
+		 * in the context of scheduled maintenance when set up via
+		 * "maintenance.strategy". We have later expanded that config
+		 * to also cover manual maintenance.
+		 *
+		 * To retain backwards compatibility with the previous status
+		 * quo we thus run git-gc(1) in case manual maintenance was
+		 * requested. This is the same as the default strategy, which
+		 * would have been in use beforehand.
+		 */
+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
 	},
 };
 
@@ -1904,19 +1917,20 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 *   - Unscheduled maintenance uses our default strategy.
 	 *
 	 * Both of these are affected by the gitconfig though, which may
-	 * override specific aspects of our strategy.
+	 * override specific aspects of our strategy. Furthermore, both
+	 * strategies can be overridden by setting "maintenance.strategy".
 	 */
 	if (opts->schedule) {
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
-			strategy = parse_maintenance_strategy(config_str);
-		else
-			strategy = none_strategy;
+		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
+	if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+		strategy = parse_maintenance_strategy(config_str);
+
 	for (size_t i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 69fb6e9ee2..3530895bfb 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -887,6 +887,46 @@ test_expect_success 'maintenance.strategy inheritance' '
 		<modified-daily.txt
 '
 
+test_strategy () {
+	STRATEGY="$1" &&
+	shift &&
+
+	cat >expect &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -c maintenance.strategy=$STRATEGY maintenance run --quiet "$@" &&
+	sed -n 's/{"event":"child_start","sid":"[^/"]*",.*,"argv":\["\(.*\)\"]}/\1/p' <trace2.txt |
+		sed 's/","/ /g'  >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'maintenance.strategy is respected' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+
+		test_must_fail git -c maintenance.strategy=unknown maintenance run 2>err &&
+		test_grep "unknown maintenance strategy: .unknown." err &&
+
+		test_strategy incremental <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy incremental --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git prune-packed --quiet
+		git multi-pack-index write --no-progress
+		git multi-pack-index expire --no-progress
+		git multi-pack-index repack --no-progress --batch-size=1
+		git commit-graph write --split --reachable --no-progress
+		EOF
+	)
+'
+
 test_expect_success 'register and unregister' '
 	test_when_finished git config --global --unset-all maintenance.repo &&
 

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 8/9] builtin/maintenance: make "gc" strategy accessible
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 7/9] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-21 14:13   ` [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
  2025-10-23 16:48   ` [PATCH v2 0/9] " Junio C Hamano
  9 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:

  - It is impossible to use the default "gc" strategy for a specific
    repository when the strategy was globally set to a different strategy.

  - It is not possible to use git-gc(1) for scheduled maintenance.

Address these issues by making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  2 ++
 builtin/gc.c                          |  9 ++++++---
 t/t7900-maintenance.sh                | 14 +++++++++++++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index b7e90a71a3..b2bacdc822 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -30,6 +30,8 @@ The possible strategies are:
 +
 * `none`: This strategy implies no tasks are run at all. This is the default
   strategy for scheduled maintenance.
+* `gc`: This strategy runs the `gc` task. This is the default strategy for
+  manual maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 9c05905b9a..aaff0bae15 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1843,9 +1843,10 @@ struct maintenance_strategy {
 
 static const struct maintenance_strategy none_strategy = { 0 };
 
-static const struct maintenance_strategy default_strategy = {
+static const struct maintenance_strategy gc_strategy = {
 	.tasks = {
-		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
+		[TASK_GC].schedule = SCHEDULE_DAILY,
 	},
 };
 
@@ -1881,6 +1882,8 @@ static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
+	if (!strcasecmp(name, "gc"))
+		return gc_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
@@ -1924,7 +1927,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
-		strategy = default_strategy;
+		strategy = gc_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 3530895bfb..2770148fd1 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -916,7 +916,7 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy incremental --schedule=weekly <<-\EOF
+		test_strategy incremental --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git prune-packed --quiet
 		git multi-pack-index write --no-progress
@@ -924,6 +924,18 @@ test_expect_success 'maintenance.strategy is respected' '
 		git multi-pack-index repack --no-progress --batch-size=1
 		git commit-graph write --split --reachable --no-progress
 		EOF
+
+		test_strategy gc <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy gc --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
 	)
 '
 

-- 
2.51.1.851.g4ebd6896fd.dirty



* [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 8/9] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
@ 2025-10-21 14:13   ` Patrick Steinhardt
  2025-10-23 21:49     ` Taylor Blau
  2025-10-23 16:48   ` [PATCH v2 0/9] " Junio C Hamano
  9 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-21 14:13 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau

We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
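
The decision between the two repack modes can be sketched in shell.
This is a simplified model and not Git's code: `repack_mode` is a
hypothetical helper, the way it computes the split point is an
assumption about how git-repack(1) rolls up packs, and loose objects
are ignored entirely. Only the final comparison corresponds directly to
the `geometry.split < geometry.pack_nr` check in builtin/gc.c.

```shell
# repack_mode FACTOR COUNT...
#
# Hypothetical model, not Git's actual code. COUNTs are per-pack object
# counts sorted in ascending order. Prints "geometric" when only some
# packs would be merged and "all-into-one" when every pack would be.
repack_mode () {
	factor=$1
	shift

	# Find the topmost pack that breaks the progression. It and all
	# smaller packs get rolled up into a new pack.
	split=0 idx=0 prev=
	for count in "$@"
	do
		if test -n "$prev" && test "$count" -lt $((factor * prev))
		then
			split=$((idx + 1))
		fi
		prev=$count
		idx=$((idx + 1))
	done

	# The new pack may in turn be too large for the next-bigger pack,
	# so keep absorbing packs until the progression holds again.
	total=0 idx=0
	for count in "$@"
	do
		if test "$idx" -lt "$split"
		then
			total=$((total + count))
		elif test "$count" -lt $((factor * total))
		then
			split=$((split + 1))
			total=$((total + count))
		else
			break
		fi
		idx=$((idx + 1))
	done

	if test "$split" -lt "$#"
	then
		echo geometric
	else
		echo all-into-one
	fi
}

repack_mode 3 1 2 9     # prints "geometric"
repack_mode 2 3 4 10    # prints "all-into-one"
```

In the second call the two smallest packs add up to 7 objects, which
is too many for the pack with 10 objects to stay separate at a factor
of 2, so all three packs would be merged and the cruft-pack path is
taken instead.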

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  9 +++++++++
 builtin/gc.c                          | 19 +++++++++++++++++++
 t/t7900-maintenance.sh                | 20 +++++++++++++++++++-
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index b2bacdc822..d0c38f03fa 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -32,6 +32,15 @@ The possible strategies are:
   strategy for scheduled maintenance.
 * `gc`: This strategy runs the `gc` task. This is the default strategy for
   manual maintenance.
+* `geometric`: This strategy performs geometric repacking of packfiles and
+  keeps auxiliary data structures up-to-date. The strategy expires data in the
+  reflog and removes worktrees that cannot be located anymore. When the
+  geometric repacking strategy would decide to do an all-into-one repack, then
+  the strategy generates a cruft pack for all unreachable objects. Objects that
+  are already part of a cruft pack will be expired.
++
+This repacking strategy is a full replacement for the `gc` strategy and is
+recommended for large repositories.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index aaff0bae15..9739bb0ea2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1878,12 +1878,31 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static const struct maintenance_strategy geometric_strategy = {
+	.tasks = {
+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
+		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
+		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
+		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
+		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
+	},
+};
+
 static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
 	if (!strcasecmp(name, "gc"))
 		return gc_strategy;
+	if (!strcasecmp(name, "geometric"))
+		return geometric_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 2770148fd1..aedb9e7e8e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -931,11 +931,29 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy gc --schedule=weekly <<-\EOF
+		test_strategy gc --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git reflog expire --all
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
+
+		test_strategy geometric <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
+
+		test_strategy geometric --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
 	)
 '
 

-- 
2.51.1.851.g4ebd6896fd.dirty


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 0/9] builtin/maintenance: introduce "geometric" strategy
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-10-21 14:13   ` [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
@ 2025-10-23 16:48   ` Junio C Hamano
  2025-10-23 21:50     ` Taylor Blau
  9 siblings, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2025-10-23 16:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
> with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
> with other topics by preemptively including "repository.h", 2025-09-29)
> merged into it.
>
> Changes in v2:
>   - Make the geometric factor configurable via
>     "maintenance.geometric-repack.splitFactor".
>   - Wrap some overly long lines in our tests.
>   - Link to v1: https://lore.kernel.org/r/20251016-pks-maintenance-geometric-strategy-v1-0-18943d474203@pks.im
>
> Thanks!

This round looks good to me (I wasn't very careful picking typos and
minor mistakes, but the resulting code overall looked sound).

Thanks.

* Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-21 13:00     ` Patrick Steinhardt
@ 2025-10-23 19:19       ` Taylor Blau
  2025-10-24  5:44         ` Patrick Steinhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 19:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Tue, Oct 21, 2025 at 03:00:31PM +0200, Patrick Steinhardt wrote:
> > OK. To make sure I understand: this limit is the minimum number of loose
> > objects that would cause the geometric-repack task to run, unless there are
> > pack(s) which would be combined as a result of running a geometric
> > repack, in which case we run it regardless.
> >
> > Is that right?
>
> Yeah, exactly. I was initially thinking to only frame this in the
> context of "how many packs would be merged"? But the problem with that
> is that we wouldn't ever generate _new_ packs if the repository only
> ever grows loose objects, and consequently we also wouldn't ever merge
> any of them.

Makes total sense, I like the direction that you picked here.
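
A minimal sketch of the condition as described above, for readers following
along. The function and parameter names are hypothetical (only the "auto"
semantics and the default of 100 come from the patch), and it assumes that
reaching the limit exactly is already enough to trigger the task:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the auto-condition; this is not the
 * code from builtin/gc.c.
 *
 *   auto_value == 0   task never runs automatically
 *   auto_value <  0   task always runs
 *   otherwise         run when packs would be merged, or when enough
 *                     loose objects have accumulated
 */
bool geometric_repack_wanted(int auto_value, bool packs_would_merge,
			     size_t loose_objects)
{
	if (!auto_value)
		return false;
	if (auto_value < 0)
		return true;
	if (packs_would_merge)
		return true;
	/* Assumption: hitting the limit exactly already counts. */
	return loose_objects >= (size_t)auto_value;
}
```

Under this model and the default of 100, a repository with 99 loose objects
and no mergeable packs is left alone, while one with mergeable packs and no
loose objects is still repacked.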

> > > @@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
> > >  	return 0;
> > >  }
> > >
> > > +static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
> > > +					     struct gc_config *cfg)
> > > +{
> > > +	struct pack_geometry geometry = {
> > > +		.split_factor = 2,
> >
> > I wonder if this should be configurable somewhere. It might not be a bad
> > idea to introduce a 'repack.geometricSplitFactor' configuration
> > variable, defaulting to two, and use that here. It would also be nice to
> > be able to run 'git repack --geometric -d' and have it fallback to that
> > split factor, since using "2" is so common that it's frustrating when I
> > forget to type it out explicitly ;-).
>
> I was also pondering over this. I think the way to do so would be to
> introduce "maintenance.geometric-repack.splitFactor", as that follows
> all the other maintenance configuration we have there, as well.
>
> I decided to not do it yet as I wanted to keep the scope of this patch
> series contained. But honestly, it's an easy-enough change to make, so
> let me introduce another patch to do this.

I'm glad to have ~~nerd-sniped~~ convinced you that this was an
interesting addition ;-). Thanks for working on it.

> > > +static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
> > > +{
> > > +	struct pack_geometry geometry = {
> > > +		.split_factor = 2,
> > > +	};
> > > +	struct pack_objects_args po_args = {
> > > +		.local = 1,
> > > +	};
> > > +	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
> > > +	struct string_list kept_packs = STRING_LIST_INIT_DUP;
> > > +	int auto_value = 100;
> > > +	int ret;
> > > +
> > > +	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
> > > +			    &auto_value);
> > > +	if (!auto_value)
> > > +		return 0;
> > > +	if (auto_value < 0)
> > > +		return 1;
> > > +
> > > +	existing_packs.repo = the_repository;
> > > +	existing_packs_collect(&existing_packs, &kept_packs);
> > > +	pack_geometry_init(&geometry, &existing_packs, &po_args);
> > > +	pack_geometry_split(&geometry);
> > > +
> > > +	/*
> > > +	 * When we'd merge at least two packs with one another we always
> > > +	 * perform the repack.
> > > +	 */
> > > +	if (geometry.split) {
> > > +		ret = 1;
> > > +		goto out;
> > > +	}
> >
> > Hmm. I wish that we could somehow pass this information to the function
> > above so that we don't have to re-discover the fact that there are packs
> > to combine. I'm not familiar enough with the maintenance code to know
> > how difficult that would be to do, but it looks like at least the
> > gc_config pointer is shared between the auto condition and the task
> > itself.
> >
> > That's kind of gross to tack on there, but I could see a compelling
> > argument for passing around an extra void pointer between the two that
> > would allow us to propagate this kind of data between the auto condition
> > and the task itself. Recomputing the split is not super expensive, so I
> > don't think skipping this is a show-stopper, at least from a performance
> > perspective, but it does seem like a good opportunity to DRY things up a
> > bit.
>
> The problem is that the auto-condition is not evaluated when running
> without the "--auto" flag. We of course can conditionally compute the
> split in case we figure that the auto-condition didn't run, but it does
> get somewhat dirty.
>
> So I'd propose to defer such a change into the future in case we notice
> that this indeed is a problem. Is that fine with you?

Ah, makes sense, I definitely felt like I was missing something here.
I agree that punting on this feels like the right thing to do.

> > > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> > > index ddd273d8dc2..83a373fe94b 100755
> > > --- a/t/t7900-maintenance.sh
> > > +++ b/t/t7900-maintenance.sh
> > > @@ -465,6 +465,143 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
> > >  	)
> > >  '
> > >
> > > +run_and_verify_geometric_pack () {
> > > +	EXPECTED_PACKS="$1" &&
> > > +
> > > +	# Verify that we perform a geometric repack.
> > > +	rm -f "trace2.txt" &&
> > > +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> > > +		git maintenance run --task=geometric-repack 2>/dev/null &&
> > > +	test_subcommand git repack -d -l --geometric=2 --quiet --write-midx <trace2.txt &&
> >
> > Makes sense. I do think the test_subcommand thing is a little fragile
> > here, but verifying that the resulting pack structure forms a geometric
> > progression feels like overkill for this test, so I think what you wrote
> > here makes sense.
>
> Oh, yeah, it's fragile and somewhat gross indeed. Couldn't really find a
> nicer way to do it though :/

Yeah, I don't think that there's a much better alternative. At some
level, we really just care that we ran a geometric repack, and that's
exactly what test_subcommand is verifying. I think the fragility comes from
cases like:

 - You started (or stopped) passing an ancillary argument to the command
   in such a way that caused test_subcommand to no longer match, but the
   change in how we call the command internally is not meaningful to the
   test.

 - You stopped passing an argument (e.g., "--geometric=2") in a way that
   *looks* like it would have altered the behavior of the command, but
   actually doesn't, e.g., because you made the geometric factor of "2"
   the default, or otherwise specified via the config.

I think this is all just idle musing about test_subcommand in general,
and not useful to this immediate patch series.

> > > +
> > > +	# Verify that the number of packfiles matches our expectation.
> > > +	ls -l .git/objects/pack/*.pack >packfiles &&
> > > +	test_line_count = "$EXPECTED_PACKS" packfiles &&
> > > +
> > > +	# And verify that there are no loose objects anymore.
> > > +	cat >expect <<-\EOF &&
> > > +	info
> > > +	pack
> > > +	EOF
> > > +	ls .git/objects >actual &&
> >
> > I wonder if there is an easier way to check for loose objects here that
> > doesn't require you to know that the "info" and "pack" directories
> > exist. Perhaps something like:
> >
> >     test_stdout_line_count = 0 find .git/objects/?? -type f
> >
> > , or even
> >
> >     find .git/objects/?? -type f >loose.objs &&
> >     test_must_be_empty loose.objs
>
> This doesn't work though in case there is not even a single sharding
> directory:
>
>     find: '.git/objects/??': No such file or directory
>
> I didn't really have any other idea for now to do this.

Mmm, good point. What about using 'git count-objects -v' directly?

    test_loose_object_nr() {
      local nr="$1" &&
      git count-objects -v >count &&
      grep "^count: $nr\$" count
    }

Thanks,
Taylor

* Re: [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task
  2025-10-21 14:13   ` [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-23 19:29     ` Taylor Blau
  2025-10-24  5:45       ` Patrick Steinhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 19:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Tue, Oct 21, 2025 at 04:13:25PM +0200, Patrick Steinhardt wrote:
> +test_expect_success 'geometric repacking task' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git config set maintenance.auto false &&
> +		test_commit initial &&
> +
> +		# The initial repack causes an all-into-one repack.
> +		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
> +			git maintenance run --task=geometric-repack 2>/dev/null &&
> +		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&

Not a show-stopper of course, but I thought from the cover letter that
these lines would have gotten wrapped. Whether or not we have lines
longer than 80 characters is not a hill that I'd like to die on, of
course ;-). But I brought it up because I am wondering if there were
some changes that you meant to include as a part of this round that got
dropped in the shuffle.

Thanks,
Taylor

* Re: [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable
  2025-10-21 14:13   ` [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
@ 2025-10-23 19:33     ` Taylor Blau
  2025-10-24  5:45       ` Patrick Steinhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 19:33 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Tue, Oct 21, 2025 at 04:13:26PM +0200, Patrick Steinhardt wrote:
> The geometric repacking task uses a factor of two for its geometric
> sequence, meaning that each next pack must contain at least twice as
> many objects as the next-smaller one. In some cases it may be helpful to
> configure this factor though to reduce the number of packfile merges
> even further, e.g. in very big repositories. But while git-repack(1)
> itself supports doing this, the maintenance task does not give us a way
> to tune it.
>
> Introduce a new "maintenance.geometric-repack.splitFactor" configuration
> to plug this gap.

Interesting, this wasn't exactly what I had in my mind when reading the
last round, but I think this is worth doing on its own. My apologies for
being ambiguous in my earlier message :-s.

I was suggesting that we have a repack.geometricFactor configuration
variable that defaults to two and can be overridden by --geometric=<n>,
such that we could start doing "git repack --geometric" without having
to write "=2" every time.

I think that that is probably still a useful thing to do in and of
itself, but this change doesn't preclude our ability to do that, since
it just overwrites what we pass in to 'git repack' when calling it from
within the maintenance context.
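
As an illustration of what the factor controls, the property can be sketched
as below. This only models the simplified rule from the commit message (each
pack holds at least `factor` times as many objects as the next-smaller one);
the helper name and signature are invented for this example, and
git-repack(1)'s real split computation is more involved:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if the object counts, sorted in
 * ascending order, form a geometric progression with the given factor.
 */
bool is_geometric_progression(const size_t *counts, size_t nr,
			      size_t factor)
{
	if (!factor)
		return false;
	for (size_t i = 0; i + 1 < nr; i++) {
		/* Same as counts[i + 1] < factor * counts[i], minus the overflow. */
		if (counts[i + 1] / factor < counts[i])
			return false;
	}
	return true;
}
```

For example, packs with 1, 2, 4 and 8 objects satisfy a factor of 2, whereas
packs with 3 and 4 objects do not.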

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/config/maintenance.adoc |  5 +++++
>  builtin/gc.c                          |  9 ++++++++-
>  t/t7900-maintenance.sh                | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
> index 26dc5de423f..45fdafc2c63 100644
> --- a/Documentation/config/maintenance.adoc
> +++ b/Documentation/config/maintenance.adoc
> @@ -86,6 +86,11 @@ maintenance.geometric-repack.auto::
>  	objects that would be written into a new packfile. The default value is
>  	100.
>
> +maintenance.geometric-repack.splitFactor::
> +	This integer config option controls the factor used for the geometric
> +	sequence. See the `--geometric=` option in linkgit:git-repack[1] for
> +	more details. Defaults to `2`.
> +

Looks good.

Thanks,
Taylor

* Re: [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy
  2025-10-21 14:13   ` [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
@ 2025-10-23 21:31     ` Taylor Blau
  0 siblings, 0 replies; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 21:31 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Tue, Oct 21, 2025 at 04:13:27PM +0200, Patrick Steinhardt wrote:
> ---
>  builtin/gc.c           | 17 +++++++++++------
>  t/t7900-maintenance.sh |  5 +++++
>  2 files changed, 16 insertions(+), 6 deletions(-)

All looks good here.

Thanks,
Taylor

* Re: [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type
  2025-10-21 14:13   ` [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
@ 2025-10-23 21:34     ` Taylor Blau
  0 siblings, 0 replies; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 21:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Tue, Oct 21, 2025 at 04:13:28PM +0200, Patrick Steinhardt wrote:
> We basically have three different ways to execute repository
> maintenance:
>
>   1. Manual maintenance via `git maintenance run`.
>
>   2. Automatic maintenance via `git maintenance run --auto`.
>
>   3. Scheduled maintenance via `git maintenance run --schedule=`.
>
> At the moment, maintenance strategies only have an effect for the last
> type of maintenance. This is about to change in subsequent commits, but
> to do so we need to be able to skip some tasks depending on how exactly
> maintenance was invoked.

Thanks for writing this down; my initial thought when reading this patch
was that we could distinguish between scheduled tasks and manual ones
based on their "schedule" field. But this makes sense: some of the
scheduled tasks might (or might not) be appropriate for manual runs, so
distinguishing as you do in this patch makes a ton of sense to me.
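
The distinction can be pictured with a small sketch. The constant names are
the ones used in the series, but their bit values and the helper below are
assumptions made for illustration rather than the actual code:

```c
#include <stdbool.h>

/* Names as used in the series; the bit values are an assumption. */
enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = 1 << 0,
	MAINTENANCE_TYPE_MANUAL = 1 << 1,
};

/* Hypothetical helper: does a task with the given type mask run? */
bool task_applies(unsigned int task_types, enum maintenance_type run_type)
{
	return (task_types & run_type) != 0;
}
```

In this model, a task marked `MAINTENANCE_TYPE_SCHEDULED |
MAINTENANCE_TYPE_MANUAL` runs either way, whereas one marked only
`MAINTENANCE_TYPE_SCHEDULED` would be skipped by a plain
`git maintenance run`.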

The rest makes sense and looks good.

Thanks,
Taylor

* Re: [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy
  2025-10-21 14:13   ` [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
@ 2025-10-23 21:49     ` Taylor Blau
  2025-10-24  5:45       ` Patrick Steinhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 21:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Tue, Oct 21, 2025 at 04:13:31PM +0200, Patrick Steinhardt wrote:
> We have two different repacking strategies in Git:
>
>   - The "gc" strategy uses git-gc(1).
>
>   - The "incremental" strategy uses multi-pack indices and `git
>     multi-pack-index repack` to merge together smaller packfiles as
>     determined by a specific batch size.
>
> The former strategy is our old and trusted default, whereas the latter
> has historically been used for our scheduled maintenance. But both
> strategies have their shortcomings:
>
>   - The "gc" strategy performs regular all-into-one repacks. Furthermore
>     it is rather inflexible, as it is not easily possible for a user to
>     enable or disable specific subtasks.
>
>   - The "incremental" strategy is not a full replacement for the "gc"
>     strategy as it doesn't know to prune stale data.
>
> So today, we don't have a strategy that is well-suited for large repos
> while being a full replacement for the "gc" strategy.

Well put.

> Introduce a new "geometric" strategy that aims to fill this gap. This
> strategy invokes all the usual cleanup tasks that git-gc(1) does like
> pruning reflogs and rerere caches as well as stale worktrees. But where
> it differs from both the "gc" and "incremental" strategy is that it uses
> our geometric repacking infrastructure exposed by git-repack(1) to
> repack packfiles. The advantage of geometric repacking is that we only
> need to perform an all-into-one repack when the object count in a repo
> has grown significantly.
>
> One downside of this strategy is that pruning of unreferenced objects is
> not going to happen regularly anymore. Every geometric repack knows to
> soak up all loose objects regardless of their reachability, and merging
> two or more packs doesn't consider reachability, either. Consequently,
> the number of unreachable objects will grow over time.
>
> This is remedied by doing an all-into-one repack instead of a geometric
> repack whenever we determine that the geometric repack would end up
> merging all packfiles anyway. This all-into-one repack then performs our
> usual reachability checks and writes unreachable objects into a cruft
> pack. As cruft packs won't ever be merged during geometric repacks we
> can thus phase out these objects over time.
>
> Of course, this still means that we retain unreachable objects for far
> longer than with the "gc" strategy. But the maintenance strategy is
> intended especially for large repositories, where the basic assumption
> is that the set of unreachable objects will be significantly dwarfed by
> the number of reachable objects.
>
> If this assumption is ever proven to be too disadvantageous we could for
> example introduce a time-based strategy: if the largest packfile has not
> been touched for longer than $T, we perform an all-into-one repack. But
> for now, such a mechanism is deferred into the future as it is not clear
> yet whether it is needed in the first place.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/config/maintenance.adoc |  9 +++++++++
>  builtin/gc.c                          | 19 +++++++++++++++++++
>  t/t7900-maintenance.sh                | 20 +++++++++++++++++++-
>  3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
> index b2bacdc822..d0c38f03fa 100644
> --- a/Documentation/config/maintenance.adoc
> +++ b/Documentation/config/maintenance.adoc
> @@ -32,6 +32,15 @@ The possible strategies are:
>    strategy for scheduled maintenance.
>  * `gc`: This strategy runs the `gc` task. This is the default strategy for
>    manual maintenance.
> +* `geometric`: This strategy performs geometric repacking of packfiles and
> +  keeps auxiliary data structures up-to-date. The strategy expires data in the
> +  reflog and removes worktrees that cannot be located anymore. When the
> +  geometric repacking strategy would decide to do an all-into-one repack, then
> +  the strategy generates a cruft pack for all unreachable objects. Objects that
> +  are already part of a cruft pack will be expired.
> ++
> +This repacking strategy is a full replacement for the `gc` strategy and is
> +recommended for large repositories.

Nice. I always feel like it's tricky for changes like these to know
where the right place is to draw the line between "this belongs in the
commit message, because it will be useful to reviewers in understanding
how I came to this patch" versus "this belongs in the documentation for
my new feature, because it will be useful to users trying to figure out
what option to use".

I like the spot where you drew that line here and I think that the
patch message has details that are useful to reviewers (like the
historical differences between all-into-one strategies and geometric
repacking), but that the documentation has the right user-facing details
and explanations.
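
For readers skimming the thread, the fallback described in the commit message
could be sketched roughly like this. The enum, the function, and its
parameters are hypothetical and only model the decision; they are not how
builtin/gc.c is actually structured:

```c
#include <stddef.h>

enum repack_mode {
	REPACK_GEOMETRIC,	/* merge only the smaller packs */
	REPACK_ALL_INTO_ONE,	/* full repack that also writes a cruft pack */
};

/*
 * Hypothetical decision: if the geometric repack would merge every
 * existing pack anyway, do an all-into-one repack instead so that
 * reachability is checked and unreachable objects land in a cruft pack.
 */
enum repack_mode choose_repack_mode(size_t packs_to_merge, size_t total_packs)
{
	if (packs_to_merge == total_packs)
		return REPACK_ALL_INTO_ONE;
	return REPACK_GEOMETRIC;
}
```

In this model a repository without any packs yet also falls into the
all-into-one case, which lines up with the "initial repack causes an
all-into-one repack" comment in the tests of patch 3.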


> diff --git a/builtin/gc.c b/builtin/gc.c
> index aaff0bae15..9739bb0ea2 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1878,12 +1878,31 @@ static const struct maintenance_strategy incremental_strategy = {
>  	},
>  };
>
> +static const struct maintenance_strategy geometric_strategy = {
> +	.tasks = {
> +		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> +		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
> +		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> +		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
> +		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> +		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
> +		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> +		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
> +		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> +		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
> +		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> +		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
> +	},
> +};
> +

What you wrote here all makes sense to me, so I don't have any comments
on the technical content of 'geometric_strategy'.

As an aside, I wonder if we should use a nested designated initializer
here? It seems a little cleaner to me than doing:

    .tasks = {
        [TASK_FOO].type = ...,
        [TASK_FOO].schedule = ...,
    }

It's inconsistent with the style of the rest of this file, so if you did
make this change I'd suggest adding a prerequisite change that modifies
existing strategies to match the new style. But you could imagine
something like the following on top:

--- 8< ---
diff --git a/builtin/gc.c b/builtin/gc.c
index 9739bb0ea2..881ef6ad88 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1880,18 +1880,30 @@ static const struct maintenance_strategy incremental_strategy = {

 static const struct maintenance_strategy geometric_strategy = {
 	.tasks = {
-		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
-		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
-		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
-		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
-		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
-		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
-		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
-		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
-		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
-		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
-		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
-		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
+		[TASK_COMMIT_GRAPH] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_GEOMETRIC_REPACK] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_PACK_REFS] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_RERERE_GC] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_REFLOG_EXPIRE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_WORKTREE_PRUNE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
 	},
 };

--- >8 ---

The rest all looks good to me.

Thanks,
Taylor

* Re: [PATCH v2 0/9] builtin/maintenance: introduce "geometric" strategy
  2025-10-23 16:48   ` [PATCH v2 0/9] " Junio C Hamano
@ 2025-10-23 21:50     ` Taylor Blau
  0 siblings, 0 replies; 69+ messages in thread
From: Taylor Blau @ 2025-10-23 21:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Derrick Stolee

On Thu, Oct 23, 2025 at 09:48:18AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
> > with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
> > with other topics by preemptively including "repository.h", 2025-09-29)
> > merged into it.
> >
> > Changes in v2:
> >   - Make the geometric factor configurable via
> >     "maintenance.geometric-repack.splitFactor".
> >   - Wrap some overly long lines in our tests.
> >   - Link to v1: https://lore.kernel.org/r/20251016-pks-maintenance-geometric-strategy-v1-0-18943d474203@pks.im
> >
> > Thanks!
>
> This round looks good to me (I wasn't very careful picking typos and
> minor mistakes, but the resulting code overall looked sound).

Yeah, I am happy with this round as well. I left a few thoughts
throughout, but none of them are blockers from my perspective.

Thanks for working on this, Patrick!

Thanks,
Taylor

* Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
  2025-10-23 19:19       ` Taylor Blau
@ 2025-10-24  5:44         ` Patrick Steinhardt
  0 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  5:44 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee

On Thu, Oct 23, 2025 at 03:19:47PM -0400, Taylor Blau wrote:
> On Tue, Oct 21, 2025 at 03:00:31PM +0200, Patrick Steinhardt wrote:
> > > > +	# Verify that the number of packfiles matches our expectation.
> > > > +	ls -l .git/objects/pack/*.pack >packfiles &&
> > > > +	test_line_count = "$EXPECTED_PACKS" packfiles &&
> > > > +
> > > > +	# And verify that there are no loose objects anymore.
> > > > +	cat >expect <<-\EOF &&
> > > > +	info
> > > > +	pack
> > > > +	EOF
> > > > +	ls .git/objects >actual &&
> > >
> > > I wonder if there is an easier way to check for loose objects here that
> > > doesn't require you to know that the "info" and "pack" directories
> > > exist. Perhaps something like:
> > >
> > > >     test_stdout_line_count = 0 find .git/objects/?? -type f
> > >
> > > , or even
> > >
> > >     find .git/objects/?? -type f >loose.objs &&
> > >     test_must_be_empty loose.objs
> >
> > This doesn't work though in case there is not even a single sharding
> > directory:
> >
> >     find: '.git/objects/??': No such file or directory
> >
> > I didn't really have any other idea for now to do this.
> 
> Mmm, good point. What about using 'git count-objects -v' directly?
> 
>     test_loose_object_nr() {
>       local nr="$1" &&
>       git count-objects -v >count &&
> >       grep "^count: $nr\$" count
>     }

I guess that works. We can even simplify this case as we really only
want to check that there are no loose objects at all:

    git count-objects -v >count &&
    test_grep '^count: 0$' count

Thanks!

Patrick

* Re: [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task
  2025-10-23 19:29     ` Taylor Blau
@ 2025-10-24  5:45       ` Patrick Steinhardt
  0 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  5:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee

On Thu, Oct 23, 2025 at 03:29:08PM -0400, Taylor Blau wrote:
> On Tue, Oct 21, 2025 at 04:13:25PM +0200, Patrick Steinhardt wrote:
> > +test_expect_success 'geometric repacking task' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		git config set maintenance.auto false &&
> > +		test_commit initial &&
> > +
> > +		# The initial repack causes an all-into-one repack.
> > +		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
> > +			git maintenance run --task=geometric-repack 2>/dev/null &&
> > +		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
> 
> Not a show-stopper of course, but I thought from the cover letter that
> these lines would have gotten wrapped. Whether or not we have lines
> longer than 80 characters is not a hill that I'd like to die on, of
> course ;-). But I brought it up because I am wondering if there were
> some changes that you meant to include as a part of this round that got
> dropped in the shuffle.

Nah, I guess I merely didn't do my due diligence to also wrap other
overly long lines. Let me fix those.

Patrick

* Re: [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable
  2025-10-23 19:33     ` Taylor Blau
@ 2025-10-24  5:45       ` Patrick Steinhardt
  2025-10-24 19:02         ` Taylor Blau
  0 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  5:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee

On Thu, Oct 23, 2025 at 03:33:47PM -0400, Taylor Blau wrote:
> On Tue, Oct 21, 2025 at 04:13:26PM +0200, Patrick Steinhardt wrote:
> > The geometric repacking task uses a factor of two for its geometric
> > sequence, meaning that each next pack must contain at least twice as
> > many objects as the next-smaller one. In some cases it may be helpful to
> > configure this factor though to reduce the number of packfile merges
> > even further, e.g. in very big repositories. But while git-repack(1)
> > itself supports doing this, the maintenance task does not give us a way
> > to tune it.
> >
> > Introduce a new "maintenance.geometric-repack.splitFactor" configuration
> > to plug this gap.
> 
> Interesting, this wasn't exactly what I had in my mind when reading the
> last round, but I think this is worth doing on its own. My apologies for
> being ambiguous in my earlier message :-s.
> 
> I was suggesting that we have a repack.geometricFactor configuration
> variable that defaulted to two, could be overridden by --geometric=<n>,
> such that we could start doing "git repack --geometric" without having
> to write "=2" every time.
> 
> I think that that is probably still a useful thing to do in and of
> itself, but this change doesn't preclude our ability to do that, since
> it just overwrites what we pass in to 'git repack' when calling it from
> within the maintenance context.

Yeah, I understood that suggestion, but I still think that in the
context of this series here it makes more sense to piggyback onto
git-maintenance(1) itself so that we're in line with the other tasks
that we have. All of them are configurable via "maintenance.*.foobar"
knobs, so I wanted to have the same architecture for the geometric task,
as well.

But as you say, this doesn't mean that we cannot introduce a config for
git-repack(1) at a later point in time, and I also believe that this may
be a useful addition indeed. I guess the order of precedence would be
that "repack.geometricFactor" is overridden by
"maintenance.geometric-repack.splitFactor", as the latter is more
specific.
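
To illustrate that order of precedence, here is a minimal sketch in C.
The `resolve_split_factor()` helper and its parameters are hypothetical
and only model the lookup order; "repack.geometricFactor" does not exist
at this point in the thread, and this is not how builtin/gc.c reads its
configuration.

```c
/*
 * Hypothetical helper, not part of Git: pick the geometric factor from
 * the most specific source that is set. A value <= 0 is taken to mean
 * "not configured" for the purpose of this sketch.
 */
static int resolve_split_factor(int maintenance_split_factor,
				int repack_geometric_factor)
{
	/* "maintenance.geometric-repack.splitFactor" is more specific. */
	if (maintenance_split_factor > 0)
		return maintenance_split_factor;
	/* Fall back to the proposed "repack.geometricFactor". */
	if (repack_geometric_factor > 0)
		return repack_geometric_factor;
	/* Neither is set: use the factor of two the task defaults to. */
	return 2;
}
```

With both values set, the maintenance-specific one would win; with only
the repack-wide one set, it would be used as the fallback.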

Thanks!

Patrick

* Re: [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy
  2025-10-23 21:49     ` Taylor Blau
@ 2025-10-24  5:45       ` Patrick Steinhardt
  0 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  5:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee

On Thu, Oct 23, 2025 at 05:49:35PM -0400, Taylor Blau wrote:
> On Tue, Oct 21, 2025 at 04:13:31PM +0200, Patrick Steinhardt wrote:
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index aaff0bae15..9739bb0ea2 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -1878,12 +1878,31 @@ static const struct maintenance_strategy incremental_strategy = {
> >  	},
> >  };
> >
> > +static const struct maintenance_strategy geometric_strategy = {
> > +	.tasks = {
> > +		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> > +		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
> > +		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> > +		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
> > +		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> > +		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
> > +		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> > +		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
> > +		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> > +		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
> > +		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> > +		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
> > +	},
> > +};
> > +
> 
> What you wrote here all makes sense to me, so I don't have any comments
> on the technical content of 'geometric_strategy'.
> 
> As an aside, I wonder if we should use a nested designated initializer
> here? It seems a little cleaner to me than doing:
> 
>     .tasks = {
>         [TASK_FOO].type = ...,
>         [TASK_FOO].schedule = ...,
>     }
> 
> It's inconsistent with the style of the rest of this file, so if you did
> make this change I'd suggest adding a prerequisite change that modifies
> existing strategies to match the new style. But you could imagine
> something like the following on top:

It's more verbose, but it indeed reads a lot nicer. I'll take your
suggestion, thanks!

Patrick

* [PATCH v3 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
@ 2025-10-24  6:57 ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
                     ` (10 more replies)
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
  10 siblings, 11 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

Hi,

by default, git-maintenance(1) uses git-gc(1) to perform repository
housekeeping. This tool has a couple of shortcomings, most importantly
that it regularly does all-into-one repacks. This doesn't really work
all that well in the context of monorepos, where you really want to
avoid repacking all objects regularly.

An alternative maintenance strategy is the "incremental" strategy, but
this strategy has two downsides:

  - Strategies in general only apply to scheduled maintenance. So if you
    run git-maintenance(1), you still end up with git-gc(1).

  - The strategy is designed to not ever delete any data, but a full
    replacement for git-gc(1) needs to also prune reflogs, rerere caches
    and vanished worktrees.

This patch series aims to fix both of these issues.

First, the series introduces a new "geometric" maintenance task, which
makes use of geometric repacking as exposed by git-repack(1) in the
general case. In the case where a geometric repack ends up merging all
packfiles into one we instead do an all-into-one repack with cruft packs
so that we can still phase out objects over time.

Second, the series extends maintenance strategies to also cover normal
maintenance. If the user has configured the "geometric" strategy, we'll
thus use it for both manual and scheduled maintenance. For backwards
compatibility, the "incremental" strategy is changed so that it uses
git-gc(1) for manual maintenance and the other tasks for scheduled
maintenance.
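
The split between manual and scheduled maintenance can be pictured as a
bitmask per task. The following is a rough sketch only: the enum values,
the reduced task list, the `struct strategy` layout and the
`task_enabled()` helper are assumptions made for illustration and are
simpler than what builtin/gc.c does.

```c
/* Stand-ins for the series' types; the numeric values are assumptions. */
enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	MAINTENANCE_TYPE_MANUAL = (1 << 1),
};

enum task_label {
	TASK_GC,
	TASK_COMMIT_GRAPH,
	TASK_GEOMETRIC_REPACK,
	TASK__COUNT,
};

struct strategy {
	unsigned type[TASK__COUNT];
};

/* "incremental": git-gc(1) for manual runs, other tasks when scheduled. */
static const struct strategy incremental_sketch = {
	.type = {
		[TASK_GC] = MAINTENANCE_TYPE_MANUAL,
		[TASK_COMMIT_GRAPH] = MAINTENANCE_TYPE_SCHEDULED,
	},
};

/* "geometric": the same tasks for both kinds of maintenance. */
static const struct strategy geometric_sketch = {
	.type = {
		[TASK_COMMIT_GRAPH] =
			MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
		[TASK_GEOMETRIC_REPACK] =
			MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
	},
};

/* A task runs when its type mask overlaps the requested kind of run. */
static int task_enabled(const struct strategy *s, enum task_label task,
			enum maintenance_type requested)
{
	return (s->type[task] & requested) != 0;
}
```

In this model a task whose mask is zero never runs, which is how the
"geometric" sketch leaves git-gc(1) out entirely.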

The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
with other topics by preemptively including "repository.h", 2025-09-29)
merged into it.

Changes in v3:
  - More line wrapping.
  - Improve readability of maintenance strategies by using nested
    designated initializers.
  - Use git-count-objects(1) to count loose objects.
  - Link to v2: https://lore.kernel.org/r/20251021-pks-maintenance-geometric-strategy-v2-0-f0d727832b80@pks.im

Changes in v2:
  - Make the geometric factor configurable via
    "maintenance.geometric-repack.splitFactor".
  - Wrap some overly long lines in our tests.
  - Link to v1: https://lore.kernel.org/r/20251016-pks-maintenance-geometric-strategy-v1-0-18943d474203@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (10):
      builtin/gc: remove global `repack` variable
      builtin/gc: make `too_many_loose_objects()` reusable without GC config
      builtin/maintenance: introduce "geometric-repack" task
      builtin/maintenance: make the geometric factor configurable
      builtin/maintenance: don't silently ignore invalid strategy
      builtin/maintenance: improve readability of strategies
      builtin/maintenance: run maintenance tasks depending on type
      builtin/maintenance: extend "maintenance.strategy" to manual maintenance
      builtin/maintenance: make "gc" strategy accessible
      builtin/maintenance: introduce "geometric" strategy

 Documentation/config/maintenance.adoc |  49 +++++-
 builtin/gc.c                          | 313 ++++++++++++++++++++++++++++------
 t/t7900-maintenance.sh                | 245 ++++++++++++++++++++++++++
 3 files changed, 544 insertions(+), 63 deletions(-)

Range-diff versus v2:

 1:  b853ba54dca =  1:  c35408a33d0 builtin/gc: remove global `repack` variable
 2:  9bbdfe1b9e5 =  2:  be572fe1542 builtin/gc: make `too_many_loose_objects()` reusable without GC config
 3:  bcd82ad038e !  3:  5290f6d3e0f builtin/maintenance: introduce "geometric-repack" task
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +	test_line_count = "$EXPECTED_PACKS" packfiles &&
     +
     +	# And verify that there are no loose objects anymore.
    -+	cat >expect <<-\EOF &&
    -+	info
    -+	pack
    -+	EOF
    -+	ls .git/objects >actual &&
    -+	test_cmp expect actual
    ++	git count-objects -v >count &&
    ++	test_grep '^count: 0$' count
     +}
     +
     +test_expect_success 'geometric repacking task' '
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		# The initial repack causes an all-into-one repack.
     +		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    -+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
    ++		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
    ++			--quiet --write-midx <initial-repack.txt &&
     +
     +		# Repacking should now cause a no-op geometric repack because
     +		# no packfiles need to be combined.
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		# an all-into-one-repack.
     +		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    -+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <all-into-one-repack.txt &&
    ++		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
    ++			--quiet --write-midx <all-into-one-repack.txt &&
     +
     +		# The geometric repack soaks up unreachable objects.
     +		echo blob-1 | git hash-object -w --stdin -t blob &&
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		run_and_verify_geometric_pack 3 &&
     +		GIT_TRACE2_EVENT="$(pwd)/cruft-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    -+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <cruft-repack.txt &&
    ++		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
    ++			--quiet --write-midx <cruft-repack.txt &&
     +		ls .git/objects/pack/*.pack >packs &&
     +		test_line_count = 2 packs &&
     +		ls .git/objects/pack/*.mtimes >cruft &&
 4:  cb10031cc7c =  4:  7f2067fa4ec builtin/maintenance: make the geometric factor configurable
 5:  7e8f83d4753 =  5:  7a76003215e builtin/maintenance: don't silently ignore invalid strategy
 -:  ----------- >  6:  a6383d121b2 builtin/maintenance: improve readability of strategies
 6:  4217c37c0bf !  7:  e25c878a3ff builtin/maintenance: run maintenance tasks depending on type
    @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts
      		enum schedule_priority schedule;
      	} tasks[TASK__COUNT];
      };
    - 
    - static const struct maintenance_strategy none_strategy = { 0 };
    -+
    +@@ builtin/gc.c: static const struct maintenance_strategy none_strategy = { 0 };
      static const struct maintenance_strategy default_strategy = {
      	.tasks = {
    --		[TASK_GC].enabled = 1,
    -+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
    + 		[TASK_GC] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_MANUAL,
    + 		},
      	},
      };
    -+
    +@@ builtin/gc.c: static const struct maintenance_strategy default_strategy = {
      static const struct maintenance_strategy incremental_strategy = {
      	.tasks = {
    --		[TASK_COMMIT_GRAPH].enabled = 1,
    -+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
    --		[TASK_PREFETCH].enabled = 1,
    -+		[TASK_PREFETCH].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_PREFETCH].schedule = SCHEDULE_HOURLY,
    --		[TASK_INCREMENTAL_REPACK].enabled = 1,
    -+		[TASK_INCREMENTAL_REPACK].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY,
    --		[TASK_LOOSE_OBJECTS].enabled = 1,
    -+		[TASK_LOOSE_OBJECTS].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
    --		[TASK_PACK_REFS].enabled = 1,
    -+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
    + 		[TASK_COMMIT_GRAPH] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_HOURLY,
    + 		},
    + 		[TASK_PREFETCH] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_HOURLY,
    + 		},
    + 		[TASK_INCREMENTAL_REPACK] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_DAILY,
    + 		},
    + 		[TASK_LOOSE_OBJECTS] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_DAILY,
    + 		},
    + 		[TASK_PACK_REFS] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_WEEKLY,
    + 		},
      	},
    - };
     @@ builtin/gc.c: static void initialize_task_config(struct maintenance_run_opts *opts,
      {
      	struct strbuf config_name = STRBUF_INIT;
 7:  422b16a62a2 !  8:  ba147c3bf33 builtin/maintenance: extend "maintenance.strategy" to manual maintenance
    @@ Documentation/config/maintenance.adoc: detach.
     
      ## builtin/gc.c ##
     @@ builtin/gc.c: static const struct maintenance_strategy incremental_strategy = {
    - 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
    - 		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
    -+
    + 			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_WEEKLY,
    + 		},
     +		/*
     +		 * Historically, the "incremental" strategy was only available
     +		 * in the context of scheduled maintenance when set up via
    @@ builtin/gc.c: static const struct maintenance_strategy incremental_strategy = {
     +		 * requested. This is the same as the default strategy, which
     +		 * would have been in use beforehand.
     +		 */
    -+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
    ++		[TASK_GC] = {
    ++			.type = MAINTENANCE_TYPE_MANUAL,
    ++		},
      	},
      };
      
 8:  07f5b32a22e !  9:  eebfab4acda builtin/maintenance: make "gc" strategy accessible
    @@ builtin/gc.c: struct maintenance_strategy {
     -static const struct maintenance_strategy default_strategy = {
     +static const struct maintenance_strategy gc_strategy = {
      	.tasks = {
    --		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
    -+		[TASK_GC].schedule = SCHEDULE_DAILY,
    + 		[TASK_GC] = {
    +-			.type = MAINTENANCE_TYPE_MANUAL,
    ++			.type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
    ++			.schedule = SCHEDULE_DAILY,
    + 		},
      	},
      };
    - 
     @@ builtin/gc.c: static struct maintenance_strategy parse_maintenance_strategy(const char *name)
      {
      	if (!strcasecmp(name, "incremental"))
 9:  c597ae7f94a ! 10:  936358736f3 builtin/maintenance: introduce "geometric" strategy
    @@ builtin/gc.c: static const struct maintenance_strategy incremental_strategy = {
      
     +static const struct maintenance_strategy geometric_strategy = {
     +	.tasks = {
    -+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
    -+		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
    -+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
    -+		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
    -+		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
    -+		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
    ++		[TASK_COMMIT_GRAPH] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_HOURLY,
    ++		},
    ++		[TASK_GEOMETRIC_REPACK] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_DAILY,
    ++		},
    ++		[TASK_PACK_REFS] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_DAILY,
    ++		},
    ++		[TASK_RERERE_GC] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_WEEKLY,
    ++		},
    ++		[TASK_REFLOG_EXPIRE] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_WEEKLY,
    ++		},
    ++		[TASK_WORKTREE_PRUNE] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_WEEKLY,
    ++		},
     +	},
     +};
     +

---
base-commit: 0bb2c786c2349dd6700727153c13d81cbfb41710
change-id: 20251015-pks-maintenance-geometric-strategy-580c58581b01


* [PATCH v3 01/10] builtin/gc: remove global `repack` variable
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.

Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_option()` in a subsequent commit.

The refactoring itself is straightforward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.

For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.

As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.
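
The general pattern, threading the accumulator through the callback's
`data` pointer instead of appending to a global, can be sketched in
isolation as follows. The `struct arg_list` type and the
`for_each_pack()` and `keep_one_pack_sketch()` functions are
hypothetical stand-ins for `struct strvec`, `for_each_string_list()`
and `keep_one_pack()`; they are not Git's APIs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size argument list standing in for `struct strvec`. */
struct arg_list {
	char items[8][64];
	size_t nr;
};

typedef int (*each_pack_fn)(const char *pack, void *data);

/* Invoke `fn` for every pack name, forwarding the caller's `data`. */
static int for_each_pack(const char **packs, size_t nr,
			 each_pack_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(packs[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* The callback appends to the list it is handed, not to a global. */
static int keep_one_pack_sketch(const char *pack, void *data)
{
	struct arg_list *args = data;

	if (args->nr >= 8)
		return -1; /* list is full */
	snprintf(args->items[args->nr++], sizeof(args->items[0]),
		 "--keep-pack=%s", pack);
	return 0;
}
```

Because the list lives with the caller, two callers can build
independent argument lists without interfering with each other, which
is what makes the helper reusable.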

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 74 ++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e19e13d9788..e9772eb3a30 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -55,7 +55,6 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static timestamp_t gc_log_expire_time;
-static struct strvec repack = STRVEC_INIT;
 static struct tempfile *pidfile;
 static struct lock_file log_lock;
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -618,48 +617,50 @@ static uint64_t estimate_repack_memory(struct gc_config *cfg,
 	return os_cache + heap;
 }
 
-static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
+static int keep_one_pack(struct string_list_item *item, void *data)
 {
-	strvec_pushf(&repack, "--keep-pack=%s", basename(item->string));
+	struct strvec *args = data;
+	strvec_pushf(args, "--keep-pack=%s", basename(item->string));
 	return 0;
 }
 
 static void add_repack_all_option(struct gc_config *cfg,
-				  struct string_list *keep_pack)
+				  struct string_list *keep_pack,
+				  struct strvec *args)
 {
 	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
 		&& !(cfg->cruft_packs && cfg->repack_expire_to))
-		strvec_push(&repack, "-a");
+		strvec_push(args, "-a");
 	else if (cfg->cruft_packs) {
-		strvec_push(&repack, "--cruft");
+		strvec_push(args, "--cruft");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--cruft-expiration=%s", cfg->prune_expire);
+			strvec_pushf(args, "--cruft-expiration=%s", cfg->prune_expire);
 		if (cfg->max_cruft_size)
-			strvec_pushf(&repack, "--max-cruft-size=%lu",
+			strvec_pushf(args, "--max-cruft-size=%lu",
 				     cfg->max_cruft_size);
 		if (cfg->repack_expire_to)
-			strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
+			strvec_pushf(args, "--expire-to=%s", cfg->repack_expire_to);
 	} else {
-		strvec_push(&repack, "-A");
+		strvec_push(args, "-A");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--unpack-unreachable=%s", cfg->prune_expire);
+			strvec_pushf(args, "--unpack-unreachable=%s", cfg->prune_expire);
 	}
 
 	if (keep_pack)
-		for_each_string_list(keep_pack, keep_one_pack, NULL);
+		for_each_string_list(keep_pack, keep_one_pack, args);
 
 	if (cfg->repack_filter && *cfg->repack_filter)
-		strvec_pushf(&repack, "--filter=%s", cfg->repack_filter);
+		strvec_pushf(args, "--filter=%s", cfg->repack_filter);
 	if (cfg->repack_filter_to && *cfg->repack_filter_to)
-		strvec_pushf(&repack, "--filter-to=%s", cfg->repack_filter_to);
+		strvec_pushf(args, "--filter-to=%s", cfg->repack_filter_to);
 }
 
-static void add_repack_incremental_option(void)
+static void add_repack_incremental_option(struct strvec *args)
 {
-	strvec_push(&repack, "--no-write-bitmap-index");
+	strvec_push(args, "--no-write-bitmap-index");
 }
 
-static int need_to_gc(struct gc_config *cfg)
+static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 {
 	/*
 	 * Setting gc.auto to 0 or negative can disable the
@@ -700,10 +701,10 @@ static int need_to_gc(struct gc_config *cfg)
 				string_list_clear(&keep_pack, 0);
 		}
 
-		add_repack_all_option(cfg, &keep_pack);
+		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
 	} else if (too_many_loose_objects(cfg))
-		add_repack_incremental_option();
+		add_repack_incremental_option(repack_args);
 	else
 		return 0;
 
@@ -852,6 +853,7 @@ int cmd_gc(int argc,
 	int keep_largest_pack = -1;
 	int skip_foreground_tasks = 0;
 	timestamp_t dummy;
+	struct strvec repack_args = STRVEC_INIT;
 	struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT;
 	struct gc_config cfg = GC_CONFIG_INIT;
 	const char *prune_expire_sentinel = "sentinel";
@@ -891,7 +893,7 @@ int cmd_gc(int argc,
 	show_usage_with_options_if_asked(argc, argv,
 					 builtin_gc_usage, builtin_gc_options);
 
-	strvec_pushl(&repack, "repack", "-d", "-l", NULL);
+	strvec_pushl(&repack_args, "repack", "-d", "-l", NULL);
 
 	gc_config(&cfg);
 
@@ -914,14 +916,14 @@ int cmd_gc(int argc,
 		die(_("failed to parse prune expiry value %s"), cfg.prune_expire);
 
 	if (aggressive) {
-		strvec_push(&repack, "-f");
+		strvec_push(&repack_args, "-f");
 		if (cfg.aggressive_depth > 0)
-			strvec_pushf(&repack, "--depth=%d", cfg.aggressive_depth);
+			strvec_pushf(&repack_args, "--depth=%d", cfg.aggressive_depth);
 		if (cfg.aggressive_window > 0)
-			strvec_pushf(&repack, "--window=%d", cfg.aggressive_window);
+			strvec_pushf(&repack_args, "--window=%d", cfg.aggressive_window);
 	}
 	if (opts.quiet)
-		strvec_push(&repack, "-q");
+		strvec_push(&repack_args, "-q");
 
 	if (opts.auto_flag) {
 		if (cfg.detach_auto && opts.detach < 0)
@@ -930,7 +932,7 @@ int cmd_gc(int argc,
 		/*
 		 * Auto-gc should be least intrusive as possible.
 		 */
-		if (!need_to_gc(&cfg)) {
+		if (!need_to_gc(&cfg, &repack_args)) {
 			ret = 0;
 			goto out;
 		}
@@ -952,7 +954,7 @@ int cmd_gc(int argc,
 			find_base_packs(&keep_pack, cfg.big_pack_threshold);
 		}
 
-		add_repack_all_option(&cfg, &keep_pack);
+		add_repack_all_option(&cfg, &keep_pack, &repack_args);
 		string_list_clear(&keep_pack, 0);
 	}
 
@@ -1014,9 +1016,9 @@ int cmd_gc(int argc,
 
 		repack_cmd.git_cmd = 1;
 		repack_cmd.close_object_store = 1;
-		strvec_pushv(&repack_cmd.args, repack.v);
+		strvec_pushv(&repack_cmd.args, repack_args.v);
 		if (run_command(&repack_cmd))
-			die(FAILED_RUN, repack.v[0]);
+			die(FAILED_RUN, repack_args.v[0]);
 
 		if (cfg.prune_expire) {
 			struct child_process prune_cmd = CHILD_PROCESS_INIT;
@@ -1067,6 +1069,7 @@ int cmd_gc(int argc,
 
 out:
 	maintenance_run_opts_release(&opts);
+	strvec_clear(&repack_args);
 	gc_config_release(&cfg);
 	return 0;
 }
@@ -1269,6 +1272,19 @@ static int maintenance_task_gc_background(struct maintenance_run_opts *opts,
 	return run_command(&child);
 }
 
+static int gc_condition(struct gc_config *cfg)
+{
+	/*
+	 * Note that it's fine to drop the repack arguments here, as we execute
+	 * git-gc(1) as a separate child process anyway. So it knows to compute
+	 * these arguments again.
+	 */
+	struct strvec repack_args = STRVEC_INIT;
+	int ret = need_to_gc(cfg, &repack_args);
+	strvec_clear(&repack_args);
+	return ret;
+}
+
 static int prune_packed(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1596,7 +1612,7 @@ static const struct maintenance_task tasks[] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
 		.background = maintenance_task_gc_background,
-		.auto_condition = need_to_gc,
+		.auto_condition = gc_condition,
 	},
 	[TASK_COMMIT_GRAPH] = {
 		.name = "commit-graph",

-- 
2.51.1.930.gacf6e81ea2.dirty


* [PATCH v3 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.

This is done via `too_many_loose_objects()`, which takes the
`struct gc_config` as its parameter. This configuration is only used to
determine the threshold. In a subsequent commit we'll add another caller
of this function that wants to pass a different limit than the one
stored in that structure.

Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.
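
For illustration, the arithmetic behind the estimate can be sketched as
below. This assumes that the function samples a single one of the 256
fan-out directories below the object directory and scales the limit
accordingly; `exceeds_loose_limit()` and its second parameter are
hypothetical, and the directory scan itself is left out.

```c
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Sketch of the estimate: loose objects are spread over 256 fan-out
 * directories, so one directory holding more than limit/256 entries
 * (rounded up) suggests that the repository as a whole exceeds `limit`.
 */
static int exceeds_loose_limit(int limit, int objects_in_sampled_dir)
{
	int per_dir_threshold = DIV_ROUND_UP(limit, 256);

	return objects_in_sampled_dir > per_dir_threshold;
}
```

With a limit of 6700 the per-directory threshold works out to 27, so
the 28th object in the sampled directory would trip the check. With a
limit of 100 the threshold is 1.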

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e9772eb3a30..026d3a1d714 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -447,7 +447,7 @@ static int rerere_gc_condition(struct gc_config *cfg UNUSED)
 	return should_gc;
 }
 
-static int too_many_loose_objects(struct gc_config *cfg)
+static int too_many_loose_objects(int limit)
 {
 	/*
 	 * Quickly check if a "gc" is needed, by estimating how
@@ -469,7 +469,7 @@ static int too_many_loose_objects(struct gc_config *cfg)
 	if (!dir)
 		return 0;
 
-	auto_threshold = DIV_ROUND_UP(cfg->gc_auto_threshold, 256);
+	auto_threshold = DIV_ROUND_UP(limit, 256);
 	while ((ent = readdir(dir)) != NULL) {
 		if (strspn(ent->d_name, "0123456789abcdef") != hexsz_loose ||
 		    ent->d_name[hexsz_loose] != '\0')
@@ -703,7 +703,7 @@ static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 
 		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
-	} else if (too_many_loose_objects(cfg))
+	} else if (too_many_loose_objects(cfg->gc_auto_threshold))
 		add_repack_incremental_option(repack_args);
 	else
 		return 0;
@@ -1057,7 +1057,7 @@ int cmd_gc(int argc,
 					     !opts.quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0,
 					     NULL);
 
-	if (opts.auto_flag && too_many_loose_objects(&cfg))
+	if (opts.auto_flag && too_many_loose_objects(cfg.gc_auto_threshold))
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 

-- 
2.51.1.930.gacf6e81ea2.dirty


* [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-25 19:15     ` Jeff King
  2025-10-24  6:57   ` [PATCH v3 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.

There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.

To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has passed.
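
The decision described above can be approximated with the sketch below.
It is a simplified model and not Git's implementation: packs are
reduced to an ascending array of object counts, overflow checks and
loose objects are ignored, and the names `geometric_split()` and
`wants_all_into_one()` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * `weights` holds the object count of each pack in ascending order.
 * Returns the number of leading packs that would be merged: 0 means
 * nothing needs to be combined, `nr` means every pack would be merged.
 */
static size_t geometric_split(const uint64_t *weights, size_t nr,
			      uint64_t factor)
{
	size_t i, split;
	uint64_t total = 0;

	if (!nr)
		return 0;

	/* Walk from the largest pack down until the progression breaks. */
	for (i = nr - 1; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;

	/* A break between i-1 and i pulls pack i into the merge as well. */
	split = i ? i + 1 : 0;

	for (i = 0; i < split; i++)
		total += weights[i];

	/* The merged pack may in turn be too large for its successors. */
	while (split < nr && weights[split] < factor * total)
		total += weights[split++];

	return split;
}

/* Fall back to an all-into-one repack when every pack would be merged. */
static int wants_all_into_one(const uint64_t *weights, size_t nr,
			      uint64_t factor)
{
	return geometric_split(weights, nr, factor) == nr;
}
```

Note that with no packs at all the split trivially equals the number of
packs, so this model also picks the all-into-one repack for a
repository that only has loose objects.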

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  11 +++
 builtin/gc.c                          | 102 +++++++++++++++++++++++++
 t/t7900-maintenance.sh                | 138 ++++++++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 2f719342183..26dc5de423f 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -75,6 +75,17 @@ maintenance.incremental-repack.auto::
 	number of pack-files not in the multi-pack-index is at least the value
 	of `maintenance.incremental-repack.auto`. The default value is 10.
 
+maintenance.geometric-repack.auto::
+	This integer config option controls how often the `geometric-repack`
+	task should be run as part of `git maintenance run --auto`. If zero,
+	then the `geometric-repack` task will not run with the `--auto`
+	option. A negative value will force the task to run every time.
+	Otherwise, a positive value implies the command should run either when
+	there are packfiles that need to be merged together to retain the
+	geometric progression, or when there are at least this many loose
+	objects that would be written into a new packfile. The default value is
+	100.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 026d3a1d714..2c9ecd464d2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -34,6 +34,7 @@
 #include "pack-objects.h"
 #include "path.h"
 #include "reflog.h"
+#include "repack.h"
 #include "rerere.h"
 #include "blob.h"
 #include "tree.h"
@@ -254,6 +255,7 @@ enum maintenance_task_label {
 	TASK_PREFETCH,
 	TASK_LOOSE_OBJECTS,
 	TASK_INCREMENTAL_REPACK,
+	TASK_GEOMETRIC_REPACK,
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
@@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
 	return 0;
 }
 
+static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
+					     struct gc_config *cfg)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	struct child_process child = CHILD_PROCESS_INIT;
+	int ret;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	child.git_cmd = 1;
+
+	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
+	if (geometry.split < geometry.pack_nr)
+		strvec_push(&child.args, "--geometric=2");
+	else
+		add_repack_all_option(cfg, NULL, &child.args);
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	if (the_repository->settings.core_multi_pack_index)
+		strvec_push(&child.args, "--write-midx");
+
+	if (run_command(&child)) {
+		ret = error(_("failed to perform geometric repack"));
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
+static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	int auto_value = 100;
+	int ret;
+
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
+			    &auto_value);
+	if (!auto_value)
+		return 0;
+	if (auto_value < 0)
+		return 1;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	/*
+	 * When we'd merge at least two packs with one another we always
+	 * perform the repack.
+	 */
+	if (geometry.split) {
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * Otherwise, we estimate the number of loose objects to determine
+	 * whether we want to create a new packfile or not.
+	 */
+	if (too_many_loose_objects(auto_value)) {
+		ret = 1;
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
 typedef int (*maintenance_task_fn)(struct maintenance_run_opts *opts,
 				   struct gc_config *cfg);
 typedef int (*maintenance_auto_fn)(struct gc_config *cfg);
@@ -1608,6 +1705,11 @@ static const struct maintenance_task tasks[] = {
 		.background = maintenance_task_incremental_repack,
 		.auto_condition = incremental_repack_auto_condition,
 	},
+	[TASK_GEOMETRIC_REPACK] = {
+		.name = "geometric-repack",
+		.background = maintenance_task_geometric_repack,
+		.auto_condition = geometric_repack_auto_condition,
+	},
 	[TASK_GC] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index ddd273d8dc2..ace0ba83002 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -465,6 +465,144 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
 	)
 '
 
+run_and_verify_geometric_pack () {
+	EXPECTED_PACKS="$1" &&
+
+	# Verify that we perform a geometric repack.
+	rm -f "trace2.txt" &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git maintenance run --task=geometric-repack 2>/dev/null &&
+	test_subcommand git repack -d -l --geometric=2 \
+		--quiet --write-midx <trace2.txt &&
+
+	# Verify that the number of packfiles matches our expectation.
+	ls -l .git/objects/pack/*.pack >packfiles &&
+	test_line_count = "$EXPECTED_PACKS" packfiles &&
+
+	# And verify that there are no loose objects anymore.
+	git count-objects -v >count &&
+	test_grep '^count: 0$' count
+}
+
+test_expect_success 'geometric repacking task' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+		test_commit initial &&
+
+		# The initial repack causes an all-into-one repack.
+		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
+			--quiet --write-midx <initial-repack.txt &&
+
+		# Repacking should now cause a no-op geometric repack because
+		# no packfiles need to be combined.
+		ls -l .git/objects/pack >before &&
+		run_and_verify_geometric_pack 1 &&
+		ls -l .git/objects/pack >after &&
+		test_cmp before after &&
+
+		# This incremental change creates a new packfile that only
+		# soaks up loose objects. The packfiles are not getting merged
+		# at this point.
+		test_commit loose &&
+		run_and_verify_geometric_pack 2 &&
+
+		# Both packfiles have 3 objects, so the next run would cause us
+		# to merge all packfiles together. This should be turned into
+		# an all-into-one-repack.
+		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
+			--quiet --write-midx <all-into-one-repack.txt &&
+
+		# The geometric repack soaks up unreachable objects.
+		echo blob-1 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 2 &&
+
+		# A second unreachable object should be written into another packfile.
+		echo blob-2 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+
+		# And these two small packs should now be merged via the
+		# geometric repack. The large packfile should remain intact.
+		run_and_verify_geometric_pack 2 &&
+
+		# If we now add two more objects and repack twice we should
+		# then see another all-into-one repack. This time around
+		# though, as we have unreachable objects, we should also see a
+		# cruft pack.
+		echo blob-3 | git hash-object -w --stdin -t blob &&
+		echo blob-4 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+		GIT_TRACE2_EVENT="$(pwd)/cruft-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
+			--quiet --write-midx <cruft-repack.txt &&
+		ls .git/objects/pack/*.pack >packs &&
+		test_line_count = 2 packs &&
+		ls .git/objects/pack/*.mtimes >cruft &&
+		test_line_count = 1 cruft
+	)
+'
+
+test_geometric_repack_needed () {
+	NEEDED="$1" &&
+	GEOMETRIC_CONFIG="$2" &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git ${GEOMETRIC_CONFIG:+-c maintenance.geometric-repack.$GEOMETRIC_CONFIG} \
+		maintenance run --auto --task=geometric-repack 2>/dev/null &&
+	case "$NEEDED" in
+	true)
+		test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	false)
+		! test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	*)
+		BUG "invalid parameter: $NEEDED";;
+	esac
+}
+
+test_expect_success 'geometric repacking with --auto' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		# An empty repository does not need repacking, except when
+		# explicitly told to do it.
+		test_geometric_repack_needed false &&
+		test_geometric_repack_needed false auto=0 &&
+		test_geometric_repack_needed false auto=1 &&
+		test_geometric_repack_needed true auto=-1 &&
+
+		test_oid_init &&
+
+		# Loose objects cause a repack when crossing the limit. Note
+		# that the number of objects gets extrapolated by having a look
+		# at the "objects/17/" shard.
+		test_commit "$(test_oid blob17_1)" &&
+		test_geometric_repack_needed false &&
+		test_commit "$(test_oid blob17_2)" &&
+		test_geometric_repack_needed false auto=257 &&
+		test_geometric_repack_needed true auto=256 &&
+
+		# Force another repack.
+		test_commit first &&
+		test_commit second &&
+		test_geometric_repack_needed true auto=-1 &&
+
+		# We now have two packfiles that would be merged together. As
+		# such, the repack should always happen unless the user has
+		# disabled the auto task.
+		test_geometric_repack_needed false auto=0 &&
+		test_geometric_repack_needed true auto=9000
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.1.930.gacf6e81ea2.dirty


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 04/10] builtin/maintenance: make the geometric factor configurable
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.

Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.
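
To make the effect of the factor more tangible, here is a simplified
model of how a geometric split could be computed. It is a sketch under
assumptions and not the code used by git-repack(1): `geometric_split()`
is a hypothetical helper, packs are represented only by their object
counts sorted in ascending order, and overflow handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: `counts` holds the object count of each pack,
 * sorted from smallest to largest. Returns how many of the smallest
 * packs would have to be merged so that the remaining packs form a
 * geometric progression with the given factor.
 */
size_t geometric_split(const unsigned long *counts, size_t nr,
		       unsigned long factor)
{
	unsigned long total = 0;
	size_t i, split;

	assert(factor > 0);

	/*
	 * Walk from the largest pack downwards and stop at the first
	 * pair of neighbours that violates the progression.
	 */
	for (i = nr ? nr - 1 : 0; i > 0; i--)
		if (counts[i] < factor * counts[i - 1])
			break;
	split = i;

	/* A violation at index i means that pack i gets merged, too. */
	if (split)
		split++;

	for (i = 0; i < split; i++)
		total += counts[i];

	/*
	 * The merged pack may in turn be too large relative to the next
	 * pack, so keep absorbing packs until the progression holds.
	 */
	for (i = split; i < nr; i++) {
		if (counts[i] >= factor * total)
			break;
		total += counts[i];
		split++;
	}

	return split;
}
```

With packs of 1, 2 and 9 objects this model merges nothing for a factor
of 2, but merges the two smallest packs for a factor of 3.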

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  5 +++++
 builtin/gc.c                          |  9 ++++++++-
 t/t7900-maintenance.sh                | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 26dc5de423f..45fdafc2c63 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -86,6 +86,11 @@ maintenance.geometric-repack.auto::
 	objects that would be written into a new packfile. The default value is
 	100.
 
+maintenance.geometric-repack.splitFactor::
+	This integer config option controls the factor used for the geometric
+	sequence. See the `--geometric=` option in linkgit:git-repack[1] for
+	more details. Defaults to `2`.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 2c9ecd464d2..fb1a82e0304 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1582,6 +1582,9 @@ static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
 	struct child_process child = CHILD_PROCESS_INIT;
 	int ret;
 
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.splitFactor",
+			    &geometry.split_factor);
+
 	existing_packs.repo = the_repository;
 	existing_packs_collect(&existing_packs, &kept_packs);
 	pack_geometry_init(&geometry, &existing_packs, &po_args);
@@ -1591,7 +1594,8 @@ static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
 
 	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
 	if (geometry.split < geometry.pack_nr)
-		strvec_push(&child.args, "--geometric=2");
+		strvec_pushf(&child.args, "--geometric=%d",
+			     geometry.split_factor);
 	else
 		add_repack_all_option(cfg, NULL, &child.args);
 	if (opts->quiet)
@@ -1632,6 +1636,9 @@ static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
 	if (auto_value < 0)
 		return 1;
 
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.splitFactor",
+			    &geometry.split_factor);
+
 	existing_packs.repo = the_repository;
 	existing_packs_collect(&existing_packs, &kept_packs);
 	pack_geometry_init(&geometry, &existing_packs, &po_args);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index ace0ba83002..e0352fd1965 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -603,6 +603,38 @@ test_expect_success 'geometric repacking with --auto' '
 	)
 '
 
+test_expect_success 'geometric repacking honors configured split factor' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		# Create three different packs with 9, 2 and 1 object, respectively.
+		# This is done so that only a subset of packs would be merged
+		# together so that we can verify that `git repack` receives the
+		# correct geometric factor.
+		for i in $(test_seq 9)
+		do
+			echo first-$i | git hash-object -w --stdin -t blob || return 1
+		done &&
+		git repack --geometric=2 -d &&
+
+		for i in $(test_seq 2)
+		do
+			echo second-$i | git hash-object -w --stdin -t blob || return 1
+		done &&
+		git repack --geometric=2 -d &&
+
+		echo third | git hash-object -w --stdin -t blob &&
+		git repack --geometric=2 -d &&
+
+		test_geometric_repack_needed false splitFactor=2 &&
+		test_geometric_repack_needed true splitFactor=3 &&
+		test_subcommand git repack -d -l --geometric=3 --quiet --write-midx <trace2.txt
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.1.930.gacf6e81ea2.dirty


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 05/10] builtin/maintenance: don't silently ignore invalid strategy
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.

Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.
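
The difference between "no strategy configured" and "unknown strategy
configured" can be sketched as follows. This is a standalone
illustration, not the function added by this patch: `parse_strategy()`
and `enum strategy_id` are hypothetical, and the sketch returns an error
code where the patch calls die().

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp(), POSIX */

/* Hypothetical identifiers, for illustration only. */
enum strategy_id {
	STRATEGY_NONE,
	STRATEGY_INCREMENTAL,
};

/*
 * Look up a strategy by its case-insensitive name. Returns 0 and sets
 * `*out` on success. Returns -1 and leaves `*out` untouched when the
 * name is unknown, so that the caller can report the error instead of
 * silently falling back to a no-op strategy.
 */
int parse_strategy(const char *name, enum strategy_id *out)
{
	assert(name && out);

	if (!strcasecmp(name, "incremental")) {
		*out = STRATEGY_INCREMENTAL;
		return 0;
	}
	return -1;
}
```

A caller would only invoke the parser when the config key is set at
all, and keep its default strategy otherwise.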

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c           | 17 +++++++++++------
 t/t7900-maintenance.sh |  5 +++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index fb1a82e0304..726d944d3bd 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1855,6 +1855,13 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static struct maintenance_strategy parse_maintenance_strategy(const char *name)
+{
+	if (!strcasecmp(name, "incremental"))
+		return incremental_strategy;
+	die(_("unknown maintenance strategy: '%s'"), name);
+}
+
 static void initialize_task_config(struct maintenance_run_opts *opts,
 				   const struct string_list *selected_tasks)
 {
@@ -1890,12 +1897,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 * override specific aspects of our strategy.
 	 */
 	if (opts->schedule) {
-		strategy = none_strategy;
-
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str)) {
-			if (!strcasecmp(config_str, "incremental"))
-				strategy = incremental_strategy;
-		}
+		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+			strategy = parse_maintenance_strategy(config_str);
+		else
+			strategy = none_strategy;
 	} else {
 		strategy = default_strategy;
 	}
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index e0352fd1965..0fb917dd7b7 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -1263,6 +1263,11 @@ test_expect_success 'fails when running outside of a repository' '
 	nongit test_must_fail git maintenance unregister
 '
 
+test_expect_success 'fails when configured to use an invalid strategy' '
+	test_must_fail git -c maintenance.strategy=invalid maintenance run --schedule=hourly 2>err &&
+	test_grep "unknown maintenance strategy: .invalid." err
+'
+
 test_expect_success 'register and unregister bare repo' '
 	test_when_finished "git config --global --unset-all maintenance.repo || :" &&
 	test_might_fail git config --global --unset-all maintenance.repo &&

-- 
2.51.1.930.gacf6e81ea2.dirty


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 06/10] builtin/maintenance: improve readability of strategies
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

Our maintenance strategies are essentially a large array of structures,
where each of the tasks can be enabled and scheduled individually. With
the current layout though all the configuration sits on the same nesting
layer, which makes it a bit hard to discern which initialized fields
belong to what task.

Improve readability of the individual tasks by using nested designated
initializers instead.

Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 726d944d3bd..0ba6e59de14 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1835,23 +1835,37 @@ struct maintenance_strategy {
 };
 
 static const struct maintenance_strategy none_strategy = { 0 };
+
 static const struct maintenance_strategy default_strategy = {
 	.tasks = {
-		[TASK_GC].enabled = 1,
+		[TASK_GC] = {
+			.enabled = 1,
+		},
 	},
 };
+
 static const struct maintenance_strategy incremental_strategy = {
 	.tasks = {
-		[TASK_COMMIT_GRAPH].enabled = 1,
-		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
-		[TASK_PREFETCH].enabled = 1,
-		[TASK_PREFETCH].schedule = SCHEDULE_HOURLY,
-		[TASK_INCREMENTAL_REPACK].enabled = 1,
-		[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY,
-		[TASK_LOOSE_OBJECTS].enabled = 1,
-		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
-		[TASK_PACK_REFS].enabled = 1,
-		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
+		[TASK_COMMIT_GRAPH] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_PREFETCH] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_INCREMENTAL_REPACK] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_LOOSE_OBJECTS] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_PACK_REFS] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_WEEKLY,
+		},
 	},
 };
 

-- 
2.51.1.930.gacf6e81ea2.dirty


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 07/10] builtin/maintenance: run maintenance tasks depending on type
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

We basically have three different ways to execute repository
maintenance:

  1. Manual maintenance via `git maintenance run`.

  2. Automatic maintenance via `git maintenance run --auto`.

  3. Scheduled maintenance via `git maintenance run --schedule=`.

At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.

Introduce a new maintenance type that discerns between manual (1 & 2) and
scheduled (3) maintenance. Convert the `enabled` field into a bitset so
that it becomes possible to specify which tasks exactly should run in a
specific context.

The types picked for existing strategies match the status quo:

  - The default strategy is only ever executed as part of a manual
    maintenance run. It is not possible to use it for scheduled
    maintenance.

  - The incremental strategy is only ever executed as part of a
    scheduled maintenance run. It is not possible to use it for manual
    maintenance.

The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
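
The bitset check, including the override via
`maintenance.<task>.enabled`, can be sketched in isolation like this.
The enum mirrors the one added by this patch, but `task_should_run()`
and its `has_override`/`override_value` parameters are hypothetical and
only model whether the config key is set and what its boolean value is.

```c
#include <assert.h>

enum maintenance_type {
	/* As invoked via `git maintenance run --schedule=`. */
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	/* As invoked via `git maintenance run` and with `--auto`. */
	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
};

/*
 * Hypothetical helper: `task_types` is the bitset a strategy assigns to
 * a task and `current` is the type of the maintenance run in progress.
 * An explicit config override replaces the bitset: "true" enables the
 * task for the current type, "false" disables it altogether.
 */
int task_should_run(unsigned task_types, enum maintenance_type current,
		    int has_override, int override_value)
{
	assert(current == MAINTENANCE_TYPE_SCHEDULED ||
	       current == MAINTENANCE_TYPE_MANUAL);

	if (has_override)
		task_types = override_value ? (unsigned)current : 0;
	return (task_types & current) != 0;
}
```

A task whose bitset contains both types would run in either context,
which a plain boolean `enabled` field cannot express.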

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 0ba6e59de1..6cc4f98c7a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1827,9 +1827,16 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts,
 	return result;
 }
 
+enum maintenance_type {
+	/* As invoked via `git maintenance run --schedule=`. */
+	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
+	/* As invoked via `git maintenance run` and with `--auto`. */
+	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
+};
+
 struct maintenance_strategy {
 	struct {
-		int enabled;
+		unsigned type;
 		enum schedule_priority schedule;
 	} tasks[TASK__COUNT];
 };
@@ -1839,7 +1846,7 @@ static const struct maintenance_strategy none_strategy = { 0 };
 static const struct maintenance_strategy default_strategy = {
 	.tasks = {
 		[TASK_GC] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_MANUAL,
 		},
 	},
 };
@@ -1847,23 +1854,23 @@ static const struct maintenance_strategy default_strategy = {
 static const struct maintenance_strategy incremental_strategy = {
 	.tasks = {
 		[TASK_COMMIT_GRAPH] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_HOURLY,
 		},
 		[TASK_PREFETCH] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_HOURLY,
 		},
 		[TASK_INCREMENTAL_REPACK] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_DAILY,
 		},
 		[TASK_LOOSE_OBJECTS] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_DAILY,
 		},
 		[TASK_PACK_REFS] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_WEEKLY,
 		},
 	},
@@ -1881,6 +1888,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 {
 	struct strbuf config_name = STRBUF_INIT;
 	struct maintenance_strategy strategy;
+	enum maintenance_type type;
 	const char *config_str;
 
 	/*
@@ -1915,8 +1923,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 			strategy = parse_maintenance_strategy(config_str);
 		else
 			strategy = none_strategy;
+		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
+		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
 	for (size_t i = 0; i < TASK__COUNT; i++) {
@@ -1926,8 +1936,8 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 		if (!repo_config_get_bool(the_repository, config_name.buf, &config_value))
-			strategy.tasks[i].enabled = config_value;
-		if (!strategy.tasks[i].enabled)
+			strategy.tasks[i].type = config_value ? type : 0;
+		if (!(strategy.tasks[i].type & type))
 			continue;
 
 		if (opts->schedule) {

-- 
2.51.1.930.gacf6e81ea2.dirty


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v3 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap: scheduled and manual
maintenance end up using _different_ strategies, as the latter
continues to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.

So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.

Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy, manual maintenance would
have ignored that strategy beforehand and continued to use git-gc(1) by
default.

But luckily, we can match that behaviour by:

  - Keeping all current tasks of the incremental strategy as
    `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
    run during manual maintenance.

  - Configuring the "gc" task so that it is invoked during manual
    maintenance.

Like this, the user shouldn't observe any difference in behaviour.
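
The resulting behaviour of the "incremental" strategy can be modelled
with a small table. This is a reduced sketch: the task list is trimmed,
the schedules are omitted, the enum order does not follow builtin/gc.c,
and `incremental_runs()` is a hypothetical helper.

```c
#include <assert.h>

enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
};

/* Reduced task list, for illustration only. */
enum task {
	TASK_PREFETCH,
	TASK_LOOSE_OBJECTS,
	TASK_INCREMENTAL_REPACK,
	TASK_GC,
	TASK_COMMIT_GRAPH,
	TASK_PACK_REFS,
	TASK__COUNT,
};

/* Which maintenance types run which task with "incremental". */
static const unsigned incremental_types[TASK__COUNT] = {
	[TASK_COMMIT_GRAPH]       = MAINTENANCE_TYPE_SCHEDULED,
	[TASK_PREFETCH]           = MAINTENANCE_TYPE_SCHEDULED,
	[TASK_INCREMENTAL_REPACK] = MAINTENANCE_TYPE_SCHEDULED,
	[TASK_LOOSE_OBJECTS]      = MAINTENANCE_TYPE_SCHEDULED,
	[TASK_PACK_REFS]          = MAINTENANCE_TYPE_SCHEDULED,
	/* Backwards compatibility: manual runs keep using git-gc(1). */
	[TASK_GC]                 = MAINTENANCE_TYPE_MANUAL,
};

int incremental_runs(enum task task, enum maintenance_type type)
{
	assert((int)task >= 0 && (int)task < TASK__COUNT);
	return (incremental_types[task] & type) != 0;
}
```

In this model a manual run only selects the "gc" task, which is what
the default strategy would have done before this change.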

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc | 22 ++++++++++++-------
 builtin/gc.c                          | 25 +++++++++++++++++-----
 t/t7900-maintenance.sh                | 40 +++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 45fdafc2c63..b7e90a71a3d 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -16,19 +16,25 @@ detach.
 
 maintenance.strategy::
 	This string config option provides a way to specify one of a few
-	recommended schedules for background maintenance. This only affects
-	which tasks are run during `git maintenance run --schedule=X`
-	commands, provided no `--task=<task>` arguments are provided.
-	Further, if a `maintenance.<task>.schedule` config value is set,
-	then that value is used instead of the one provided by
-	`maintenance.strategy`. The possible strategy strings are:
+	recommended strategies for repository maintenance. This affects
+	which tasks are run during `git maintenance run`, provided no
+	`--task=<task>` arguments are provided. This setting impacts manual
+	maintenance, auto-maintenance as well as scheduled maintenance. The
+	tasks that run may be different depending on the maintenance type.
 +
-* `none`: This default setting implies no tasks are run at any schedule.
+The maintenance strategy can be further tweaked by setting
+`maintenance.<task>.enabled` and `maintenance.<task>.schedule`. If set, these
+values are used instead of the defaults provided by `maintenance.strategy`.
++
+The possible strategies are:
++
+* `none`: This strategy implies no tasks are run at all. This is the default
+  strategy for scheduled maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
   `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
-  task weekly.
+  task weekly. Manual repository maintenance uses the `gc` task.
 
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
diff --git a/builtin/gc.c b/builtin/gc.c
index 6cc4f98c7aa..3c0a9a2e5df 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1873,6 +1873,20 @@ static const struct maintenance_strategy incremental_strategy = {
 			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_WEEKLY,
 		},
+		/*
+		 * Historically, the "incremental" strategy was only available
+		 * in the context of scheduled maintenance when set up via
+		 * "maintenance.strategy". We have later expanded that config
+		 * to also cover manual maintenance.
+		 *
+		 * To retain backwards compatibility with the previous status
+		 * quo we thus run git-gc(1) in case manual maintenance was
+		 * requested. This is the same as the default strategy, which
+		 * would have been in use beforehand.
+		 */
+		[TASK_GC] = {
+			.type = MAINTENANCE_TYPE_MANUAL,
+		},
 	},
 };
 
@@ -1916,19 +1930,20 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 *   - Unscheduled maintenance uses our default strategy.
 	 *
 	 * Both of these are affected by the gitconfig though, which may
-	 * override specific aspects of our strategy.
+	 * override specific aspects of our strategy. Furthermore, both
+	 * strategies can be overridden by setting "maintenance.strategy".
 	 */
 	if (opts->schedule) {
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
-			strategy = parse_maintenance_strategy(config_str);
-		else
-			strategy = none_strategy;
+		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
+	if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+		strategy = parse_maintenance_strategy(config_str);
+
 	for (size_t i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0fb917dd7b7..5219bc17a69 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -886,6 +886,46 @@ test_expect_success 'maintenance.strategy inheritance' '
 		<modified-daily.txt
 '
 
+test_strategy () {
+	STRATEGY="$1"
+	shift
+
+	cat >expect &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -c maintenance.strategy=$STRATEGY maintenance run --quiet "$@" &&
+	sed -n 's/{"event":"child_start","sid":"[^/"]*",.*,"argv":\["\(.*\)\"]}/\1/p' <trace2.txt |
+		sed 's/","/ /g' >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'maintenance.strategy is respected' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+
+		test_must_fail git -c maintenance.strategy=unknown maintenance run 2>err &&
+		test_grep "unknown maintenance strategy: .unknown." err &&
+
+		test_strategy incremental <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy incremental --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git prune-packed --quiet
+		git multi-pack-index write --no-progress
+		git multi-pack-index expire --no-progress
+		git multi-pack-index repack --no-progress --batch-size=1
+		git commit-graph write --split --reachable --no-progress
+		EOF
+	)
+'
+
 test_expect_success 'register and unregister' '
 	test_when_finished git config --global --unset-all maintenance.repo &&
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v3 09/10] builtin/maintenance: make "gc" strategy accessible
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24  6:57   ` [PATCH v3 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
  2025-10-24 19:03   ` [PATCH v3 00/10] " Taylor Blau
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:

  - It is impossible to use the default "gc" strategy for a specific
    repository when the strategy was globally set to a different strategy.

  - It is not possible to use git-gc(1) for scheduled maintenance.

Address these issues by making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.
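
To illustrate the effect of widening the task type, here is a minimal
sketch of how a type bitmask plus a schedule could gate a task. The
enum values, `struct task_config` and `task_applies()` are hypothetical
stand-ins and not the code in builtin/gc.c. The sketch assumes that a
scheduled run also picks up tasks configured for a more frequent
schedule, which matches the test below where `--schedule=weekly` still
runs the daily "gc" task.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the types used by the strategies. */
enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	MAINTENANCE_TYPE_MANUAL = (1 << 1),
};

/* Ordered so that more frequent schedules compare greater. */
enum schedule_priority {
	SCHEDULE_NONE = 0,
	SCHEDULE_WEEKLY = 1,
	SCHEDULE_DAILY = 2,
	SCHEDULE_HOURLY = 3,
};

struct task_config {
	unsigned type; /* bitmask of enum maintenance_type */
	enum schedule_priority schedule;
};

/*
 * Decide whether a task takes part in the requested kind of run.
 * "requested_schedule" is only consulted for scheduled runs.
 */
static bool task_applies(const struct task_config *task,
			 enum maintenance_type requested,
			 enum schedule_priority requested_schedule)
{
	/* The task must be enabled for this maintenance type at all. */
	if (!(task->type & requested))
		return false;

	/* Scheduled runs skip tasks that are meant to run less often. */
	if (requested == MAINTENANCE_TYPE_SCHEDULED)
		return task->schedule >= requested_schedule;

	return true;
}
```

With a task typed as `MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED`
and scheduled daily, the sketch selects it for manual runs and for
weekly or daily scheduled runs, but not for hourly ones.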

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  2 ++
 builtin/gc.c                          |  9 ++++++---
 t/t7900-maintenance.sh                | 14 +++++++++++++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index b7e90a71a3d..b2bacdc8220 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -30,6 +30,8 @@ The possible strategies are:
 +
 * `none`: This strategy implies no tasks are run at all. This is the default
   strategy for scheduled maintenance.
+* `gc`: This strategy runs the `gc` task. This is the default strategy for
+  manual maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c0a9a2e5df..8cab1450095 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1843,10 +1843,11 @@ struct maintenance_strategy {
 
 static const struct maintenance_strategy none_strategy = { 0 };
 
-static const struct maintenance_strategy default_strategy = {
+static const struct maintenance_strategy gc_strategy = {
 	.tasks = {
 		[TASK_GC] = {
-			.type = MAINTENANCE_TYPE_MANUAL,
+			.type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
+			.schedule = SCHEDULE_DAILY,
 		},
 	},
 };
@@ -1894,6 +1895,8 @@ static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
+	if (!strcasecmp(name, "gc"))
+		return gc_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
@@ -1937,7 +1940,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
-		strategy = default_strategy;
+		strategy = gc_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 5219bc17a69..85e0cea4d96 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -915,7 +915,7 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy incremental --schedule=weekly <<-\EOF
+		test_strategy incremental --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git prune-packed --quiet
 		git multi-pack-index write --no-progress
@@ -923,6 +923,18 @@ test_expect_success 'maintenance.strategy is respected' '
 		git multi-pack-index repack --no-progress --batch-size=1
 		git commit-graph write --split --reachable --no-progress
 		EOF
+
+		test_strategy gc <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy gc --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
 	)
 '
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v3 10/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
@ 2025-10-24  6:57   ` Patrick Steinhardt
  2025-10-24 19:03   ` [PATCH v3 00/10] " Taylor Blau
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-24  6:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
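
The decision described above can be sketched roughly as follows. This
is a simplified model based on my reading of how a geometric "cut" is
chosen, not the code used by git-repack(1) or by the maintenance task:
`geometric_split()` and `wants_all_into_one()` are hypothetical names,
the input is assumed to be sorted by ascending object count, and
integer overflow is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given per-pack object counts in ascending order, return how many of
 * the smallest packs would be merged so that the remaining packs form
 * a geometric progression with the given factor. Returns 0 when the
 * progression already holds.
 */
static size_t geometric_split(const uint64_t *counts, size_t nr,
			      uint64_t factor)
{
	uint64_t total = 0;
	size_t i, split;

	if (nr < 2)
		return 0;

	/* Walk down from the largest pack until the progression breaks. */
	for (i = nr - 1; i > 0; i--)
		if (counts[i] < factor * counts[i - 1])
			break;
	if (!i)
		return 0;

	/* Merge the offending pack together with all smaller ones. */
	split = i + 1;
	for (i = 0; i < split; i++)
		total += counts[i];

	/* The merged pack may in turn be too large for its successors. */
	while (split < nr && counts[split] < factor * total) {
		total += counts[split];
		split++;
	}

	return split;
}

/* True if the geometric repack would end up merging every pack. */
static bool wants_all_into_one(const uint64_t *counts, size_t nr,
			       uint64_t factor)
{
	return nr >= 2 && geometric_split(counts, nr, factor) == nr;
}
```

For packs with 3, 3 and 100 objects and a factor of two, only the two
small packs are merged. For packs with 5, 6 and 10 objects the cut
covers all three, which is the case where an all-into-one repack with
a cruft pack would be done instead.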

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  9 +++++++++
 builtin/gc.c                          | 31 +++++++++++++++++++++++++++++++
 t/t7900-maintenance.sh                | 20 +++++++++++++++++++-
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index b2bacdc8220..d0c38f03fab 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -32,6 +32,15 @@ The possible strategies are:
   strategy for scheduled maintenance.
 * `gc`: This strategy runs the `gc` task. This is the default strategy for
   manual maintenance.
+* `geometric`: This strategy performs geometric repacking of packfiles and
+  keeps auxiliary data structures up-to-date. The strategy expires data in the
+  reflog and removes worktrees that cannot be located anymore. When the
+  geometric repacking strategy would decide to do an all-into-one repack, then
+  the strategy generates a cruft pack for all unreachable objects. Objects that
+  are already part of a cruft pack will be expired.
++
+This repacking strategy is a full replacement for the `gc` strategy and is
+recommended for large repositories.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 8cab1450095..19be3f87e13 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1891,12 +1891,43 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static const struct maintenance_strategy geometric_strategy = {
+	.tasks = {
+		[TASK_COMMIT_GRAPH] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_GEOMETRIC_REPACK] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_PACK_REFS] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_RERERE_GC] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_REFLOG_EXPIRE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_WORKTREE_PRUNE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+	},
+};
+
 static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
 	if (!strcasecmp(name, "gc"))
 		return gc_strategy;
+	if (!strcasecmp(name, "geometric"))
+		return geometric_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 85e0cea4d96..0d76693feec 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -930,11 +930,29 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy gc --schedule=weekly <<-\EOF
+		test_strategy gc --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git reflog expire --all
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
+
+		test_strategy geometric <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
+
+		test_strategy geometric --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
 	)
 '
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* Re: [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable
  2025-10-24  5:45       ` Patrick Steinhardt
@ 2025-10-24 19:02         ` Taylor Blau
  0 siblings, 0 replies; 69+ messages in thread
From: Taylor Blau @ 2025-10-24 19:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee

On Fri, Oct 24, 2025 at 07:45:18AM +0200, Patrick Steinhardt wrote:
> On Thu, Oct 23, 2025 at 03:33:47PM -0400, Taylor Blau wrote:
> > On Tue, Oct 21, 2025 at 04:13:26PM +0200, Patrick Steinhardt wrote:
> > > The geometric repacking task uses a factor of two for its geometric
> > > sequence, meaning that each next pack must contain at least twice as
> > > many objects as the next-smaller one. In some cases it may be helpful to
> > > configure this factor though to reduce the number of packfile merges
> > > even further, e.g. in very big repositories. But while git-repack(1)
> > > itself supports doing this, the maintenance task does not give us a way
> > > to tune it.
> > >
> > > Introduce a new "maintenance.geometric-repack.splitFactor" configuration
> > > to plug this gap.
> >
> > Interesting, this wasn't exactly what I had in my mind when reading the
> > last round, but I think this is worth doing on its own. My apologies for
> > being ambiguous in my earlier message :-s.
> >
> > I was suggesting that we have a repack.geometricFactor configuration
> > variable that defaulted to two, could be overridden by --geometric=<n>,
> > such that we could start doing "git repack --geometric" without having
> > to write "=2" every time.
> >
> > I think that that is probably still a useful thing to do in and of
> > itself, but this change doesn't preclude our ability to do that, since
> > it just overwrites what we pass in to 'git repack' when calling it from
> > within the maintenance context.
>
> Yeah, I understood that suggestion, but I still think that in the
> context of this series here it makes more sense to piggy back onto
> git-maintenance(1) itself so that we're in line with the other tasks
> that we have. All of them are configurable via "maintenance.*.foobar"
> knobs, so I wanted to have the same architecture for the geometric task,
> as well.
>
> But as you say, this doesn't mean that we cannot introduce a config for
> git-repack(1) at a later point in time, and I also believe that this may
> be a useful addition indeed. I guess the order of precedence would be
> that "repack.geometricFactor" is overridden by
> "maintenance.geometric-repack.splitFactor", as the latter is more
> specific.

Yeah, I agree with all of that. From maintenance's perspective, it reads
the value of maintenance.geometric-repack.splitFactor and uses that as a
command-line argument when invoking repack with '--geometric=<n>'.

'repack' should of course be oblivious to all of that, and whatever
value it reads from 'repack.geometricFactor' should be the default when
--geometric is passed without a value.

It is a little too bad that we can't say, "all geometric repacking should
use a factor of 3" by default easily, since that would require
maintenance having to read the value of 'repack.geometricFactor' itself,
but I think that's a trade-off that I can live with.
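
To spell out that trade-off, here is a small sketch of the precedence
as I understand it from the discussion above. Note that
'repack.geometricFactor' does not exist yet, and both helper functions
are hypothetical. A value of 0 stands for "not set".

```c
#define DEFAULT_GEOMETRIC_FACTOR 2

/*
 * What repack would use: an explicit --geometric=<n> wins, then the
 * (hypothetical) repack.geometricFactor, then the built-in default.
 */
static int repack_factor(int cli_factor, int repack_config)
{
	if (cli_factor > 0)
		return cli_factor;
	if (repack_config > 0)
		return repack_config;
	return DEFAULT_GEOMETRIC_FACTOR;
}

/*
 * What maintenance passes on the command line: it always passes an
 * explicit value, taken from maintenance.geometric-repack.splitFactor
 * or falling back to its own default.
 */
static int maintenance_cli_factor(int maintenance_config)
{
	if (maintenance_config > 0)
		return maintenance_config;
	return DEFAULT_GEOMETRIC_FACTOR;
}
```

Because maintenance always passes an explicit factor,
`repack_factor(maintenance_cli_factor(0), 3)` yields 2 rather than 3
in this model: the repack-level setting never reaches geometric
repacks that were started by maintenance.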

Sounds like this is all good #leftoverbits.

Thanks,
Taylor


* Re: [PATCH v3 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-10-24  6:57   ` [PATCH v3 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
@ 2025-10-24 19:03   ` Taylor Blau
  2025-10-24 19:11     ` Junio C Hamano
  10 siblings, 1 reply; 69+ messages in thread
From: Taylor Blau @ 2025-10-24 19:03 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Derrick Stolee, Justin Tobler, Junio C Hamano

On Fri, Oct 24, 2025 at 08:57:13AM +0200, Patrick Steinhardt wrote:
> Range-diff versus v2:

The range-diff all looks good to me, so this has my:

    Acked-by: Taylor Blau <me@ttaylorr.com>

Thanks for working on this! I'm excited to have an easier way for more
users to interact with geometric repacking without having to juggle in
their head when they should use that versus doing an all-into-one
repack.

Thanks,
Taylor


* Re: [PATCH v3 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-24 19:03   ` [PATCH v3 00/10] " Taylor Blau
@ 2025-10-24 19:11     ` Junio C Hamano
  0 siblings, 0 replies; 69+ messages in thread
From: Junio C Hamano @ 2025-10-24 19:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Patrick Steinhardt, git, Derrick Stolee, Justin Tobler

Taylor Blau <me@ttaylorr.com> writes:

> On Fri, Oct 24, 2025 at 08:57:13AM +0200, Patrick Steinhardt wrote:
>> Range-diff versus v2:
>
> The range-diff all looks good to me, so this has my:
>
>     Acked-by: Taylor Blau <me@ttaylorr.com>
>
> Thanks for working on this! I'm excited to have an easier way for more
> users to interact with geometric repacking without having to juggle in
> their head when they should use that versus doing an all-into-one
> repack.

Thanks, both.  Queued.

Let's mark the topic for 'next'.


* Re: [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task
  2025-10-24  6:57   ` [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-25 19:15     ` Jeff King
  2025-10-27  8:24       ` Patrick Steinhardt
  0 siblings, 1 reply; 69+ messages in thread
From: Jeff King @ 2025-10-25 19:15 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

On Fri, Oct 24, 2025 at 08:57:16AM +0200, Patrick Steinhardt wrote:

> +		# Repacking should now cause a no-op geometric repack because
> +		# no packfiles need to be combined.
> +		ls -l .git/objects/pack >before &&
> +		run_and_verify_geometric_pack 1 &&
> +		ls -l .git/objects/pack >after &&
> +		test_cmp before after &&

I got a CI failure from this test like this:

   + diff -u before after
   --- before 2025-10-25 17:51:59.985025237 +0000
   +++ after  2025-10-25 17:52:00.304026445 +0000
   @@ -1,5 +1,5 @@
    total 16
   --rw-rw-r-- 1 builder builder 1252 Oct 25 17:51 multi-pack-index
   +-rw-rw-r-- 1 builder builder 1252 Oct 25 17:52 multi-pack-index
    -r--r--r-- 1 builder builder 1156 Oct 25 17:51 pack-68c20c4590a622a21395b4480621d55494112a83.idx
    -r--r--r-- 1 builder builder  226 Oct 25 17:51 pack-68c20c4590a622a21395b4480621d55494112a83.pack
    -r--r--r-- 1 builder builder   64 Oct 25 17:51 pack-68c20c4590a622a21395b4480621d55494112a83.rev

I'm not sure if this is a bug or a race condition in the test. If
"no-op" means "do not generate a new pack, but do generate a new midx"
then it's a race condition (the regenerated midx might move across the
minute boundary).  If it means "do not even generate a new midx", then
there is a bug. ;)
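
The timestamp side of this can be shown in isolation. The sketch below
is not part of the test suite, and `format_minute()` and
`listing_differs()` are made-up helpers. It formats two modification
times with minute resolution, similar to what "ls -l" prints above,
and compares the resulting strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format a timestamp with minute resolution, in UTC. */
static void format_minute(time_t t, char *buf, size_t len)
{
	struct tm *tm = gmtime(&t);

	/* Fall back to an empty string if the time cannot be converted. */
	if (!tm || !strftime(buf, len, "%b %d %H:%M", tm))
		buf[0] = '\0';
}

/* Would a minute-resolution listing show a difference? */
static bool listing_differs(time_t before, time_t after)
{
	char a[32], b[32];

	format_minute(before, a, sizeof(a));
	format_minute(after, b, sizeof(b));
	return strcmp(a, b) != 0;
}
```

For a timestamp t that sits one second before a minute boundary (for
example 1761414719, since 1761414720 is divisible by 60),
`listing_differs(t - 30, t)` is false while `listing_differs(t, t + 1)`
is true. So a rewritten file usually goes unnoticed, and the
comparison only fails when the rewrite happens to cross the boundary.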

You can generate the race at will like this:

diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0d76693fee..2b5141196f 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -501,6 +501,7 @@ test_expect_success 'geometric repacking task' '
 		# Repacking should now cause a no-op geometric repack because
 		# no packfiles need to be combined.
 		ls -l .git/objects/pack >before &&
+		sleep 60 &&
 		run_and_verify_geometric_pack 1 &&
 		ls -l .git/objects/pack >after &&
 		test_cmp before after &&

though if we are going to be picky about timestamps, it probably makes
sense to use a higher resolution. Sadly I don't think there's a portable
way to do that with "ls", and "stat" is probably likewise something we
can't assume. I'd turn to perl, but I know you've been trying to avoid
depending on it. You can hack around it with:

  test-tool chmtime -v +0 .git/objects/pack/*

for this case, I'd think.

-Peff


* Re: [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task
  2025-10-25 19:15     ` Jeff King
@ 2025-10-27  8:24       ` Patrick Steinhardt
  2025-10-27 14:25         ` Jeff King
  0 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

On Sat, Oct 25, 2025 at 03:15:50PM -0400, Jeff King wrote:
> On Fri, Oct 24, 2025 at 08:57:16AM +0200, Patrick Steinhardt wrote:
> 
> > +		# Repacking should now cause a no-op geometric repack because
> > +		# no packfiles need to be combined.
> > +		ls -l .git/objects/pack >before &&
> > +		run_and_verify_geometric_pack 1 &&
> > +		ls -l .git/objects/pack >after &&
> > +		test_cmp before after &&
> 
> I got a CI failure from this test like this:
> 
>    + diff -u before after
>    --- before 2025-10-25 17:51:59.985025237 +0000
>    +++ after  2025-10-25 17:52:00.304026445 +0000
>    @@ -1,5 +1,5 @@
>     total 16
>    --rw-rw-r-- 1 builder builder 1252 Oct 25 17:51 multi-pack-index
>    +-rw-rw-r-- 1 builder builder 1252 Oct 25 17:52 multi-pack-index
>     -r--r--r-- 1 builder builder 1156 Oct 25 17:51 pack-68c20c4590a622a21395b4480621d55494112a83.idx
>     -r--r--r-- 1 builder builder  226 Oct 25 17:51 pack-68c20c4590a622a21395b4480621d55494112a83.pack
>     -r--r--r-- 1 builder builder   64 Oct 25 17:51 pack-68c20c4590a622a21395b4480621d55494112a83.rev
> 
> I'm not sure if this is a bug or a race condition in the test. If
> "no-op" means "do not generate a new pack, but do generate a new midx"
> then it's a race condition (the regenerated midx might move across the
> minute boundary).  If it means "do not even generate a new midx", then
> there is a bug. ;)
> 
> You can generate the race at will like this:
> 
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 0d76693fee..2b5141196f 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -501,6 +501,7 @@ test_expect_success 'geometric repacking task' '
>  		# Repacking should now cause a no-op geometric repack because
>  		# no packfiles need to be combined.
>  		ls -l .git/objects/pack >before &&
> +		sleep 60 &&
>  		run_and_verify_geometric_pack 1 &&
>  		ls -l .git/objects/pack >after &&
>  		test_cmp before after &&
> 
> though if we are going to be picky about timestamps, it probably makes
> sense to use a higher resolution. Sadly I don't think there's a portable
> way to do that with "ls", and "stat" is probably likewise something we
> can't assume. I'd turn to perl, but I know you've been trying to avoid
> depending on it. You can hack around it with:
> 
>   test-tool chmtime -v +0 .git/objects/pack/*
> 
> for this case, I'd think.

Interesting! I would say that this is an issue in git-repack(1) itself:
if the geometric repack didn't lead to any new packs, and if all of the
packs are already covered by a MIDX, then we still rather pointlessly
regenerate the MIDX even though it won't cover anything new.

I wonder whether we want a patch like the below one? Problem though is
that we'd also have to check whether any of the other options have
changed, otherwise we for example wouldn't generate bitmaps.

In any case though, I feel like this is a bit out of scope for this
patch series. Other strategies that write a MIDX behave the same, so
this is something we can fix later on.

Patrick

diff --git a/repack-midx.c b/repack-midx.c
index 6f6202c5bcc..efa47bb55b5 100644
--- a/repack-midx.c
+++ b/repack-midx.c
@@ -285,6 +285,35 @@ static void remove_redundant_bitmaps(struct string_list *include,
 	strbuf_release(&path);
 }
 
+static bool midx_needs_repack(const struct repack_write_midx_opts *opts,
+			      const struct string_list *include)
+{
+	struct strset set = STRSET_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	bool needs_repack;
+
+	if (opts->existing->midx_packs.nr != include->nr)
+		return true;
+
+	for (size_t i = 0; i < opts->existing->midx_packs.nr; i++) {
+		const char *item = opts->existing->midx_packs.items[i].string;
+
+		strbuf_reset(&buf);
+		strbuf_addstr(&buf, item);
+		strbuf_strip_suffix(&buf, ".pack");
+		strbuf_addstr(&buf, ".idx");
+
+		strset_add(&set, buf.buf);
+	}
+
+	needs_repack = false;
+	for (size_t i = 0; i < include->nr && !needs_repack; i++)
+		needs_repack = !strset_contains(&set, include->items[i].string);
+
+	strset_clear(&set);
+	return needs_repack;
+}
+
 int write_midx_included_packs(struct repack_write_midx_opts *opts)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
@@ -295,7 +324,7 @@ int write_midx_included_packs(struct repack_write_midx_opts *opts)
 	int ret = 0;
 
 	midx_included_packs(&include, opts);
-	if (!include.nr)
+	if (!include.nr || !midx_needs_repack(opts, &include))
 		goto done;
 
 	cmd.in = -1;



* [PATCH v4 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
                   ` (9 preceding siblings ...)
  2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
@ 2025-10-27  8:30 ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
                     ` (10 more replies)
  10 siblings, 11 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

Hi,

by default, git-maintenance(1) uses git-gc(1) to perform repository
housekeeping. This tool has a couple of shortcomings, most importantly
that it regularly does all-into-one repacks. This doesn't really work
all that well in the context of monorepos, where you really want to
avoid repacking all objects regularly.

An alternative maintenance strategy is the "incremental" strategy, but
this strategy has two downsides:

  - Strategies in general only apply to scheduled maintenance. So if you
    run git-maintenance(1), you still end up with git-gc(1).

  - The strategy is designed to not ever delete any data, but a full
    replacement for git-gc(1) needs to also prune reflogs, rerere caches
    and vanished worktrees.

This patch series aims to fix both of these issues.

First, the series introduces a new "geometric" maintenance task, which
makes use of geometric repacking as exposed by git-repack(1) in the
general case. In the case where a geometric repack ends up merging all
packfiles into one we instead do an all-into-one repack with cruft packs
so that we can still phase out objects over time.

Second, the series extends maintenance strategies to also cover manual
maintenance. If the user has configured the "geometric" strategy, we'll
thus use it for both manual and scheduled maintenance. For backwards
compatibility, the "incremental" strategy is changed so that it uses
git-gc(1) for manual maintenance and the other tasks for scheduled
maintenance.
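
As a rough sketch of the resulting selection logic: the configured
strategy, if any, takes precedence, and otherwise the default depends
on whether the run is scheduled. The enum and `pick_strategy()` are
hypothetical and only model the behavior described in this series.
The sketch only recognizes the three names that the strategy parser
shown in this series accepts, and unlike the real code it compares
names case-sensitively to stay within standard C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum strategy {
	STRATEGY_NONE,
	STRATEGY_GC,
	STRATEGY_INCREMENTAL,
	STRATEGY_GEOMETRIC,
	STRATEGY_UNKNOWN, /* the real code dies on unknown names */
};

/*
 * "configured" is the value of maintenance.strategy or NULL if the
 * config is unset. "scheduled" is true for runs with --schedule.
 */
static enum strategy pick_strategy(bool scheduled, const char *configured)
{
	if (configured) {
		if (!strcmp(configured, "incremental"))
			return STRATEGY_INCREMENTAL;
		if (!strcmp(configured, "gc"))
			return STRATEGY_GC;
		if (!strcmp(configured, "geometric"))
			return STRATEGY_GEOMETRIC;
		return STRATEGY_UNKNOWN;
	}

	/* Defaults: nothing for scheduled runs, "gc" for manual runs. */
	return scheduled ? STRATEGY_NONE : STRATEGY_GC;
}
```

Which tasks of the chosen strategy actually run is then a separate
question that depends on the type of maintenance, and individual tasks
can still be tweaked via `maintenance.<task>.enabled` and
`maintenance.<task>.schedule`.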

The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
with other topics by preemptively including "repository.h", 2025-09-29)
merged into it.

Changes in v4:
  - Fix a flaky test. git-repack(1) always rewrites the MIDX even when no
    packs have changed, so the test now only compares the packfiles. This
    isn't a new issue, and other maintenance tasks behave the same, so I
    decided to punt on fixing it for now.
  - Link to v3: https://lore.kernel.org/r/20251024-pks-maintenance-geometric-strategy-v3-0-9b5b3bdb4387@pks.im

Changes in v3:
  - More line wrapping.
  - Improve readability of maintenance strategies by using nested
    designated initializers.
  - Use git-count-objects(1) to count loose objects.
  - Link to v2: https://lore.kernel.org/r/20251021-pks-maintenance-geometric-strategy-v2-0-f0d727832b80@pks.im

Changes in v2:
  - Make the geometric factor configurable via
    "maintenance.geometric-repack.splitFactor".
  - Wrap some overly long lines in our tests.
  - Link to v1: https://lore.kernel.org/r/20251016-pks-maintenance-geometric-strategy-v1-0-18943d474203@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (10):
      builtin/gc: remove global `repack` variable
      builtin/gc: make `too_many_loose_objects()` reusable without GC config
      builtin/maintenance: introduce "geometric-repack" task
      builtin/maintenance: make the geometric factor configurable
      builtin/maintenance: don't silently ignore invalid strategy
      builtin/maintenance: improve readability of strategies
      builtin/maintenance: run maintenance tasks depending on type
      builtin/maintenance: extend "maintenance.strategy" to manual maintenance
      builtin/maintenance: make "gc" strategy accessible
      builtin/maintenance: introduce "geometric" strategy

 Documentation/config/maintenance.adoc |  49 +++++-
 builtin/gc.c                          | 313 ++++++++++++++++++++++++++++------
 t/t7900-maintenance.sh                | 245 ++++++++++++++++++++++++++
 3 files changed, 544 insertions(+), 63 deletions(-)

Range-diff versus v3:

 1:  e1af6298ba2 =  1:  9893494c0f7 builtin/gc: remove global `repack` variable
 2:  3009eb5fa82 =  2:  d9214040c96 builtin/gc: make `too_many_loose_objects()` reusable without GC config
 3:  888f8576a8f !  3:  0aa6444cef2 builtin/maintenance: introduce "geometric-repack" task
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +
     +		# Repacking should now cause a no-op geometric repack because
     +		# no packfiles need to be combined.
    -+		ls -l .git/objects/pack >before &&
    ++		ls -l .git/objects/pack/*.pack >before &&
     +		run_and_verify_geometric_pack 1 &&
    -+		ls -l .git/objects/pack >after &&
    ++		ls -l .git/objects/pack/*.pack >after &&
     +		test_cmp before after &&
     +
     +		# This incremental change creates a new packfile that only
 4:  d14b6d9bfc7 =  4:  ccff4aea2fe builtin/maintenance: make the geometric factor configurable
 5:  0b2e1dc2561 =  5:  594ee7d3765 builtin/maintenance: don't silently ignore invalid strategy
 6:  417554d8c89 =  6:  cc5844dd05e builtin/maintenance: improve readability of strategies
 7:  0c7f246a9f3 =  7:  6ce5c3bf93a builtin/maintenance: run maintenance tasks depending on type
 8:  4e01332152f =  8:  bbbaee0d13e builtin/maintenance: extend "maintenance.strategy" to manual maintenance
 9:  cf8adf2e039 =  9:  f4faf84e06e builtin/maintenance: make "gc" strategy accessible
10:  20bfc9802d7 = 10:  fc65b12ed9e builtin/maintenance: introduce "geometric" strategy

---
base-commit: 0bb2c786c2349dd6700727153c13d81cbfb41710
change-id: 20251015-pks-maintenance-geometric-strategy-580c58581b01


* [PATCH v4 01/10] builtin/gc: remove global `repack` variable
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.

Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_options()` in a subsequent commit.

The refactoring itself is straightforward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.

For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.

As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.
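
The pattern described above can be sketched in isolation. This is a
minimal illustration, not the code from builtin/gc.c: `struct arg_list`
and its helpers are hypothetical stand-ins for Git's `struct strvec`,
and `build_repack_args()` is an invented name. It only shows how the
argument vector lives with the caller and reaches a callback through a
`void *data` parameter instead of through a global.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's `struct strvec`; not the real API. */
struct arg_list {
	char **v;	/* NULL-terminated once something was pushed */
	size_t nr, alloc;
};

static void arg_list_push(struct arg_list *args, const char *s)
{
	size_t len = strlen(s) + 1;

	assert(args);
	/* Keep room for the new entry plus the terminating NULL. */
	if (args->nr + 2 > args->alloc) {
		args->alloc = args->alloc ? 2 * args->alloc : 8;
		args->v = realloc(args->v, args->alloc * sizeof(*args->v));
		if (!args->v)
			abort();
	}
	args->v[args->nr] = malloc(len);
	if (!args->v[args->nr])
		abort();
	memcpy(args->v[args->nr], s, len);
	args->v[++args->nr] = NULL;
}

static void arg_list_clear(struct arg_list *args)
{
	for (size_t i = 0; i < args->nr; i++)
		free(args->v[i]);
	free(args->v);
	args->v = NULL;
	args->nr = args->alloc = 0;
}

/* Callback in the style of `keep_one_pack()`: the target arrives via `data`. */
static int keep_one_pack(const char *name, void *data)
{
	struct arg_list *args = data;
	char buf[256];

	snprintf(buf, sizeof(buf), "--keep-pack=%s", name);
	arg_list_push(args, buf);
	return 0;
}

/* The caller owns `args`; nothing here touches file-scope state. */
static void build_repack_args(struct arg_list *args,
			      const char **keep, size_t keep_nr)
{
	arg_list_push(args, "repack");
	arg_list_push(args, "-d");
	arg_list_push(args, "-l");
	for (size_t i = 0; i < keep_nr; i++)
		keep_one_pack(keep[i], args);
}
```

A caller that only needs a yes/no answer can declare a zero-initialized
`struct arg_list` on its stack, pass it in, and clear it right away,
which mirrors the throwaway variable used for the maintenance condition.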

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 74 ++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e19e13d9788..e9772eb3a30 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -55,7 +55,6 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static timestamp_t gc_log_expire_time;
-static struct strvec repack = STRVEC_INIT;
 static struct tempfile *pidfile;
 static struct lock_file log_lock;
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -618,48 +617,50 @@ static uint64_t estimate_repack_memory(struct gc_config *cfg,
 	return os_cache + heap;
 }
 
-static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
+static int keep_one_pack(struct string_list_item *item, void *data)
 {
-	strvec_pushf(&repack, "--keep-pack=%s", basename(item->string));
+	struct strvec *args = data;
+	strvec_pushf(args, "--keep-pack=%s", basename(item->string));
 	return 0;
 }
 
 static void add_repack_all_option(struct gc_config *cfg,
-				  struct string_list *keep_pack)
+				  struct string_list *keep_pack,
+				  struct strvec *args)
 {
 	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
 		&& !(cfg->cruft_packs && cfg->repack_expire_to))
-		strvec_push(&repack, "-a");
+		strvec_push(args, "-a");
 	else if (cfg->cruft_packs) {
-		strvec_push(&repack, "--cruft");
+		strvec_push(args, "--cruft");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--cruft-expiration=%s", cfg->prune_expire);
+			strvec_pushf(args, "--cruft-expiration=%s", cfg->prune_expire);
 		if (cfg->max_cruft_size)
-			strvec_pushf(&repack, "--max-cruft-size=%lu",
+			strvec_pushf(args, "--max-cruft-size=%lu",
 				     cfg->max_cruft_size);
 		if (cfg->repack_expire_to)
-			strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
+			strvec_pushf(args, "--expire-to=%s", cfg->repack_expire_to);
 	} else {
-		strvec_push(&repack, "-A");
+		strvec_push(args, "-A");
 		if (cfg->prune_expire)
-			strvec_pushf(&repack, "--unpack-unreachable=%s", cfg->prune_expire);
+			strvec_pushf(args, "--unpack-unreachable=%s", cfg->prune_expire);
 	}
 
 	if (keep_pack)
-		for_each_string_list(keep_pack, keep_one_pack, NULL);
+		for_each_string_list(keep_pack, keep_one_pack, args);
 
 	if (cfg->repack_filter && *cfg->repack_filter)
-		strvec_pushf(&repack, "--filter=%s", cfg->repack_filter);
+		strvec_pushf(args, "--filter=%s", cfg->repack_filter);
 	if (cfg->repack_filter_to && *cfg->repack_filter_to)
-		strvec_pushf(&repack, "--filter-to=%s", cfg->repack_filter_to);
+		strvec_pushf(args, "--filter-to=%s", cfg->repack_filter_to);
 }
 
-static void add_repack_incremental_option(void)
+static void add_repack_incremental_option(struct strvec *args)
 {
-	strvec_push(&repack, "--no-write-bitmap-index");
+	strvec_push(args, "--no-write-bitmap-index");
 }
 
-static int need_to_gc(struct gc_config *cfg)
+static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 {
 	/*
 	 * Setting gc.auto to 0 or negative can disable the
@@ -700,10 +701,10 @@ static int need_to_gc(struct gc_config *cfg)
 				string_list_clear(&keep_pack, 0);
 		}
 
-		add_repack_all_option(cfg, &keep_pack);
+		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
 	} else if (too_many_loose_objects(cfg))
-		add_repack_incremental_option();
+		add_repack_incremental_option(repack_args);
 	else
 		return 0;
 
@@ -852,6 +853,7 @@ int cmd_gc(int argc,
 	int keep_largest_pack = -1;
 	int skip_foreground_tasks = 0;
 	timestamp_t dummy;
+	struct strvec repack_args = STRVEC_INIT;
 	struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT;
 	struct gc_config cfg = GC_CONFIG_INIT;
 	const char *prune_expire_sentinel = "sentinel";
@@ -891,7 +893,7 @@ int cmd_gc(int argc,
 	show_usage_with_options_if_asked(argc, argv,
 					 builtin_gc_usage, builtin_gc_options);
 
-	strvec_pushl(&repack, "repack", "-d", "-l", NULL);
+	strvec_pushl(&repack_args, "repack", "-d", "-l", NULL);
 
 	gc_config(&cfg);
 
@@ -914,14 +916,14 @@ int cmd_gc(int argc,
 		die(_("failed to parse prune expiry value %s"), cfg.prune_expire);
 
 	if (aggressive) {
-		strvec_push(&repack, "-f");
+		strvec_push(&repack_args, "-f");
 		if (cfg.aggressive_depth > 0)
-			strvec_pushf(&repack, "--depth=%d", cfg.aggressive_depth);
+			strvec_pushf(&repack_args, "--depth=%d", cfg.aggressive_depth);
 		if (cfg.aggressive_window > 0)
-			strvec_pushf(&repack, "--window=%d", cfg.aggressive_window);
+			strvec_pushf(&repack_args, "--window=%d", cfg.aggressive_window);
 	}
 	if (opts.quiet)
-		strvec_push(&repack, "-q");
+		strvec_push(&repack_args, "-q");
 
 	if (opts.auto_flag) {
 		if (cfg.detach_auto && opts.detach < 0)
@@ -930,7 +932,7 @@ int cmd_gc(int argc,
 		/*
 		 * Auto-gc should be least intrusive as possible.
 		 */
-		if (!need_to_gc(&cfg)) {
+		if (!need_to_gc(&cfg, &repack_args)) {
 			ret = 0;
 			goto out;
 		}
@@ -952,7 +954,7 @@ int cmd_gc(int argc,
 			find_base_packs(&keep_pack, cfg.big_pack_threshold);
 		}
 
-		add_repack_all_option(&cfg, &keep_pack);
+		add_repack_all_option(&cfg, &keep_pack, &repack_args);
 		string_list_clear(&keep_pack, 0);
 	}
 
@@ -1014,9 +1016,9 @@ int cmd_gc(int argc,
 
 		repack_cmd.git_cmd = 1;
 		repack_cmd.close_object_store = 1;
-		strvec_pushv(&repack_cmd.args, repack.v);
+		strvec_pushv(&repack_cmd.args, repack_args.v);
 		if (run_command(&repack_cmd))
-			die(FAILED_RUN, repack.v[0]);
+			die(FAILED_RUN, repack_args.v[0]);
 
 		if (cfg.prune_expire) {
 			struct child_process prune_cmd = CHILD_PROCESS_INIT;
@@ -1067,6 +1069,7 @@ int cmd_gc(int argc,
 
 out:
 	maintenance_run_opts_release(&opts);
+	strvec_clear(&repack_args);
 	gc_config_release(&cfg);
 	return 0;
 }
@@ -1269,6 +1272,19 @@ static int maintenance_task_gc_background(struct maintenance_run_opts *opts,
 	return run_command(&child);
 }
 
+static int gc_condition(struct gc_config *cfg)
+{
+	/*
+	 * Note that it's fine to drop the repack arguments here, as we execute
+	 * git-gc(1) as a separate child process anyway. So it knows to compute
+	 * these arguments again.
+	 */
+	struct strvec repack_args = STRVEC_INIT;
+	int ret = need_to_gc(cfg, &repack_args);
+	strvec_clear(&repack_args);
+	return ret;
+}
+
 static int prune_packed(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1596,7 +1612,7 @@ static const struct maintenance_task tasks[] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
 		.background = maintenance_task_gc_background,
-		.auto_condition = need_to_gc,
+		.auto_condition = gc_condition,
 	},
 	[TASK_COMMIT_GRAPH] = {
 		.name = "commit-graph",

-- 
2.51.1.930.gacf6e81ea2.dirty


* [PATCH v4 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.

This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.

Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.
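
To illustrate why the limit alone is enough, here is a minimal sketch of
the estimate. It assumes that the function samples a single one of the
256 object shards ("objects/17/", as the tests in this series mention)
and compares that count against the limit divided by 256, rounded up.
The name `exceeds_loose_limit()` is hypothetical, and unlike the real
function the sampled count is passed in rather than read from disk.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * `sampled` is the number of loose objects found in one shard. With 256
 * shards, the per-shard threshold is the overall limit divided by 256,
 * rounded up. The estimate trips once the sample exceeds that threshold.
 */
static int exceeds_loose_limit(int sampled, int limit)
{
	int per_shard_threshold;

	assert(sampled >= 0 && limit > 0);
	per_shard_threshold = DIV_ROUND_UP(limit, 256);
	return sampled > per_shard_threshold;
}
```

With two objects in the sampled shard, a limit of 257 yields a per-shard
threshold of 2 and no repack, while a limit of 256 yields a threshold of
1 and triggers one.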

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e9772eb3a30..026d3a1d714 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -447,7 +447,7 @@ static int rerere_gc_condition(struct gc_config *cfg UNUSED)
 	return should_gc;
 }
 
-static int too_many_loose_objects(struct gc_config *cfg)
+static int too_many_loose_objects(int limit)
 {
 	/*
 	 * Quickly check if a "gc" is needed, by estimating how
@@ -469,7 +469,7 @@ static int too_many_loose_objects(struct gc_config *cfg)
 	if (!dir)
 		return 0;
 
-	auto_threshold = DIV_ROUND_UP(cfg->gc_auto_threshold, 256);
+	auto_threshold = DIV_ROUND_UP(limit, 256);
 	while ((ent = readdir(dir)) != NULL) {
 		if (strspn(ent->d_name, "0123456789abcdef") != hexsz_loose ||
 		    ent->d_name[hexsz_loose] != '\0')
@@ -703,7 +703,7 @@ static int need_to_gc(struct gc_config *cfg, struct strvec *repack_args)
 
 		add_repack_all_option(cfg, &keep_pack, repack_args);
 		string_list_clear(&keep_pack, 0);
-	} else if (too_many_loose_objects(cfg))
+	} else if (too_many_loose_objects(cfg->gc_auto_threshold))
 		add_repack_incremental_option(repack_args);
 	else
 		return 0;
@@ -1057,7 +1057,7 @@ int cmd_gc(int argc,
 					     !opts.quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0,
 					     NULL);
 
-	if (opts.auto_flag && too_many_loose_objects(&cfg))
+	if (opts.auto_flag && too_many_loose_objects(cfg.gc_auto_threshold))
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 

-- 
2.51.1.930.gacf6e81ea2.dirty


* [PATCH v4 03/10] builtin/maintenance: introduce "geometric-repack" task
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

Introduce a new "geometric-repack" task. This task uses the geometric
repack infrastructure provided by git-repack(1) itself, a strategy that
hosting providers in particular tend to use to amortize the costs of
repacking objects.

There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.

To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has passed.
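
The decision between the two modes can be modelled as follows. This is
a simplified sketch under stated assumptions, not the implementation in
git-repack(1): pack weights are plain object counts sorted from smallest
to largest, overflow checks are omitted, and `geometric_split()` and
`choose_repack_mode()` are hypothetical names. The split index marks how
many of the smallest packs would be rolled up into a new pack. If it
covers every pack, the task falls back to an all-into-one repack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum repack_mode { REPACK_GEOMETRIC, REPACK_ALL_INTO_ONE };

/* Returns how many of the smallest packs would be merged together. */
static size_t geometric_split(const uint64_t *weight, size_t nr,
			      uint64_t factor)
{
	size_t i, split;
	uint64_t total = 0;

	assert(factor > 0);
	if (!nr)
		return 0;

	/* Walk down from the largest pack while the progression holds. */
	for (i = nr - 1; i > 0; i--)
		if (weight[i] < factor * weight[i - 1])
			break;
	split = i ? i + 1 : 0;

	/*
	 * The merged pack may itself be too large relative to the next
	 * pack, so keep absorbing packs until the progression is restored.
	 */
	for (i = 0; i < split; i++)
		total += weight[i];
	for (i = split; i < nr; i++) {
		if (weight[i] >= factor * total)
			break;
		total += weight[i];
		split++;
	}
	return split;
}

static enum repack_mode choose_repack_mode(const uint64_t *weight, size_t nr,
					   uint64_t factor)
{
	return geometric_split(weight, nr, factor) < nr ?
		REPACK_GEOMETRIC : REPACK_ALL_INTO_ONE;
}
```

Under this model, two packs with 3 objects each and a factor of 2 give a
split equal to the number of packs, so the all-into-one path is taken. A
single pack gives a split of 0 and stays on the geometric path, and a
repository without any packs falls through to all-into-one.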

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  11 +++
 builtin/gc.c                          | 102 +++++++++++++++++++++++++
 t/t7900-maintenance.sh                | 138 ++++++++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 2f719342183..26dc5de423f 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -75,6 +75,17 @@ maintenance.incremental-repack.auto::
 	number of pack-files not in the multi-pack-index is at least the value
 	of `maintenance.incremental-repack.auto`. The default value is 10.
 
+maintenance.geometric-repack.auto::
+	This integer config option controls how often the `geometric-repack`
+	task should be run as part of `git maintenance run --auto`. If zero,
+	then the `geometric-repack` task will not run with the `--auto`
+	option. A negative value will force the task to run every time.
+	Otherwise, a positive value implies the command should run either when
+	there are packfiles that need to be merged together to retain the
+	geometric progression, or when there are at least this many loose
+	objects that would be written into a new packfile. The default value is
+	100.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 026d3a1d714..2c9ecd464d2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -34,6 +34,7 @@
 #include "pack-objects.h"
 #include "path.h"
 #include "reflog.h"
+#include "repack.h"
 #include "rerere.h"
 #include "blob.h"
 #include "tree.h"
@@ -254,6 +255,7 @@ enum maintenance_task_label {
 	TASK_PREFETCH,
 	TASK_LOOSE_OBJECTS,
 	TASK_INCREMENTAL_REPACK,
+	TASK_GEOMETRIC_REPACK,
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
@@ -1566,6 +1568,101 @@ static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts
 	return 0;
 }
 
+static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
+					     struct gc_config *cfg)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	struct child_process child = CHILD_PROCESS_INIT;
+	int ret;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	child.git_cmd = 1;
+
+	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
+	if (geometry.split < geometry.pack_nr)
+		strvec_push(&child.args, "--geometric=2");
+	else
+		add_repack_all_option(cfg, NULL, &child.args);
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	if (the_repository->settings.core_multi_pack_index)
+		strvec_push(&child.args, "--write-midx");
+
+	if (run_command(&child)) {
+		ret = error(_("failed to perform geometric repack"));
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
+static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
+{
+	struct pack_geometry geometry = {
+		.split_factor = 2,
+	};
+	struct pack_objects_args po_args = {
+		.local = 1,
+	};
+	struct existing_packs existing_packs = EXISTING_PACKS_INIT;
+	struct string_list kept_packs = STRING_LIST_INIT_DUP;
+	int auto_value = 100;
+	int ret;
+
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.auto",
+			    &auto_value);
+	if (!auto_value)
+		return 0;
+	if (auto_value < 0)
+		return 1;
+
+	existing_packs.repo = the_repository;
+	existing_packs_collect(&existing_packs, &kept_packs);
+	pack_geometry_init(&geometry, &existing_packs, &po_args);
+	pack_geometry_split(&geometry);
+
+	/*
+	 * When we'd merge at least two packs with one another we always
+	 * perform the repack.
+	 */
+	if (geometry.split) {
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * Otherwise, we estimate the number of loose objects to determine
+	 * whether we want to create a new packfile or not.
+	 */
+	if (too_many_loose_objects(auto_value)) {
+		ret = 1;
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	existing_packs_release(&existing_packs);
+	pack_geometry_release(&geometry);
+	return ret;
+}
+
 typedef int (*maintenance_task_fn)(struct maintenance_run_opts *opts,
 				   struct gc_config *cfg);
 typedef int (*maintenance_auto_fn)(struct gc_config *cfg);
@@ -1608,6 +1705,11 @@ static const struct maintenance_task tasks[] = {
 		.background = maintenance_task_incremental_repack,
 		.auto_condition = incremental_repack_auto_condition,
 	},
+	[TASK_GEOMETRIC_REPACK] = {
+		.name = "geometric-repack",
+		.background = maintenance_task_geometric_repack,
+		.auto_condition = geometric_repack_auto_condition,
+	},
 	[TASK_GC] = {
 		.name = "gc",
 		.foreground = maintenance_task_gc_foreground,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index ddd273d8dc2..842829879d8 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -465,6 +465,144 @@ test_expect_success 'maintenance.incremental-repack.auto (when config is unset)'
 	)
 '
 
+run_and_verify_geometric_pack () {
+	EXPECTED_PACKS="$1" &&
+
+	# Verify that we perform a geometric repack.
+	rm -f "trace2.txt" &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git maintenance run --task=geometric-repack 2>/dev/null &&
+	test_subcommand git repack -d -l --geometric=2 \
+		--quiet --write-midx <trace2.txt &&
+
+	# Verify that the number of packfiles matches our expectation.
+	ls -l .git/objects/pack/*.pack >packfiles &&
+	test_line_count = "$EXPECTED_PACKS" packfiles &&
+
+	# And verify that there are no loose objects anymore.
+	git count-objects -v >count &&
+	test_grep '^count: 0$' count
+}
+
+test_expect_success 'geometric repacking task' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+		test_commit initial &&
+
+		# The initial repack causes an all-into-one repack.
+		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
+			--quiet --write-midx <initial-repack.txt &&
+
+		# Repacking should now cause a no-op geometric repack because
+		# no packfiles need to be combined.
+		ls -l .git/objects/pack/*.pack >before &&
+		run_and_verify_geometric_pack 1 &&
+		ls -l .git/objects/pack/*.pack >after &&
+		test_cmp before after &&
+
+		# This incremental change creates a new packfile that only
+		# soaks up loose objects. The packfiles are not getting merged
+		# at this point.
+		test_commit loose &&
+		run_and_verify_geometric_pack 2 &&
+
+		# Both packfiles have 3 objects, so the next run would cause us
+		# to merge all packfiles together. This should be turned into
+		# an all-into-one-repack.
+		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
+			--quiet --write-midx <all-into-one-repack.txt &&
+
+		# The geometric repack soaks up unreachable objects.
+		echo blob-1 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 2 &&
+
+		# A second unreachable object should be written into another packfile.
+		echo blob-2 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+
+		# And these two small packs should now be merged via the
+		# geometric repack. The large packfile should remain intact.
+		run_and_verify_geometric_pack 2 &&
+
+		# If we now add two more objects and repack twice we should
+		# then see another all-into-one repack. This time around
+		# though, as we have unreachable objects, we should also see a
+		# cruft pack.
+		echo blob-3 | git hash-object -w --stdin -t blob &&
+		echo blob-4 | git hash-object -w --stdin -t blob &&
+		run_and_verify_geometric_pack 3 &&
+		GIT_TRACE2_EVENT="$(pwd)/cruft-repack.txt" \
+			git maintenance run --task=geometric-repack 2>/dev/null &&
+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
+			--quiet --write-midx <cruft-repack.txt &&
+		ls .git/objects/pack/*.pack >packs &&
+		test_line_count = 2 packs &&
+		ls .git/objects/pack/*.mtimes >cruft &&
+		test_line_count = 1 cruft
+	)
+'
+
+test_geometric_repack_needed () {
+	NEEDED="$1"
+	GEOMETRIC_CONFIG="$2" &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git ${GEOMETRIC_CONFIG:+-c maintenance.geometric-repack.$GEOMETRIC_CONFIG} \
+		maintenance run --auto --task=geometric-repack 2>/dev/null &&
+	case "$NEEDED" in
+	true)
+		test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	false)
+		! test_grep "\[\"git\",\"repack\"," trace2.txt;;
+	*)
+		BUG "invalid parameter: $NEEDED";;
+	esac
+}
+
+test_expect_success 'geometric repacking with --auto' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		# An empty repository does not need repacking, except when
+		# explicitly told to do it.
+		test_geometric_repack_needed false &&
+		test_geometric_repack_needed false auto=0 &&
+		test_geometric_repack_needed false auto=1 &&
+		test_geometric_repack_needed true auto=-1 &&
+
+		test_oid_init &&
+
+		# Loose objects cause a repack when crossing the limit. Note
+		# that the number of objects gets extrapolated by having a look
+		# at the "objects/17/" shard.
+		test_commit "$(test_oid blob17_1)" &&
+		test_geometric_repack_needed false &&
+		test_commit "$(test_oid blob17_2)" &&
+		test_geometric_repack_needed false auto=257 &&
+		test_geometric_repack_needed true auto=256 &&
+
+		# Force another repack.
+		test_commit first &&
+		test_commit second &&
+		test_geometric_repack_needed true auto=-1 &&
+
+		# We now have two packfiles that would be merged together. As
+		# such, the repack should always happen unless the user has
+		# disabled the auto task.
+		test_geometric_repack_needed false auto=0 &&
+		test_geometric_repack_needed true auto=9000
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.1.930.gacf6e81ea2.dirty


* [PATCH v4 04/10] builtin/maintenance: make the geometric factor configurable
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases, e.g. in very big
repositories, it may be helpful to tune this factor to reduce the number
of packfile merges even further. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.

Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.
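
The effect of the factor can be shown with a small check. This is an
illustrative sketch only, with the hypothetical helper name
`forms_geometric_progression()`: it tests whether every pack holds at
least `factor` times as many objects as the next-smaller one, given
object counts sorted from smallest to largest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes are object counts, sorted from smallest to largest pack. */
static int forms_geometric_progression(const uint64_t *objects, size_t nr,
				       uint64_t factor)
{
	assert(factor > 0);
	for (size_t i = 1; i < nr; i++)
		if (objects[i] < factor * objects[i - 1])
			return 0;
	return 1;
}
```

For packs with 1, 2 and 9 objects the progression holds with a factor of
2, but not with a factor of 3, because the second pack would then need
at least 3 objects.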

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  5 +++++
 builtin/gc.c                          |  9 ++++++++-
 t/t7900-maintenance.sh                | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 26dc5de423f..45fdafc2c63 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -86,6 +86,11 @@ maintenance.geometric-repack.auto::
 	objects that would be written into a new packfile. The default value is
 	100.
 
+maintenance.geometric-repack.splitFactor::
+	This integer config option controls the factor used for the geometric
+	sequence. See the `--geometric=` option in linkgit:git-repack[1] for
+	more details. Defaults to `2`.
+
 maintenance.reflog-expire.auto::
 	This integer config option controls how often the `reflog-expire` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index 2c9ecd464d2..fb1a82e0304 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1582,6 +1582,9 @@ static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
 	struct child_process child = CHILD_PROCESS_INIT;
 	int ret;
 
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.splitFactor",
+			    &geometry.split_factor);
+
 	existing_packs.repo = the_repository;
 	existing_packs_collect(&existing_packs, &kept_packs);
 	pack_geometry_init(&geometry, &existing_packs, &po_args);
@@ -1591,7 +1594,8 @@ static int maintenance_task_geometric_repack(struct maintenance_run_opts *opts,
 
 	strvec_pushl(&child.args, "repack", "-d", "-l", NULL);
 	if (geometry.split < geometry.pack_nr)
-		strvec_push(&child.args, "--geometric=2");
+		strvec_pushf(&child.args, "--geometric=%d",
+			     geometry.split_factor);
 	else
 		add_repack_all_option(cfg, NULL, &child.args);
 	if (opts->quiet)
@@ -1632,6 +1636,9 @@ static int geometric_repack_auto_condition(struct gc_config *cfg UNUSED)
 	if (auto_value < 0)
 		return 1;
 
+	repo_config_get_int(the_repository, "maintenance.geometric-repack.splitFactor",
+			    &geometry.split_factor);
+
 	existing_packs.repo = the_repository;
 	existing_packs_collect(&existing_packs, &kept_packs);
 	pack_geometry_init(&geometry, &existing_packs, &po_args);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 842829879d8..8fda6b1a6f7 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -603,6 +603,38 @@ test_expect_success 'geometric repacking with --auto' '
 	)
 '
 
+test_expect_success 'geometric repacking honors configured split factor' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+
+		# Create three different packs with 9, 2 and 1 object, respectively.
+		# This is done so that only a subset of packs would be merged
+		# together so that we can verify that `git repack` receives the
+		# correct geometric factor.
+		for i in $(test_seq 9)
+		do
+			echo first-$i | git hash-object -w --stdin -t blob || return 1
+		done &&
+		git repack --geometric=2 -d &&
+
+		for i in $(test_seq 2)
+		do
+			echo second-$i | git hash-object -w --stdin -t blob || return 1
+		done &&
+		git repack --geometric=2 -d &&
+
+		echo third | git hash-object -w --stdin -t blob &&
+		git repack --geometric=2 -d &&
+
+		test_geometric_repack_needed false splitFactor=2 &&
+		test_geometric_repack_needed true splitFactor=3 &&
+		test_subcommand git repack -d -l --geometric=3 --quiet --write-midx <trace2.txt
+	)
+'
+
 test_expect_success 'pack-refs task' '
 	for n in $(test_seq 1 5)
 	do

-- 
2.51.1.930.gacf6e81ea2.dirty


* [PATCH v4 05/10] builtin/maintenance: don't silently ignore invalid strategy
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.

Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.
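
The difference between silently ignoring and rejecting an unknown value
can be sketched like this. The sketch is an assumption-laden
illustration, not the code in builtin/gc.c: `enum strategy_kind`,
`equals_ignore_case()` and `parse_strategy()` are hypothetical names,
and the function returns an error code instead of dying so that its
behaviour can be checked from a caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

enum strategy_kind { STRATEGY_NONE, STRATEGY_INCREMENTAL };

/* Portable ASCII case-insensitive string comparison. */
static int equals_ignore_case(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/*
 * Returns 0 and stores the strategy in `out` for a known name. For an
 * unknown name, reports the problem, leaves `out` untouched and returns
 * -1, so the caller cannot mistake a typo for "no strategy configured".
 */
static int parse_strategy(const char *name, enum strategy_kind *out)
{
	assert(name && out);
	if (equals_ignore_case(name, "incremental")) {
		*out = STRATEGY_INCREMENTAL;
		return 0;
	}
	fprintf(stderr, "unknown maintenance strategy: '%s'\n", name);
	return -1;
}
```

Keeping the lookup in one function means that further strategies only
need another branch here, and every caller gets the same error handling.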

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c           | 17 +++++++++++------
 t/t7900-maintenance.sh |  5 +++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index fb1a82e0304..726d944d3bd 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1855,6 +1855,13 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static struct maintenance_strategy parse_maintenance_strategy(const char *name)
+{
+	if (!strcasecmp(name, "incremental"))
+		return incremental_strategy;
+	die(_("unknown maintenance strategy: '%s'"), name);
+}
+
 static void initialize_task_config(struct maintenance_run_opts *opts,
 				   const struct string_list *selected_tasks)
 {
@@ -1890,12 +1897,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 * override specific aspects of our strategy.
 	 */
 	if (opts->schedule) {
-		strategy = none_strategy;
-
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str)) {
-			if (!strcasecmp(config_str, "incremental"))
-				strategy = incremental_strategy;
-		}
+		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+			strategy = parse_maintenance_strategy(config_str);
+		else
+			strategy = none_strategy;
 	} else {
 		strategy = default_strategy;
 	}
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 8fda6b1a6f7..211350bf54e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -1263,6 +1263,11 @@ test_expect_success 'fails when running outside of a repository' '
 	nongit test_must_fail git maintenance unregister
 '
 
+test_expect_success 'fails when configured to use an invalid strategy' '
+	test_must_fail git -c maintenance.strategy=invalid maintenance run --schedule=hourly 2>err &&
+	test_grep "unknown maintenance strategy: .invalid." err
+'
+
 test_expect_success 'register and unregister bare repo' '
 	test_when_finished "git config --global --unset-all maintenance.repo || :" &&
 	test_might_fail git config --global --unset-all maintenance.repo &&

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v4 06/10] builtin/maintenance: improve readability of strategies
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

Our maintenance strategies are essentially a large array of structures,
where each of the tasks can be enabled and scheduled individually. With
the current layout though all the configuration sits on the same nesting
layer, which makes it a bit hard to discern which initialized fields
belong to what task.

Improve readability of the individual tasks by using nested designated
initializers instead.
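
As a small standalone illustration (the task names, the struct, and
`layouts_match()` are invented for this example and are not part of
builtin/gc.c), the two layouts below initialize identical tables; only
the grouping of the fields differs:

```c
enum task_id { TASK_A, TASK_B, TASK__COUNT };

struct task_config {
	int enabled;
	int schedule;
};

/* Flat form: every field of every task sits on the same nesting level. */
static const struct task_config flat[TASK__COUNT] = {
	[TASK_A].enabled = 1,
	[TASK_A].schedule = 3,
	[TASK_B].enabled = 1,
	[TASK_B].schedule = 2,
};

/* Nested form: one initializer per task, with its fields grouped inside. */
static const struct task_config nested[TASK__COUNT] = {
	[TASK_A] = {
		.enabled = 1,
		.schedule = 3,
	},
	[TASK_B] = {
		.enabled = 1,
		.schedule = 2,
	},
};

/* Returns 1 if both tables describe the same configuration, else 0. */
int layouts_match(void)
{
	for (int i = 0; i < TASK__COUNT; i++)
		if (flat[i].enabled != nested[i].enabled ||
		    flat[i].schedule != nested[i].schedule)
			return 0;
	return 1;
}
```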

Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 726d944d3bd..0ba6e59de14 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1835,23 +1835,37 @@ struct maintenance_strategy {
 };
 
 static const struct maintenance_strategy none_strategy = { 0 };
+
 static const struct maintenance_strategy default_strategy = {
 	.tasks = {
-		[TASK_GC].enabled = 1,
+		[TASK_GC] = {
+			.enabled = 1,
+		},
 	},
 };
+
 static const struct maintenance_strategy incremental_strategy = {
 	.tasks = {
-		[TASK_COMMIT_GRAPH].enabled = 1,
-		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
-		[TASK_PREFETCH].enabled = 1,
-		[TASK_PREFETCH].schedule = SCHEDULE_HOURLY,
-		[TASK_INCREMENTAL_REPACK].enabled = 1,
-		[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY,
-		[TASK_LOOSE_OBJECTS].enabled = 1,
-		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
-		[TASK_PACK_REFS].enabled = 1,
-		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
+		[TASK_COMMIT_GRAPH] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_PREFETCH] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_INCREMENTAL_REPACK] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_LOOSE_OBJECTS] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_PACK_REFS] = {
+			.enabled = 1,
+			.schedule = SCHEDULE_WEEKLY,
+		},
 	},
 };
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v4 07/10] builtin/maintenance: run maintenance tasks depending on type
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

We basically have three different ways to execute repository
maintenance:

  1. Manual maintenance via `git maintenance run`.

  2. Automatic maintenance via `git maintenance run --auto`.

  3. Scheduled maintenance via `git maintenance run --schedule=`.

At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.

Introduce a new maintenance type that discerns between manual (1 & 2)
and scheduled (3) maintenance. Convert the `enabled` field into a bitset
so that it becomes possible to specify exactly which tasks should run in
a specific context.

The types picked for existing strategies match the status quo:

  - The default strategy is only ever executed as part of a manual
    maintenance run. It is not possible to use it for scheduled
    maintenance.

  - The incremental strategy is only ever executed as part of a
    scheduled maintenance run. It is not possible to use it for manual
    maintenance.

The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
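
The check can be sketched in isolation as follows. The enum mirrors the
one added by this patch, but `task_should_run()` and its `config_enabled`
parameter are hypothetical and only model how an explicit
`maintenance.<task>.enabled` value replaces the strategy's bitset:

```c
/* Mirrors the enum added by the patch. */
enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
};

/*
 * Hypothetical helper. `task_types` is the bitset taken from the strategy,
 * `run_type` is how maintenance was invoked, and `config_enabled` models
 * `maintenance.<task>.enabled`: -1 if unset, otherwise 0 or 1.
 */
int task_should_run(unsigned task_types, enum maintenance_type run_type,
		    int config_enabled)
{
	/* An explicit config value wins over whatever the strategy says. */
	if (config_enabled >= 0)
		task_types = config_enabled ? (unsigned)run_type : 0;
	return (task_types & run_type) != 0;
}
```

A task marked MAINTENANCE_TYPE_MANUAL is thus skipped during a scheduled
run unless the user explicitly enables it, and a task enabled for both
types can still be turned off via config.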

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 0ba6e59de1..6cc4f98c7a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1827,9 +1827,16 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts,
 	return result;
 }
 
+enum maintenance_type {
+	/* As invoked via `git maintenance run --schedule=`. */
+	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
+	/* As invoked via `git maintenance run` and with `--auto`. */
+	MAINTENANCE_TYPE_MANUAL    = (1 << 1),
+};
+
 struct maintenance_strategy {
 	struct {
-		int enabled;
+		unsigned type;
 		enum schedule_priority schedule;
 	} tasks[TASK__COUNT];
 };
@@ -1839,7 +1846,7 @@ static const struct maintenance_strategy none_strategy = { 0 };
 static const struct maintenance_strategy default_strategy = {
 	.tasks = {
 		[TASK_GC] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_MANUAL,
 		},
 	},
 };
@@ -1847,23 +1854,23 @@ static const struct maintenance_strategy default_strategy = {
 static const struct maintenance_strategy incremental_strategy = {
 	.tasks = {
 		[TASK_COMMIT_GRAPH] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_HOURLY,
 		},
 		[TASK_PREFETCH] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_HOURLY,
 		},
 		[TASK_INCREMENTAL_REPACK] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_DAILY,
 		},
 		[TASK_LOOSE_OBJECTS] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_DAILY,
 		},
 		[TASK_PACK_REFS] = {
-			.enabled = 1,
+			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_WEEKLY,
 		},
 	},
@@ -1881,6 +1888,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 {
 	struct strbuf config_name = STRBUF_INIT;
 	struct maintenance_strategy strategy;
+	enum maintenance_type type;
 	const char *config_str;
 
 	/*
@@ -1915,8 +1923,10 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 			strategy = parse_maintenance_strategy(config_str);
 		else
 			strategy = none_strategy;
+		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
+		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
 	for (size_t i = 0; i < TASK__COUNT; i++) {
@@ -1926,8 +1936,8 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 		if (!repo_config_get_bool(the_repository, config_name.buf, &config_value))
-			strategy.tasks[i].enabled = config_value;
-		if (!strategy.tasks[i].enabled)
+			strategy.tasks[i].type = config_value ? type : 0;
+		if (!(strategy.tasks[i].type & type))
 			continue;
 
 		if (opts->schedule) {

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v4 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:30   ` [PATCH v4 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.

So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.

Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy beforehand. Instead, repository maintenance would have
continued to use git-gc(1) by default.

But luckily, we can match that behaviour by:

  - Keeping all current tasks of the incremental strategy as
    `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
    run during manual maintenance.

  - Configuring the "gc" task so that it is invoked during manual
    maintenance.

Like this, the user shouldn't observe any difference in behaviour.
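
The compatibility argument can be modelled with a reduced, standalone
sketch. Everything here is illustrative: the three-task table, the
shortened enum names, and the `select_strategy()` and `runs()` helpers
are assumptions for the example and not the code in builtin/gc.c.

```c
#include <stddef.h>
#include <strings.h>

enum maintenance_type { SCHEDULED = (1 << 0), MANUAL = (1 << 1) };
enum task_id { TASK_GC, TASK_COMMIT_GRAPH, TASK_PACK_REFS, TASK__COUNT };

struct strategy {
	unsigned types[TASK__COUNT];
};

static const struct strategy none_strategy = { { 0 } };
static const struct strategy default_strategy = {
	.types = { [TASK_GC] = MANUAL },
};
static const struct strategy incremental_strategy = {
	.types = {
		[TASK_COMMIT_GRAPH] = SCHEDULED,
		[TASK_PACK_REFS] = SCHEDULED,
		/* Kept for compatibility: manual runs still use gc. */
		[TASK_GC] = MANUAL,
	},
};

/*
 * Pick the built-in default for the invocation type, then let the
 * configured name override it for both types. Returns NULL for an
 * unknown name.
 */
const struct strategy *select_strategy(const char *configured,
				       enum maintenance_type run_type)
{
	if (!configured)
		return run_type == SCHEDULED ? &none_strategy : &default_strategy;
	if (!strcasecmp(configured, "incremental"))
		return &incremental_strategy;
	return NULL;
}

/* Returns 1 if `task` runs under strategy `s` for this invocation type. */
int runs(const struct strategy *s, enum task_id task,
	 enum maintenance_type run_type)
{
	return s && (s->types[task] & run_type) != 0;
}
```

In this model, a manual run with "incremental" configured selects only
the gc task, which is exactly what the default strategy selected before.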

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc | 22 ++++++++++++-------
 builtin/gc.c                          | 25 +++++++++++++++++-----
 t/t7900-maintenance.sh                | 40 +++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index 45fdafc2c63..b7e90a71a3d 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -16,19 +16,25 @@ detach.
 
 maintenance.strategy::
 	This string config option provides a way to specify one of a few
-	recommended schedules for background maintenance. This only affects
-	which tasks are run during `git maintenance run --schedule=X`
-	commands, provided no `--task=<task>` arguments are provided.
-	Further, if a `maintenance.<task>.schedule` config value is set,
-	then that value is used instead of the one provided by
-	`maintenance.strategy`. The possible strategy strings are:
+	recommended strategies for repository maintenance. This affects
+	which tasks are run during `git maintenance run`, provided no
+	`--task=<task>` arguments are provided. This setting impacts manual
+	maintenance, auto-maintenance as well as scheduled maintenance. The
+	tasks that run may be different depending on the maintenance type.
 +
-* `none`: This default setting implies no tasks are run at any schedule.
+The maintenance strategy can be further tweaked by setting
+`maintenance.<task>.enabled` and `maintenance.<task>.schedule`. If set, these
+values are used instead of the defaults provided by `maintenance.strategy`.
++
+The possible strategies are:
++
+* `none`: This strategy implies no tasks are run at all. This is the default
+  strategy for scheduled maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
   `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
-  task weekly.
+  task weekly. Manual repository maintenance uses the `gc` task.
 
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
diff --git a/builtin/gc.c b/builtin/gc.c
index 6cc4f98c7aa..3c0a9a2e5df 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1873,6 +1873,20 @@ static const struct maintenance_strategy incremental_strategy = {
 			.type = MAINTENANCE_TYPE_SCHEDULED,
 			.schedule = SCHEDULE_WEEKLY,
 		},
+		/*
+		 * Historically, the "incremental" strategy was only available
+		 * in the context of scheduled maintenance when set up via
+		 * "maintenance.strategy". We have later expanded that config
+		 * to also cover manual maintenance.
+		 *
+		 * To retain backwards compatibility with the previous status
+		 * quo we thus run git-gc(1) in case manual maintenance was
+		 * requested. This is the same as the default strategy, which
+		 * would have been in use beforehand.
+		 */
+		[TASK_GC] = {
+			.type = MAINTENANCE_TYPE_MANUAL,
+		},
 	},
 };
 
@@ -1916,19 +1930,20 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 	 *   - Unscheduled maintenance uses our default strategy.
 	 *
 	 * Both of these are affected by the gitconfig though, which may
-	 * override specific aspects of our strategy.
+	 * override specific aspects of our strategy. Furthermore, both
+	 * strategies can be overridden by setting "maintenance.strategy".
 	 */
 	if (opts->schedule) {
-		if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
-			strategy = parse_maintenance_strategy(config_str);
-		else
-			strategy = none_strategy;
+		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
 		strategy = default_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
+	if (!repo_config_get_string_tmp(the_repository, "maintenance.strategy", &config_str))
+		strategy = parse_maintenance_strategy(config_str);
+
 	for (size_t i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 211350bf54e..a791a38916e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -886,6 +886,46 @@ test_expect_success 'maintenance.strategy inheritance' '
 		<modified-daily.txt
 '
 
+test_strategy () {
+	STRATEGY="$1"
+	shift
+
+	cat >expect &&
+	rm -f trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -c maintenance.strategy=$STRATEGY maintenance run --quiet "$@" &&
+	sed -n 's/{"event":"child_start","sid":"[^/"]*",.*,"argv":\["\(.*\)\"]}/\1/p' <trace2.txt |
+		sed 's/","/ /g' >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'maintenance.strategy is respected' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+
+		test_must_fail git -c maintenance.strategy=unknown maintenance run 2>err &&
+		test_grep "unknown maintenance strategy: .unknown." err &&
+
+		test_strategy incremental <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy incremental --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git prune-packed --quiet
+		git multi-pack-index write --no-progress
+		git multi-pack-index expire --no-progress
+		git multi-pack-index repack --no-progress --batch-size=1
+		git commit-graph write --split --reachable --no-progress
+		EOF
+	)
+'
+
 test_expect_success 'register and unregister' '
 	test_when_finished git config --global --unset-all maintenance.repo &&
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v4 09/10] builtin/maintenance: make "gc" strategy accessible
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
@ 2025-10-27  8:30   ` Patrick Steinhardt
  2025-10-27  8:31   ` [PATCH v4 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
  2025-10-27 15:53   ` [PATCH v4 00/10] " Junio C Hamano
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:30 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:

  - It is impossible to use the default "gc" strategy for a specific
    repository when the strategy was globally set to a different strategy.

  - It is not possible to use git-gc(1) for scheduled maintenance.

Address these issues by making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  2 ++
 builtin/gc.c                          |  9 ++++++---
 t/t7900-maintenance.sh                | 14 +++++++++++++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index b7e90a71a3d..b2bacdc8220 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -30,6 +30,8 @@ The possible strategies are:
 +
 * `none`: This strategy implies no tasks are run at all. This is the default
   strategy for scheduled maintenance.
+* `gc`: This strategy runs the `gc` task. This is the default strategy for
+  manual maintenance.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c0a9a2e5df..8cab1450095 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1843,10 +1843,11 @@ struct maintenance_strategy {
 
 static const struct maintenance_strategy none_strategy = { 0 };
 
-static const struct maintenance_strategy default_strategy = {
+static const struct maintenance_strategy gc_strategy = {
 	.tasks = {
 		[TASK_GC] = {
-			.type = MAINTENANCE_TYPE_MANUAL,
+			.type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
+			.schedule = SCHEDULE_DAILY,
 		},
 	},
 };
@@ -1894,6 +1895,8 @@ static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
+	if (!strcasecmp(name, "gc"))
+		return gc_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
@@ -1937,7 +1940,7 @@ static void initialize_task_config(struct maintenance_run_opts *opts,
 		strategy = none_strategy;
 		type = MAINTENANCE_TYPE_SCHEDULED;
 	} else {
-		strategy = default_strategy;
+		strategy = gc_strategy;
 		type = MAINTENANCE_TYPE_MANUAL;
 	}
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index a791a38916e..65417a1e9c3 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -915,7 +915,7 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy incremental --schedule=weekly <<-\EOF
+		test_strategy incremental --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git prune-packed --quiet
 		git multi-pack-index write --no-progress
@@ -923,6 +923,18 @@ test_expect_success 'maintenance.strategy is respected' '
 		git multi-pack-index repack --no-progress --batch-size=1
 		git commit-graph write --split --reachable --no-progress
 		EOF
+
+		test_strategy gc <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
+
+		test_strategy gc --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git gc --quiet --no-detach --skip-foreground-tasks
+		EOF
 	)
 '
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* [PATCH v4 10/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-10-27  8:30   ` [PATCH v4 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
@ 2025-10-27  8:31   ` Patrick Steinhardt
  2025-10-27 15:53   ` [PATCH v4 00/10] " Junio C Hamano
  10 siblings, 0 replies; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27  8:31 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
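
The decision can be approximated with the following standalone model. It
is a simplified sketch, not the code used by git-repack(1) or by this
series: the helper names are made up, pack weights are assumed to be
sorted in ascending order, cruft packs are assumed to be excluded
already, and integer overflow is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns how many of the smallest packs would be
 * rolled up into a new pack so that the remaining packs form a geometric
 * progression with the given factor.
 */
size_t geometric_split(const uint64_t *weights, size_t nr, uint64_t factor)
{
	size_t i, split;
	uint64_t total = 0;

	if (!nr)
		return 0;

	/* Walk down from the largest pack until the progression breaks. */
	for (i = nr - 1; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;

	/* Both packs of the offending pair have to be rolled up. */
	split = i;
	if (split)
		split++;

	/* The combined pack may be big enough to swallow larger packs too. */
	for (i = 0; i < split; i++)
		total += weights[i];
	for (i = split; i < nr; i++) {
		if (weights[i] >= factor * total)
			break;
		total += weights[i];
		split++;
	}

	return split;
}

/* Returns 1 if the geometric repack would merge every pack anyway. */
int needs_all_into_one(const uint64_t *weights, size_t nr, uint64_t factor)
{
	return nr > 1 && geometric_split(weights, nr, factor) == nr;
}
```

With a factor of 2, packs weighing 1, 2, 4 and 8 objects already form a
progression and nothing is merged, whereas packs weighing 5, 6 and 7
would all be combined, which is the case where an all-into-one repack
with a cruft pack is done instead.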

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/maintenance.adoc |  9 +++++++++
 builtin/gc.c                          | 31 +++++++++++++++++++++++++++++++
 t/t7900-maintenance.sh                | 20 +++++++++++++++++++-
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
index b2bacdc8220..d0c38f03fab 100644
--- a/Documentation/config/maintenance.adoc
+++ b/Documentation/config/maintenance.adoc
@@ -32,6 +32,15 @@ The possible strategies are:
   strategy for scheduled maintenance.
 * `gc`: This strategy runs the `gc` task. This is the default strategy for
   manual maintenance.
+* `geometric`: This strategy performs geometric repacking of packfiles and
+  keeps auxiliary data structures up-to-date. The strategy expires data in the
+  reflog and removes worktrees that cannot be located anymore. When the
+  geometric repacking strategy would decide to do an all-into-one repack, then
+  the strategy generates a cruft pack for all unreachable objects. Objects that
+  are already part of a cruft pack will be expired.
++
+This repacking strategy is a full replacement for the `gc` strategy and is
+recommended for large repositories.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the
diff --git a/builtin/gc.c b/builtin/gc.c
index 8cab1450095..19be3f87e13 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1891,12 +1891,43 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static const struct maintenance_strategy geometric_strategy = {
+	.tasks = {
+		[TASK_COMMIT_GRAPH] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_GEOMETRIC_REPACK] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_PACK_REFS] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_RERERE_GC] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_REFLOG_EXPIRE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_WORKTREE_PRUNE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+	},
+};
+
 static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
 	if (!strcasecmp(name, "gc"))
 		return gc_strategy;
+	if (!strcasecmp(name, "geometric"))
+		return geometric_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 65417a1e9c3..614184a0978 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -930,11 +930,29 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy gc --schedule=weekly <<-\EOF
+		test_strategy gc --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git reflog expire --all
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
+
+		test_strategy geometric <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
+
+		test_strategy geometric --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
 	)
 '
 

-- 
2.51.1.930.gacf6e81ea2.dirty



* Re: [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task
  2025-10-27  8:24       ` Patrick Steinhardt
@ 2025-10-27 14:25         ` Jeff King
  0 siblings, 0 replies; 69+ messages in thread
From: Jeff King @ 2025-10-27 14:25 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Derrick Stolee, Taylor Blau, Justin Tobler, Junio C Hamano

On Mon, Oct 27, 2025 at 09:24:11AM +0100, Patrick Steinhardt wrote:

> Interesting! I would say that this is an issue in git-repack(1) itself:
> if the geometric repack didn't lead to any new packs, and if all of the
> packs are already covered by a MIDX, then we still rather pointlessly
> regenerate the MIDX even though it won't cover anything new.
> 
> I wonder whether we want a patch like the below one? Problem though is
> that we'd also have to check whether any of the other options have
> changed, otherwise we for example wouldn't generate bitmaps.
> 
> In any case though, I feel like this is a bit out of scope for this
> patch series. Other strategies that write a MIDX behave the same, so
> this is something we can fix later on.

I agree it's out of scope for the series, but the racy test is new. So
we probably at least need to make it un-racy with a comment to address
the root cause later.

The patch you posted looks plausibly correct, though I agree with you
that some thought needs to be given to changing options.

-Peff


* Re: [PATCH v4 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-10-27  8:31   ` [PATCH v4 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
@ 2025-10-27 15:53   ` Junio C Hamano
  2025-10-27 20:05     ` Patrick Steinhardt
  10 siblings, 1 reply; 69+ messages in thread
From: Junio C Hamano @ 2025-10-27 15:53 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Jeff King, Derrick Stolee, Taylor Blau, Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

> Changes in v4:
>   - Fix a flaky test because git-repack(1) always decides to rewrite the
>     MIDX, even though no packs have changed. This isn't a new issue, and
>     other maintenance tasks behave the same. So I decided to punt on it
>     for now.

Thanks, but this round raced against 'next', so let me fabricate the
following and queue it instead.

------- >8 -------
From: Patrick Steinhardt <ps@pks.im>
Date: Mon, 27 Oct 2025 09:30:50 +0100
Subject: [PATCH] t7900: fix a flaky test due to git-repack always regenerating .midx

When a supposedly no-op "git repack" runs across a second boundary,
because the command always touches the MIDX file and updates its
timestamp, "ls -l $GIT_DIR/objects/pack/" before and after the
operation can change, which causes such a test to fail.  Only
compare the *.pack files in the directory before and after the
operation to work around this flakiness.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: taken from diff to v4 from v3 that was already merged to 'next']
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t7900-maintenance.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0d76693fee..614184a097 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -500,9 +500,9 @@ test_expect_success 'geometric repacking task' '
 
 		# Repacking should now cause a no-op geometric repack because
 		# no packfiles need to be combined.
-		ls -l .git/objects/pack >before &&
+		ls -l .git/objects/pack/*.pack >before &&
 		run_and_verify_geometric_pack 1 &&
-		ls -l .git/objects/pack >after &&
+		ls -l .git/objects/pack/*.pack >after &&
 		test_cmp before after &&
 
 		# This incremental change creates a new packfile that only
-- 
2.51.2-678-g0cd646409c
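
The failure mode described in the commit message above can be reproduced
without Git. What follows is a minimal sketch and not part of the patch: the
temporary directory, the file names and the pinned timestamps are all
made up for illustration, and the second "touch" merely stands in for a
no-op repack that rewrites the MIDX. The timestamps are deliberately far
apart so that the result does not depend on how "ls -l" rounds times.

```shell
#!/bin/sh
# Hypothetical stand-in for .git/objects/pack; the file names are made up.
dir=$(mktemp -d)
mkdir "$dir/pack"
: >"$dir/pack/pack-1234.pack"
: >"$dir/pack/multi-pack-index"

# Pin both mtimes so that the listings are deterministic.
touch -t 202001010000 "$dir/pack/pack-1234.pack" "$dir/pack/multi-pack-index"

ls -l "$dir/pack" >"$dir/before-all"
ls -l "$dir"/pack/*.pack >"$dir/before-packs"

# Assumption: this simulates a no-op repack that rewrites only the MIDX.
touch -t 202101010000 "$dir/pack/multi-pack-index"

ls -l "$dir/pack" >"$dir/after-all"
ls -l "$dir"/pack/*.pack >"$dir/after-packs"

# Listing the whole directory picks up the new MIDX timestamp ...
if cmp -s "$dir/before-all" "$dir/after-all"
then
	whole_dir=same
else
	whole_dir=differs
fi

# ... whereas listing only the packfiles does not.
if cmp -s "$dir/before-packs" "$dir/after-packs"
then
	packs_only=same
else
	packs_only=differs
fi

echo "whole directory: $whole_dir"
echo "pack files only: $packs_only"
rm -rf "$dir"
```

With the pinned timestamps this prints "whole directory: differs" and
"pack files only: same", which is why restricting the comparison to
*.pack keeps the test stable even though the MIDX is still rewritten.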


* Re: [PATCH v4 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-27 15:53   ` [PATCH v4 00/10] " Junio C Hamano
@ 2025-10-27 20:05     ` Patrick Steinhardt
  2025-10-27 20:58       ` Junio C Hamano
  0 siblings, 1 reply; 69+ messages in thread
From: Patrick Steinhardt @ 2025-10-27 20:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Derrick Stolee, Taylor Blau, Justin Tobler

On Mon, Oct 27, 2025 at 08:53:22AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Changes in v4:
> >   - Fix a flaky test because git-repack(1) always decides to rewrite the
> >     MIDX, even though no packs have changed. This isn't a new issue, and
> >     other maintenance tasks behave the same. So I decided to punt on it
> >     for now.
> 
> Thanks, but this round raced against 'next', so let me fabricate the
> following and queue it instead.

Thanks for doing this!

> ------- >8 -------
> From: Patrick Steinhardt <ps@pks.im>
> Date: Mon, 27 Oct 2025 09:30:50 +0100
> Subject: [PATCH] t7900: fix a flaky test due to git-repack always regenerating .midx

s/.midx/MIDX, as the MIDX file does not have a dot anywhere.

> 
> When a supposedly no-op "git repack" runs across a second boundary,
> because the command always touches the MIDX file and updates its
> timestamp, "ls -l $GIT_DIR/objects/pack/" before and after the
> operation can change, which causes such a test to fail.  Only
> compare the *.pack files in the directory before and after the
> operation to work around this flakiness.

Maybe add something like the following:

    Arguably, git-repack(1) should learn to not rewrite the MIDX in case
    we know it is already up-to-date. But this is not a new problem
    introduced via the new geometric maintenance task, so for now it
    should be good enough to paper over the issue.

But I think this looks good enough already, so please feel free to
ignore. Happy to have my authorship with either of these versions.

Thanks!

Patrick

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> [jc: taken from diff to v4 from v3 that was already merged to 'next']
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  t/t7900-maintenance.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 0d76693fee..614184a097 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -500,9 +500,9 @@ test_expect_success 'geometric repacking task' '
>  
>  		# Repacking should now cause a no-op geometric repack because
>  		# no packfiles need to be combined.
> -		ls -l .git/objects/pack >before &&
> +		ls -l .git/objects/pack/*.pack >before &&
>  		run_and_verify_geometric_pack 1 &&
> -		ls -l .git/objects/pack >after &&
> +		ls -l .git/objects/pack/*.pack >after &&
>  		test_cmp before after &&
>  
>  		# This incremental change creates a new packfile that only
> -- 
> 2.51.2-678-g0cd646409c
> 

* Re: [PATCH v4 00/10] builtin/maintenance: introduce "geometric" strategy
  2025-10-27 20:05     ` Patrick Steinhardt
@ 2025-10-27 20:58       ` Junio C Hamano
  0 siblings, 0 replies; 69+ messages in thread
From: Junio C Hamano @ 2025-10-27 20:58 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Jeff King, Derrick Stolee, Taylor Blau, Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Oct 27, 2025 at 08:53:22AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > Changes in v4:
>> >   - Fix a flaky test because git-repack(1) always decides to rewrite the
>> >     MIDX, even though no packs have changed. This isn't a new issue, and
>> >     other maintenance tasks behave the same. So I decided to punt on it
>> >     for now.
>> 
>> Thanks, but this round raced against 'next', so let me fabricate the
>> following and queue it instead.
>
> Thanks for doing this!
>
>> ------- >8 -------
>> From: Patrick Steinhardt <ps@pks.im>
>> Date: Mon, 27 Oct 2025 09:30:50 +0100
>> Subject: [PATCH] t7900: fix a flaky test due to git-repack always regenerating .midx
>
> s/.midx/MIDX, as the MIDX file does not have a dot anywhere.
>
>> 
>> When a supposedly no-op "git repack" runs across a second boundary,
>> because the command always touches the MIDX file and updates its
>> timestamp, "ls -l $GIT_DIR/objects/pack/" before and after the
>> operation can change, which causes such a test to fail.  Only
>> compare the *.pack files in the directory before and after the
>> operation to work around this flakiness.
>
> Maybe add something like the following:
>
>     Arguably, git-repack(1) should learn to not rewrite the MIDX in case
>     we know it is already up-to-date. But this is not a new problem
>     introduced via the new geometric maintenance task, so for now it
>     should be good enough to paper over the issue.

Will add.  Let me mark it for 'next' after squashing it in.

Thread overview: 69+ messages
2025-10-16  7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-16  7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-16 20:07   ` Justin Tobler
2025-10-17 20:58   ` Taylor Blau
2025-10-16  7:26 ` [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-16 20:59   ` Junio C Hamano
2025-10-16  7:26 ` [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-16 20:51   ` Justin Tobler
2025-10-17  6:13     ` Patrick Steinhardt
2025-10-17 22:28   ` Taylor Blau
2025-10-21 13:00     ` Patrick Steinhardt
2025-10-23 19:19       ` Taylor Blau
2025-10-24  5:44         ` Patrick Steinhardt
2025-10-16  7:26 ` [PATCH 4/8] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-16  7:26 ` [PATCH 5/8] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-16  7:26 ` [PATCH 6/8] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-16  7:26 ` [PATCH 7/8] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-16  7:26 ` [PATCH 8/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
2025-10-21 14:13   ` [PATCH v2 1/9] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-21 14:13   ` [PATCH v2 2/9] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-21 14:13   ` [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-23 19:29     ` Taylor Blau
2025-10-24  5:45       ` Patrick Steinhardt
2025-10-21 14:13   ` [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
2025-10-23 19:33     ` Taylor Blau
2025-10-24  5:45       ` Patrick Steinhardt
2025-10-24 19:02         ` Taylor Blau
2025-10-21 14:13   ` [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-23 21:31     ` Taylor Blau
2025-10-21 14:13   ` [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-23 21:34     ` Taylor Blau
2025-10-21 14:13   ` [PATCH v2 7/9] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-21 14:13   ` [PATCH v2 8/9] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-21 14:13   ` [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-23 21:49     ` Taylor Blau
2025-10-24  5:45       ` Patrick Steinhardt
2025-10-23 16:48   ` [PATCH v2 0/9] " Junio C Hamano
2025-10-23 21:50     ` Taylor Blau
2025-10-24  6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-25 19:15     ` Jeff King
2025-10-27  8:24       ` Patrick Steinhardt
2025-10-27 14:25         ` Jeff King
2025-10-24  6:57   ` [PATCH v3 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-24  6:57   ` [PATCH v3 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-24 19:03   ` [PATCH v3 00/10] " Taylor Blau
2025-10-24 19:11     ` Junio C Hamano
2025-10-27  8:30 ` [PATCH v4 " Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-27  8:30   ` [PATCH v4 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-27  8:31   ` [PATCH v4 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-27 15:53   ` [PATCH v4 00/10] " Junio C Hamano
2025-10-27 20:05     ` Patrick Steinhardt
2025-10-27 20:58       ` Junio C Hamano