From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy
Date: Thu, 23 Oct 2025 17:49:35 -0400 [thread overview]
Message-ID: <aPqi71+zRE2KsnCk@nand.local> (raw)
In-Reply-To: <20251021-pks-maintenance-geometric-strategy-v2-9-f0d727832b80@pks.im>
On Tue, Oct 21, 2025 at 04:13:31PM +0200, Patrick Steinhardt wrote:
> We have two different repacking strategies in Git:
>
> - The "gc" strategy uses git-gc(1).
>
> - The "incremental" strategy uses multi-pack indices and `git
> multi-pack-index repack` to merge together smaller packfiles as
> determined by a specific batch size.
>
> The former strategy is our old and trusted default, whereas the latter
> has historically been used for our scheduled maintenance. But both
> strategies have their shortcomings:
>
> - The "gc" strategy performs regular all-into-one repacks. Furthermore
> it is rather inflexible, as it is not easily possible for a user to
> enable or disable specific subtasks.
>
> - The "incremental" strategy is not a full replacement for the "gc"
> strategy as it doesn't know to prune stale data.
>
> So today, we don't have a strategy that is well-suited for large repos
> while being a full replacement for the "gc" strategy.
Well put.
> Introduce a new "geometric" strategy that aims to fill this gap. This
> strategy invokes all the usual cleanup tasks that git-gc(1) does like
> pruning reflogs and rerere caches as well as stale worktrees. But where
> it differs from both the "gc" and "incremental" strategy is that it uses
> our geometric repacking infrastructure exposed by git-repack(1) to
> repack packfiles. The advantage of geometric repacking is that we only
> need to perform an all-into-one repack when the object count in a repo
> has grown significantly.
>
> One downside of this strategy is that pruning of unreferenced objects is
> not going to happen regularly anymore. Every geometric repack knows to
> soak up all loose objects regardless of their reachability, and merging
> two or more packs doesn't consider reachability, either. Consequently,
> the number of unreachable objects will grow over time.
>
> This is remedied by doing an all-into-one repack instead of a geometric
> repack whenever we determine that the geometric repack would end up
> merging all packfiles anyway. This all-into-one repack then performs our
> usual reachability checks and writes unreachable objects into a cruft
> pack. As cruft packs won't ever be merged during geometric repacks we
> can thus phase out these objects over time.
>
> Of course, this still means that we retain unreachable objects for far
> longer than with the "gc" strategy. But the maintenance strategy is
> intended especially for large repositories, where the basic assumption
> is that the set of unreachable objects will be significantly dwarfed by
> the number of reachable objects.
>
> If this assumption is ever proven to be too disadvantageous we could for
> example introduce a time-based strategy: if the largest packfile has not
> been touched for longer than $T, we perform an all-into-one repack. But
> for now, such a mechanism is deferred into the future as it is not clear
> yet whether it is needed in the first place.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> Documentation/config/maintenance.adoc | 9 +++++++++
> builtin/gc.c | 19 +++++++++++++++++++
> t/t7900-maintenance.sh | 20 +++++++++++++++++++-
> 3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc
> index b2bacdc822..d0c38f03fa 100644
> --- a/Documentation/config/maintenance.adoc
> +++ b/Documentation/config/maintenance.adoc
> @@ -32,6 +32,15 @@ The possible strategies are:
> strategy for scheduled maintenance.
> * `gc`: This strategy runs the `gc` task. This is the default strategy for
> manual maintenance.
> +* `geometric`: This strategy performs geometric repacking of packfiles and
> + keeps auxiliary data structures up-to-date. The strategy expires data in the
> + reflog and removes worktrees that cannot be located anymore. When the
> + geometric repacking strategy would decide to do an all-into-one repack, then
> + the strategy generates a cruft pack for all unreachable objects. Objects that
> + are already part of a cruft pack will be expired.
> ++
> +This repacking strategy is a full replacement for the `gc` strategy and is
> +recommended for large repositories.
Nice. I always feel like it's tricky for changes like these to know
where the right place is to draw the line between "this belongs in the
commit message, because it will be useful to reviewers in understanding
how I came to this patch" versus "this belongs in the documentation for
my new feature, because it will be useful to users trying to figure out
what option to use".
I like the spot where you drew that line here and I think that the
patch message has details that are useful to reviewers (like the
historical differences between all-into-one strategies and geometric
repacking), but that the documentation has the right user-facing details
and explanations.
> diff --git a/builtin/gc.c b/builtin/gc.c
> index aaff0bae15..9739bb0ea2 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1878,12 +1878,31 @@ static const struct maintenance_strategy incremental_strategy = {
> },
> };
>
> +static const struct maintenance_strategy geometric_strategy = {
> + .tasks = {
> + [TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> + [TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
> + [TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> + [TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
> + [TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> + [TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
> + [TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> + [TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
> + [TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> + [TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
> + [TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
> + [TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
> + },
> +};
> +
What you wrote here all makes sense to me, so I don't have any comments
on the technical content of 'geometric_strategy'.
As an aside, I wonder if we should use a nested designated initializer
here? It seems a little cleaner to me than doing:
.tasks = {
[TASK_FOO].type = ...,
[TASK_FOO].schedule = ...,
}
It's inconsistent with the style of the rest of this file, so if you did
make this change I'd suggest adding a prerequisite change that modifies
existing strategies to match the new style. But you could imagine
something like the following on top:
--- 8< ---
diff --git a/builtin/gc.c b/builtin/gc.c
index 9739bb0ea2..881ef6ad88 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1880,18 +1880,30 @@ static const struct maintenance_strategy incremental_strategy = {
static const struct maintenance_strategy geometric_strategy = {
.tasks = {
- [TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
- [TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
- [TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
- [TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
- [TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
- [TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
- [TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
- [TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
- [TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
- [TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
- [TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
- [TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
+ [TASK_COMMIT_GRAPH] = {
+ .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+ .schedule = SCHEDULE_HOURLY,
+ },
+ [TASK_GEOMETRIC_REPACK] = {
+ .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+ .schedule = SCHEDULE_DAILY,
+ },
+ [TASK_PACK_REFS] = {
+ .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+ .schedule = SCHEDULE_DAILY,
+ },
+ [TASK_RERERE_GC] = {
+ .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+ .schedule = SCHEDULE_WEEKLY,
+ },
+ [TASK_REFLOG_EXPIRE] = {
+ .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+ .schedule = SCHEDULE_WEEKLY,
+ },
+ [TASK_WORKTREE_PRUNE] = {
+ .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+ .schedule = SCHEDULE_WEEKLY,
+ },
},
};
--- >8 ---
The rest all looks good to me.
Thanks,
Taylor
next prev parent reply other threads:[~2025-10-23 21:49 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 7:26 [PATCH 0/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-16 7:26 ` [PATCH 1/8] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-16 20:07 ` Justin Tobler
2025-10-17 20:58 ` Taylor Blau
2025-10-16 7:26 ` [PATCH 2/8] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-16 20:59 ` Junio C Hamano
2025-10-16 7:26 ` [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-16 20:51 ` Justin Tobler
2025-10-17 6:13 ` Patrick Steinhardt
2025-10-17 22:28 ` Taylor Blau
2025-10-21 13:00 ` Patrick Steinhardt
2025-10-23 19:19 ` Taylor Blau
2025-10-24 5:44 ` Patrick Steinhardt
2025-10-16 7:26 ` [PATCH 4/8] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-16 7:26 ` [PATCH 5/8] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-16 7:26 ` [PATCH 6/8] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-16 7:26 ` [PATCH 7/8] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-16 7:26 ` [PATCH 8/8] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 0/9] " Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 1/9] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 2/9] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 3/9] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-23 19:29 ` Taylor Blau
2025-10-24 5:45 ` Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 4/9] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
2025-10-23 19:33 ` Taylor Blau
2025-10-24 5:45 ` Patrick Steinhardt
2025-10-24 19:02 ` Taylor Blau
2025-10-21 14:13 ` [PATCH v2 5/9] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-23 21:31 ` Taylor Blau
2025-10-21 14:13 ` [PATCH v2 6/9] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-23 21:34 ` Taylor Blau
2025-10-21 14:13 ` [PATCH v2 7/9] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 8/9] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-21 14:13 ` [PATCH v2 9/9] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-23 21:49 ` Taylor Blau [this message]
2025-10-24 5:45 ` Patrick Steinhardt
2025-10-23 16:48 ` [PATCH v2 0/9] " Junio C Hamano
2025-10-23 21:50 ` Taylor Blau
2025-10-24 6:57 ` [PATCH v3 00/10] " Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-25 19:15 ` Jeff King
2025-10-27 8:24 ` Patrick Steinhardt
2025-10-27 14:25 ` Jeff King
2025-10-24 6:57 ` [PATCH v3 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-24 6:57 ` [PATCH v3 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-24 19:03 ` [PATCH v3 00/10] " Taylor Blau
2025-10-24 19:11 ` Junio C Hamano
2025-10-27 8:30 ` [PATCH v4 " Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 01/10] builtin/gc: remove global `repack` variable Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 02/10] builtin/gc: make `too_many_loose_objects()` reusable without GC config Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 03/10] builtin/maintenance: introduce "geometric-repack" task Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 04/10] builtin/maintenance: make the geometric factor configurable Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 05/10] builtin/maintenance: don't silently ignore invalid strategy Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 06/10] builtin/maintenance: improve readability of strategies Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 07/10] builtin/maintenance: run maintenance tasks depending on type Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 08/10] builtin/maintenance: extend "maintenance.strategy" to manual maintenance Patrick Steinhardt
2025-10-27 8:30 ` [PATCH v4 09/10] builtin/maintenance: make "gc" strategy accessible Patrick Steinhardt
2025-10-27 8:31 ` [PATCH v4 10/10] builtin/maintenance: introduce "geometric" strategy Patrick Steinhardt
2025-10-27 15:53 ` [PATCH v4 00/10] " Junio C Hamano
2025-10-27 20:05 ` Patrick Steinhardt
2025-10-27 20:58 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPqi71+zRE2KsnCk@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).