From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
Date: Fri, 17 Oct 2025 08:13:18 +0200
Message-ID: <aPHeflRSMpoRukta@pks.im>
In-Reply-To: <uos7cczvzlrwjgcyhzyfirck62qjnb4zcoy6ga2o3pbnba7cfj@ag2pnonze5tu>

On Thu, Oct 16, 2025 at 03:51:17PM -0500, Justin Tobler wrote:
> On 25/10/16 09:26AM, Patrick Steinhardt wrote:
> > Introduce a new "geometric-repack" task. This task uses our geometric
> > repack infrastructure as provided by git-repack(1) itself, which is a
> > strategy that especially hosting providers tend to use to amortize the
> > costs of repacking objects.
> > 
> > There is one issue though with geometric repacks, namely that they
> > unconditionally pack all loose objects, regardless of whether or not
> > they are reachable. This is done because it means that we can completely
> > skip the reachability step, which significantly speeds up the operation.
> > But it has the big downside that we are unable to expire objects over
> > time.
> > 
> > To address this issue we thus use a split strategy in this new task:
> > whenever a geometric repack would merge together all packs, we instead
> > do an all-into-one repack. By default, these all-into-one repacks have
> > cruft packs enabled, so unreachable objects would now be written into
> > their own pack. Consequently, they won't be soaked up during geometric
> > repacking anymore and can be expired with the next full repack, assuming
> > that their expiry date has passed.
> 
> So normal geometric repacks don't ever check for unreachable objects,
> even if all the packs are being merged together. With this new strategy
> though, when a geometric repack would normally merge together all packs,
> we instead do an all-into-one repack which does check for unreachable
> objects.
> 
> Does checking for unreachable objects in this case slow down the repack
> significantly?

It'll certainly add some overhead, but I didn't quantify it. My gut
feeling is that the all-into-one repack is going to be slow by nature
anyway, as we have to rewrite all objects. Doing the reachability check
on top is of course going to slow it down even further, but the relative
impact is going to be smaller.

In any case, we have to perform a reachability check at one point in
time, otherwise we won't ever be able to prune unreachable objects. I
guess doing this at the point where we merge all packs into one is a
reasonable tradeoff.

I think the more interesting question is whether we should maybe do this
all-into-one repack more often, so that we can prune more regularly.
With the proposed strategy you'd need to add a significant portion of
new objects before we'd ever prune them, because otherwise we won't do
the all-into-one repack.
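
To make the split strategy discussed above concrete, here is a minimal
Python sketch of the decision. It is a simplified model and not the code
from the patch: packs are represented only by their object counts, loose
objects are ignored, and the names packs_to_merge() and choose_repack()
as well as the default factor of 2 are assumptions made for illustration.

```python
def packs_to_merge(sizes, factor=2):
    """Return how many of the smallest packs a geometric repack would merge.

    `sizes` holds one object count per pack (hypothetical input format).
    """
    sizes = sorted(sizes)
    n = len(sizes)

    # Walk from the largest pack downwards and stop at the first adjacent
    # pair that breaks the geometric progression.
    split = 0
    for i in range(n - 1, 0, -1):
        if sizes[i] < factor * sizes[i - 1]:
            split = i + 1  # both packs of the offending pair get merged
            break

    # The merged pack may itself be too large relative to the next pack,
    # so keep absorbing packs until the progression holds again.
    total = sum(sizes[:split])
    while split < n and sizes[split] < factor * total:
        total += sizes[split]
        split += 1
    return split


def choose_repack(sizes, factor=2):
    """Pick the repack mode in this simplified model of the split strategy."""
    if sizes and packs_to_merge(sizes, factor) == len(sizes):
        # Every pack would be merged anyway, so do the full repack with
        # the reachability check, which allows unreachable objects to expire.
        return "all-into-one"
    return "geometric"


print(choose_repack([1, 2, 4, 8]))        # already geometric
print(choose_repack([1000, 10, 10]))      # only the small packs merge
print(choose_repack([1000, 300, 300]))    # small packs outweigh half the big one
```

In this model the second call stays "geometric" because the two small
packs together are far below half the size of the large pack, while the
third call tips over into "all-into-one". That is the effect described
above: a repository needs to accumulate a sizeable share of new objects
before the pruning repack is ever triggered.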

I think for an initial version this is going to be fine, but we might
want to iterate on this eventually and add a time-based component to
the heuristics.

Patrick
