From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Markus Gerstel <2025@uxp.de>, git@vger.kernel.org
Subject: Re: 'git gc auto' didn't trigger on large reflog
Date: Wed, 26 Feb 2025 12:39:35 +0100
Message-ID: <Z779d7SnW5j8XcOb@pks.im>
In-Reply-To: <xmqqeczn70pg.fsf@gitster.g>

On Mon, Feb 24, 2025 at 08:43:23AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > It's a bit funny, but whether or not `git gc --auto` does anything
> > solely depends on the state of the object database.
>
> I guess after adding "auto", we haven't been careful enough to
> update the triggering condition as we added new kinds of "garbage"
> to collect? Should we make an exhaustive and authoritative list of
> gc tasks, document them, and make sure "--auto" pays attention?

Maybe. But a better solution may be to build this into
git-maintenance(1) instead, which is a lot more fine-grained. It already
has properly defined subtasks, and each of these subtasks has an
optional callback function that lets it run only when needed.

So from my perspective we should:

- Expand git-maintenance(1) to gain a new task for expiring reflogs.
- Adapt it to stop using git-gc(1) and instead run the specific
  subtasks.

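To make the shape of this proposal concrete, here is a minimal Python sketch of a task table in which each task carries an optional "is there work to do?" callback, with a reflog-expiry task added next to an object task. `RepoState`, `MaintenanceTask`, the threshold and the `reflog-expire` task are all hypothetical illustrations; this is not git's actual C implementation of git-maintenance(1).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RepoState:
    """Hypothetical stand-in for what real tasks would inspect on disk."""
    loose_objects: int = 0
    stale_reflog_entries: int = 0


@dataclass
class MaintenanceTask:
    name: str
    run: Callable[[RepoState], None]
    # Optional cheap check used in auto mode; None means "always run".
    auto_condition: Optional[Callable[[RepoState], bool]] = None


def pack_loose_objects(repo: RepoState) -> None:
    repo.loose_objects = 0


def expire_reflogs(repo: RepoState) -> None:
    repo.stale_reflog_entries = 0


TASKS = [
    MaintenanceTask("loose-objects", pack_loose_objects,
                    lambda r: r.loose_objects > 100),  # hypothetical threshold
    # Hypothetical new task for expiring reflogs, with its own check.
    MaintenanceTask("reflog-expire", expire_reflogs,
                    lambda r: r.stale_reflog_entries > 0),
]


def run_auto_maintenance(repo: RepoState,
                         tasks: list[MaintenanceTask] = TASKS) -> list[str]:
    """Run each task whose auto_condition reports pending work; return their names."""
    ran = []
    for task in tasks:
        if task.auto_condition is not None and not task.auto_condition(repo):
            continue
        task.run(repo)
        ran.append(task.name)
    return ran


# Few loose objects but a large stale reflog: an object-count-only check
# would skip everything, while the per-task check still expires the reflog.
repo = RepoState(loose_objects=10, stale_reflog_entries=5000)
print(run_auto_maintenance(repo))  # ['reflog-expire']
```

The design choice being illustrated is that each task decides for itself whether it is needed, so a repository whose only problem is a large reflog still gets maintained.
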
This would also let us iterate more freely on the tasks the command
actually runs and make them configurable. For example, it would
eventually allow us to enable incremental repacking via multi-pack
indices or geometric repacking.

> Other than objects (packing loose ones, pruning unreferenced loose
> ones or packing them into cruft packs), we seem to check reflog,
> worktree, and rerere database.
>
> I do not think there is a readily usable API to query how much stale
> data is in reflogs that are more than N seconds old, without which
> "gc --auto" cannot make decisions. I am reasonably sure rerere API
> does not give you such data, either. I have no idea about the
> triggering condition of "worktree prune".

No, there isn't, and computing it is also potentially expensive. You
basically have to iterate through each reflog and then also through
all of its entries to figure out whether anything needs cleaning.

But we can probably come up with clever heuristics that don't
require us to be this thorough. We could for example just read the
"HEAD" reflog and check whether it contains entries that would be
pruned.

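A rough Python sketch of that heuristic, under stated assumptions: it assumes the reflog layout of the "files" ref backend, where each line of `.git/logs/HEAD` reads `<old-oid> <new-oid> <name> <<email>> <timestamp> <tz>` optionally followed by a tab and a message, and it uses a 90-day cutoff mirroring the default of `gc.reflogExpire`. The function names are hypothetical, and the sketch ignores the shorter cutoff git applies to unreachable entries (`gc.reflogExpireUnreachable`).

```python
import time
from pathlib import Path
from typing import Iterable, Optional

# Mirrors the 90-day default of gc.reflogExpire (assumption: reachable entries only).
DEFAULT_EXPIRE_SECONDS = 90 * 24 * 60 * 60


def entry_timestamp(line: str) -> Optional[int]:
    """Extract the Unix timestamp from one reflog line, or None if malformed."""
    header, _, _message = line.partition("\t")  # message part is optional
    parts = header.rsplit(" ", 2)  # [oids + identity, timestamp, tz]
    if len(parts) != 3:
        return None
    try:
        return int(parts[1])
    except ValueError:
        return None


def has_expirable_entries(lines: Iterable[str], now: int,
                          expire_seconds: int = DEFAULT_EXPIRE_SECONDS) -> bool:
    """True if any entry is older than the cutoff; stops at the first hit."""
    cutoff = now - expire_seconds
    for line in lines:
        ts = entry_timestamp(line.rstrip("\n"))
        if ts is not None and ts < cutoff:
            return True
    return False


def head_reflog_needs_expiry(git_dir: Path, now: Optional[int] = None) -> bool:
    """Heuristic: inspect only HEAD's reflog instead of walking every reflog."""
    path = git_dir / "logs" / "HEAD"
    if not path.exists():
        return False
    when = int(time.time()) if now is None else now
    with path.open(encoding="utf-8", errors="replace") as f:
        return has_expirable_entries(f, when)
```

Reading a single file keeps the check cheap; the trade-off is that stale entries living only in other reflogs go unnoticed until HEAD's reflog also ages.
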
Patrick