git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 6/8] rerere: provide function to collect stale entries
Date: Fri, 2 May 2025 10:07:35 +0200
Message-ID: <aBR9R3PxSCuLON6G@pks.im>
In-Reply-To: <xmqqy0vh1t96.fsf@gitster.g>

On Wed, Apr 30, 2025 at 09:58:13AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We're about to add another task for git-maintenance(1) that prunes stale
> > rerere entries via `git rerere gc`. The condition of when to run this
> > subcommand will be configurable so that the subcommand is only executed
> > when a certain number of stale rerere entries exists. This requires us
> > to know about the number of stale rerere entries in the first place,
> > which is non-trivial to figure out.
> >
> > Refactor `rerere_gc()` and `prune_one()` so that garbage collection is
> > split into three phases:
> >
> >   1. We collect any stale rerere entries and directories that are about
> >      to become empty.
> >
> >   2. Prune all stale rerere entries.
> >
> >   3. Remove all directories that should have become empty in (2).
> >
> > By splitting out the collection of stale entries we can trivially expose
> > this function to external callers and thus reuse it in later steps.
> >
> > This refactoring is not expected to result in a user-visible change in
> > behaviour.
> 
> I have no objection against the goal of allowing "git maintenance"
> to drive "git rerere gc", and as the primary author of this code path I
> do not see anything wrong, in the "correctness" sense, in the
> updated code.
> 
> I however am not sure if "count what we would prune, and remove only
> when there are too many" would work well for this subsystem, because
> I expect that the cost to enumerate existing rerere entries and
> check each of them for staleness would be the dominant part,
> relative to actual "rm -fr <rerere-id>", of the cost you are paying
> when you run "git rerere gc".
> 
> And if my suspicion is correct, all this change does to the plain
> vanilla user of "git rerere gc" is to have them pay the extra cost
> of allocating and deallocating the list of names of paths in string
> lists.

Yeah, I think this concern makes sense indeed. I was a bit sceptical
myself whether this is going too far. Maybe a simpler solution would be
to just count the number of directories in ".git/rr-cache", without
checking whether those actually are prunable?

We could also adapt this to be closer to the original version, where we
only verified that ".git/rr-cache" exists and contains at least one
subdirectory. This can even be combined with the above approach if we
set "maintenance.rerere-gc.auto=1" by default.
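
To make the simpler model concrete, here is a rough sketch of the
heuristic in Python. It is only a model of the idea, not the C code that
would end up in git: the function names are made up for illustration,
and I am assuming that a positive "auto" value acts as a minimum number
of subdirectories in "rr-cache". Zero or negative values are not
modelled at all.

```python
import os


def count_rr_cache_dirs(git_dir):
    """Count subdirectories of <git_dir>/rr-cache.

    Deliberately does not check whether an entry is stale; it only
    counts directories, which is the cheap part of the work.
    """
    rr_cache = os.path.join(git_dir, "rr-cache")
    try:
        with os.scandir(rr_cache) as entries:
            return sum(1 for e in entries if e.is_dir(follow_symlinks=False))
    except (FileNotFoundError, NotADirectoryError):
        # No rr-cache directory means there is nothing to prune.
        return 0


def should_run_rerere_gc(git_dir, auto_threshold=1):
    """Return True if at least `auto_threshold` entries exist.

    `auto_threshold` stands in for "maintenance.rerere-gc.auto"; treating
    it as a minimum entry count is an assumption of this sketch.
    """
    return count_rr_cache_dirs(git_dir) >= auto_threshold
```

With the default threshold of 1 this degenerates into the original
check, namely that "rr-cache" exists and contains at least one
subdirectory.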

> We need to see some performance measurement to show that the cost we pay
> for collection and counting is a lot smaller compared to the whole
> pruning operation to justify the "auto" thing.

Hm. I guess ultimately the answer is going to be "it depends". The
performance implication on Windows is going to be quite different
compared to the performance on Linux/macOS.

In any case, let's go with the simpler model for now. We can still
iterate as needed if we eventually see that the heuristic is too dumb to
be useful.

Patrick

