From: Phil Hord <phil.hord@gmail.com>
To: git@vger.kernel.org
Cc: Phil Hord <phil.hord@gmail.com>
Subject: [RFC PATCH 1/2] fetch-prune: optimize dangling-ref reporting
Date: Wed, 18 Jun 2025 14:08:39 -0700 [thread overview]
Message-ID: <20250618211024.2332525-2-phil.hord@gmail.com> (raw)
In-Reply-To: <20250618211024.2332525-1-phil.hord@gmail.com>
From: Phil Hord <phil.hord@gmail.com>
When pruning during `git fetch` we check each pruned ref against the
ref_store one at a time to decide whether to report it as dangling.
This causes every local ref to be scanned for each ref being pruned.
If there are N refs in the repo and M refs being pruned, this code is
O(M*N). However, `git remote prune` uses a very similar function that
is only O(N*log(M)).
Remove the wasteful ref scanning for each pruned ref and use the faster
version already available in refs_warn_dangling_symrefs.
In a repo with 126,000 refs, where I was pruning 28,000 refs, this
code made about 3.6 billion calls to strcmp and consumed 410 seconds
of CPU. (Invariably in that time, my remote would timeout and the
fetch would fail anyway.)
After this change, the same operation completes in under 4 seconds.
I considered further optimizing this function to be O(N), but this
requires ref_store iterators to be sorted, too. I found some suggestions
that this is always the case, but I'm not certain it is.
The current speedup is enough for our needs at the moment.
This change causes a reordering of the output for any reported dangling
refs. Previously they would be reported inline with the "fetch: prune"
messages. Now they will be reported after all the original prune
messages are complete.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
builtin/fetch.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 40a0e8d24434..11ce51da780a 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1383,10 +1383,14 @@ static int prune_refs(struct display_state *display_state,
int result = 0;
struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
struct strbuf err = STRBUF_INIT;
+ struct string_list refnames = STRING_LIST_INIT_NODUP;
const char *dangling_msg = dry_run
? _(" (%s will become dangling)")
: _(" (%s has become dangling)");
+ for (ref = stale_refs; ref; ref = ref->next)
+ string_list_append(&refnames, ref->name);
+
if (!dry_run) {
if (transaction) {
for (ref = stale_refs; ref; ref = ref->next) {
@@ -1396,15 +1400,9 @@ static int prune_refs(struct display_state *display_state,
goto cleanup;
}
} else {
- struct string_list refnames = STRING_LIST_INIT_NODUP;
-
- for (ref = stale_refs; ref; ref = ref->next)
- string_list_append(&refnames, ref->name);
-
result = refs_delete_refs(get_main_ref_store(the_repository),
"fetch: prune", &refnames,
0);
- string_list_clear(&refnames, 0);
}
}
@@ -1416,12 +1414,14 @@ static int prune_refs(struct display_state *display_state,
_("(none)"), ref->name,
&ref->new_oid, &ref->old_oid,
summary_width);
- refs_warn_dangling_symref(get_main_ref_store(the_repository),
- stderr, dangling_msg, ref->name);
}
+ string_list_sort(&refnames);
+ refs_warn_dangling_symrefs(get_main_ref_store(the_repository),
+ stderr, dangling_msg, &refnames);
}
cleanup:
+ string_list_clear(&refnames, 0);
strbuf_release(&err);
free_refs(stale_refs);
return result;
--
2.50.0.1.gf2ab606906.dirty
next prev parent reply other threads:[~2025-06-18 21:11 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-18 21:08 [RFC PATCH 0/2] fetch --prune performance problem Phil Hord
2025-06-18 21:08 ` Phil Hord [this message]
2025-06-18 21:50 ` [RFC PATCH 1/2] fetch-prune: optimize dangling-ref reporting Junio C Hamano
2025-06-18 23:18 ` Jacob Keller
2025-06-19 4:00 ` Jeff King
2025-06-19 11:01 ` Lidong Yan
2025-06-19 14:41 ` Lidong Yan
2025-06-18 21:08 ` [RFC PATCH 2/2] refs: remove old refs_warn_dangling_symref Phil Hord
2025-06-18 23:15 ` [RFC PATCH 0/2] fetch --prune performance problem Jacob Keller
2025-06-19 3:37 ` Jeff King
2025-06-19 17:18 ` Junio C Hamano
[not found] ` <CABURp0p4d0JPg=-cW1OZdFQJ+vNT_0PDd9Rv3oz6toFGqGv5=g@mail.gmail.com>
2025-06-23 23:32 ` Jacob Keller
2025-06-23 23:41 ` Junio C Hamano
[not found] ` <CABURp0q-1FGmD+PJeSQ=xvyDN6ZYn1O7Fh8i1OojfD2WQCqgcw@mail.gmail.com>
2025-06-23 23:46 ` Jacob Keller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250618211024.2332525-2-phil.hord@gmail.com \
--to=phil.hord@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.