* [PATCH v2 0/2] fetch --prune performance problem
@ 2025-06-23 23:43 Phil Hord
2025-06-23 23:43 ` [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting Phil Hord
2025-06-23 23:43 ` [PATCH v2 2/2] refs: remove old refs_warn_dangling_symref Phil Hord
0 siblings, 2 replies; 4+ messages in thread
From: Phil Hord @ 2025-06-23 23:43 UTC (permalink / raw)
To: gitster; +Cc: peff, git, Jacob Keller, Phil Hord
From: Phil Hord <phil.hord@gmail.com>
`git fetch --prune` runs in O(N^2) time normally. This happens because the code
iterates over each ref to be pruned to display its status. In a repo with
174,000 refs, where I was pruning 15,000 refs, the current code made 2.6 billion
calls to strcmp and consumed 470 seconds of CPU. After this change, the same
operation completes in under 1 second.
The loop looks like this:
for p in prune_refs { for ref in all_refs { if p == ref { ... }}}
That loop runs only to check for and report newly dangling refs. A workaround to
avoid this slowness is to run with `-q` to bypass this check.
There is similar check/report functionality in `git remote prune`, but it uses a
more efficient method to check for dangling refs. prune_refs is first sorted, so
it can be searched in O(logN), so this loop is O(N*logN).
for ref in all_refs { if ref in prune_refs { ... }}
We can use that function instead, with some minor cleanup to the output to deal
with the ordering being changed.
This patch version only adds the deleted branch name to the output of the dangling
sym refs since the ordering has changed. This is only a minor cleanup and was
not actually needed since, for example, `git origin prune` already did not
mind losing track of this information in its output. But now it is improved
to be more explicit.
Phil Hord (2):
fetch-prune: optimize dangling-ref reporting
refs: remove old refs_warn_dangling_symref
builtin/fetch.c | 20 ++++++++++----------
builtin/remote.c | 4 ++--
refs.c | 21 ++++-----------------
3 files changed, 16 insertions(+), 29 deletions(-)
--
2.50.0.84.g5d85fe910b.dirty
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting
2025-06-23 23:43 [PATCH v2 0/2] fetch --prune performance problem Phil Hord
@ 2025-06-23 23:43 ` Phil Hord
2025-06-24 10:49 ` Jeff King
2025-06-23 23:43 ` [PATCH v2 2/2] refs: remove old refs_warn_dangling_symref Phil Hord
1 sibling, 1 reply; 4+ messages in thread
From: Phil Hord @ 2025-06-23 23:43 UTC (permalink / raw)
To: gitster; +Cc: peff, git, Jacob Keller, Phil Hord
From: Phil Hord <phil.hord@gmail.com>
When pruning during `git fetch` we check each pruned ref against the
ref_store one at a time to decide whether to report it as dangling.
This causes every local ref to be scanned for each ref being pruned.
If there are N refs in the repo and M refs being pruned, this code is
O(M*N). However, `git remote prune` uses a very similar function that
is only O(N*log(M)).
Remove the wasteful ref scanning for each pruned ref and use the faster
version already available in refs_warn_dangling_symrefs.
In a repo with 126,000 refs, where I was pruning 28,000 refs, this
code made about 3.6 billion calls to strcmp and consumed 410 seconds
of CPU. (Invariably in that time, my remote would timeout and the
fetch would fail anyway.)
After this change, the same operation completes in under a second.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
---
builtin/fetch.c | 20 ++++++++++----------
builtin/remote.c | 4 ++--
refs.c | 4 +++-
3 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 40a0e8d24434..65d606c6de08 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1383,9 +1383,13 @@ static int prune_refs(struct display_state *display_state,
int result = 0;
struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
struct strbuf err = STRBUF_INIT;
+ struct string_list refnames = STRING_LIST_INIT_NODUP;
const char *dangling_msg = dry_run
- ? _(" (%s will become dangling)")
- : _(" (%s has become dangling)");
+ ? _(" %s will become dangling after %s is deleted")
+ : _(" %s has become dangling after %s was deleted");
+
+ for (ref = stale_refs; ref; ref = ref->next)
+ string_list_append(&refnames, ref->name);
if (!dry_run) {
if (transaction) {
@@ -1396,15 +1400,9 @@ static int prune_refs(struct display_state *display_state,
goto cleanup;
}
} else {
- struct string_list refnames = STRING_LIST_INIT_NODUP;
-
- for (ref = stale_refs; ref; ref = ref->next)
- string_list_append(&refnames, ref->name);
-
result = refs_delete_refs(get_main_ref_store(the_repository),
"fetch: prune", &refnames,
0);
- string_list_clear(&refnames, 0);
}
}
@@ -1416,12 +1414,14 @@ static int prune_refs(struct display_state *display_state,
_("(none)"), ref->name,
&ref->new_oid, &ref->old_oid,
summary_width);
- refs_warn_dangling_symref(get_main_ref_store(the_repository),
- stderr, dangling_msg, ref->name);
}
+ string_list_sort(&refnames);
+ refs_warn_dangling_symrefs(get_main_ref_store(the_repository),
+ stderr, dangling_msg, &refnames);
}
cleanup:
+ string_list_clear(&refnames, 0);
strbuf_release(&err);
free_refs(stale_refs);
return result;
diff --git a/builtin/remote.c b/builtin/remote.c
index 0d6755bcb71e..4de7dd373ae5 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1522,8 +1522,8 @@ static int prune_remote(const char *remote, int dry_run)
struct string_list refs_to_prune = STRING_LIST_INIT_NODUP;
struct string_list_item *item;
const char *dangling_msg = dry_run
- ? _(" %s will become dangling!")
- : _(" %s has become dangling!");
+ ? _(" %s will become dangling after %s is deleted!")
+ : _(" %s has become dangling after %s was deleted!");
get_remote_ref_states(remote, &states, GET_REF_STATES);
diff --git a/refs.c b/refs.c
index dce5c49ca2ba..e2075a98c844 100644
--- a/refs.c
+++ b/refs.c
@@ -461,7 +461,9 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
return 0;
}
- fprintf(d->fp, d->msg_fmt, refname);
+ skip_prefix(refname, "refs/remotes/", &refname);
+ skip_prefix(resolves_to, "refs/remotes/", &resolves_to);
+ fprintf(d->fp, d->msg_fmt, refname, resolves_to);
fputc('\n', d->fp);
return 0;
}
--
2.50.0.84.g5d85fe910b.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v2 2/2] refs: remove old refs_warn_dangling_symref
2025-06-23 23:43 [PATCH v2 0/2] fetch --prune performance problem Phil Hord
2025-06-23 23:43 ` [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting Phil Hord
@ 2025-06-23 23:43 ` Phil Hord
1 sibling, 0 replies; 4+ messages in thread
From: Phil Hord @ 2025-06-23 23:43 UTC (permalink / raw)
To: gitster; +Cc: peff, git, Jacob Keller, Phil Hord
From: Phil Hord <phil.hord@gmail.com>
The dangling warning function that takes a single ref to search for
is no longer used. Remove it.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
refs.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/refs.c b/refs.c
index e2075a98c844..a9fbb0c8f23c 100644
--- a/refs.c
+++ b/refs.c
@@ -438,7 +438,6 @@ static int for_each_filter_refs(const char *refname, const char *referent,
struct warn_if_dangling_data {
struct ref_store *refs;
FILE *fp;
- const char *refname;
const struct string_list *refnames;
const char *msg_fmt;
};
@@ -455,9 +454,7 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
resolves_to = refs_resolve_ref_unsafe(d->refs, refname, 0, NULL, NULL);
if (!resolves_to
- || (d->refname
- ? strcmp(resolves_to, d->refname)
- : !string_list_has_string(d->refnames, resolves_to))) {
+ || !string_list_has_string(d->refnames, resolves_to)) {
return 0;
}
@@ -468,18 +465,6 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
return 0;
}
-void refs_warn_dangling_symref(struct ref_store *refs, FILE *fp,
- const char *msg_fmt, const char *refname)
-{
- struct warn_if_dangling_data data = {
- .refs = refs,
- .fp = fp,
- .refname = refname,
- .msg_fmt = msg_fmt,
- };
- refs_for_each_rawref(refs, warn_if_dangling_symref, &data);
-}
-
void refs_warn_dangling_symrefs(struct ref_store *refs, FILE *fp,
const char *msg_fmt, const struct string_list *refnames)
{
--
2.50.0.84.g5d85fe910b.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting
2025-06-23 23:43 ` [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting Phil Hord
@ 2025-06-24 10:49 ` Jeff King
0 siblings, 0 replies; 4+ messages in thread
From: Jeff King @ 2025-06-24 10:49 UTC (permalink / raw)
To: Phil Hord; +Cc: gitster, git, Jacob Keller
On Mon, Jun 23, 2025 at 04:43:26PM -0700, Phil Hord wrote:
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 40a0e8d24434..65d606c6de08 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1383,9 +1383,13 @@ static int prune_refs(struct display_state *display_state,
> int result = 0;
> struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
> struct strbuf err = STRBUF_INIT;
> + struct string_list refnames = STRING_LIST_INIT_NODUP;
> const char *dangling_msg = dry_run
> - ? _(" (%s will become dangling)")
> - : _(" (%s has become dangling)");
> + ? _(" %s will become dangling after %s is deleted")
> + : _(" %s has become dangling after %s was deleted");
This approach seems reasonable. It is a little ugly that
refs_warn_dangling_symrefs() takes a printf-formatted string that must
contain the correct number of "%s" fields (and that we get no compiler
warnings if we get it wrong).
But that is not really new in your series. Given that there are two
callers and they use (almost) the same string, I wonder if we could
refactor the interface. We'd need to pass in the indentation level, and
the dry-run flag.
I guess alternatively, we could have a function which passes back a
strvec or similar of danglers, but then both call sites would have more
printing boilerplate. I dunno. Maybe we should just avert our eyes and
live with it. ;)
> diff --git a/refs.c b/refs.c
> index dce5c49ca2ba..e2075a98c844 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -461,7 +461,9 @@ static int warn_if_dangling_symref(const char *refname, const char *referent UNU
> return 0;
> }
>
> - fprintf(d->fp, d->msg_fmt, refname);
> + skip_prefix(refname, "refs/remotes/", &refname);
> + skip_prefix(resolves_to, "refs/remotes/", &resolves_to);
> + fprintf(d->fp, d->msg_fmt, refname, resolves_to);
> fputc('\n', d->fp);
> return 0;
This prefix handling feels kind of ad-hoc. Should we use something like
refs_shorten_unambiguous_ref() to follow the usual rules?
This is also shortening the symref name, which didn't happen before.
Arguably that should happen in a separate patch, but I can live with it
either way.
-Peff
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-06-24 10:49 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-23 23:43 [PATCH v2 0/2] fetch --prune performance problem Phil Hord
2025-06-23 23:43 ` [PATCH v2 1/2] fetch-prune: optimize dangling-ref reporting Phil Hord
2025-06-24 10:49 ` Jeff King
2025-06-23 23:43 ` [PATCH v2 2/2] refs: remove old refs_warn_dangling_symref Phil Hord
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox