* [PATCH] describe: limit default ref iteration to tags
@ 2026-06-07 20:51 Tamir Duberstein
2026-06-08 6:59 ` Patrick Steinhardt
2026-06-08 12:36 ` Junio C Hamano
0 siblings, 2 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-07 20:51 UTC (permalink / raw)
To: git
Cc: Shawn O. Pearce, Junio C Hamano, Jeff King, Patrick Steinhardt,
Tamir Duberstein
Unless --all is given, get_name() rejects every ref outside refs/tags/.
The rejection happens only after the ref backend has enumerated the ref,
so repositories with many other refs spend most of a simple describe
invocation visiting refs which cannot affect its result.
Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
--all is requested, 2008-02-24) moved this rejection before object
lookup, but left iteration unscoped. Pass the existing refs/tags/
restriction to the iterator unless --all is given so the backend can
avoid unrelated refs.
On a checkout with 124,357 refs, of which 330 were tags, I ran the
following command with the parent and patched binaries:
hyperfine --warmup 3 --runs 15 \
'git describe --always --long --abbrev=40 HEAD'
The results were:
parent this commit
elapsed 196.2 ms 63.3 ms
user 69.5 ms 48.0 ms
system 123.0 ms 12.0 ms
The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
for a 3.10x speedup.
Both revisions were built with -O3, -mcpu=native, and ThinLTO using
Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
(Mac16,6) with a 16-core Apple M4 Max (12 performance and four
efficiency cores) and 128 GB RAM.
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
builtin/describe.c | 3 +++
t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/builtin/describe.c b/builtin/describe.c
index 1c47d7c0b7..3532c8ff22 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -740,6 +740,9 @@ int cmd_describe(int argc,
return ret;
}
+ if (!all)
+ for_each_ref_opts.prefix = "refs/tags/";
+
hashmap_init(&names, commit_name_neq, NULL, 0);
refs_for_each_ref_ext(get_main_ref_store(the_repository),
get_name, NULL, &for_each_ref_opts);
diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
index 069f91ce49..dfcaf59e90 100755
--- a/t/perf/p6100-describe.sh
+++ b/t/perf/p6100-describe.sh
@@ -5,6 +5,12 @@ test_description='performance of git-describe'
test_perf_default_repo
+test_lazy_prereq PERF_REFFILES '
+ test "$(git rev-parse --show-ref-format)" = files
+'
+
+ref_count=10000
+
# clear out old tags and give us a known state
test_expect_success 'set up tags' '
git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
@@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
git describe --match=new HEAD
'
+test_expect_success PERF_REFFILES 'set up many unrelated refs' '
+ git tag -m tip tip HEAD &&
+ for i in $(test_seq $ref_count)
+ do
+ printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
+ return 1
+ done >instructions &&
+ git update-ref --stdin <instructions
+'
+
+test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
+ git describe --exact-match HEAD
+'
+
test_done
---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-describe-tag-ref-scope-7d00ae140a58
Best regards,
--
Tamir Duberstein <tamird@gmail.com>
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags
2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein
@ 2026-06-08 6:59 ` Patrick Steinhardt
2026-06-08 15:46 ` Tamir Duberstein
2026-06-08 12:36 ` Junio C Hamano
1 sibling, 1 reply; 5+ messages in thread
From: Patrick Steinhardt @ 2026-06-08 6:59 UTC (permalink / raw)
To: Tamir Duberstein; +Cc: git, Shawn O. Pearce, Junio C Hamano, Jeff King
On Sun, Jun 07, 2026 at 04:51:53PM -0400, Tamir Duberstein wrote:
> Unless --all is given, get_name() rejects every ref outside refs/tags/.
> The rejection happens only after the ref backend has enumerated the ref,
> so repositories with many other refs spend most of a simple describe
> invocation visiting refs which cannot affect its result.
Right. The relevant block is this one:
if (skip_prefix(ref->name, "refs/tags/", &path_to_match)) {
is_tag = 1;
} else if (all) {
if ((exclude_patterns.nr || patterns.nr) &&
!skip_prefix(ref->name, "refs/heads/", &path_to_match) &&
!skip_prefix(ref->name, "refs/remotes/", &path_to_match)) {
/* Only accept reference of known type if there are match/exclude patterns */
return 0;
}
} else {
/* Reject anything outside refs/tags/ unless --all */
return 0;
}
So we really only use tags unless "--all" is given.
> Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
> --all is requested, 2008-02-24) moved this rejection before object
> lookup, but left iteration unscoped. Pass the existing refs/tags/
> restriction to the iterator unless --all is given so the backend can
> avoid unrelated refs.
>
> On a checkout with 124,357 refs, of which 330 were tags, I ran the
> following command with the parent and patched binaries:
>
> hyperfine --warmup 3 --runs 15 \
> 'git describe --always --long --abbrev=40 HEAD'
>
> The results were:
>
> parent this commit
> elapsed 196.2 ms 63.3 ms
> user 69.5 ms 48.0 ms
> system 123.0 ms 12.0 ms
It's a bit curious that you don't post the hyperfine(1) results as-is
here.
> The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
> for a 3.10x speedup.
Makes sense that this would result in a sizeable speedup, depending of
course on the shape of the existing refs in the repository.
> diff --git a/builtin/describe.c b/builtin/describe.c
> index 1c47d7c0b7..3532c8ff22 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -740,6 +740,9 @@ int cmd_describe(int argc,
> return ret;
> }
>
> + if (!all)
> + for_each_ref_opts.prefix = "refs/tags/";
> +
> hashmap_init(&names, commit_name_neq, NULL, 0);
> refs_for_each_ref_ext(get_main_ref_store(the_repository),
> get_name, NULL, &for_each_ref_opts);
Another performance optimization that we could do here is to wire up the
exclude patterns via `for_each_ref_opts.exclude_patterns`. But that's
outside the scope of this patch series, and also much less likely to
help many use cases out there.
> diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
> index 069f91ce49..dfcaf59e90 100755
> --- a/t/perf/p6100-describe.sh
> +++ b/t/perf/p6100-describe.sh
> @@ -5,6 +5,12 @@ test_description='performance of git-describe'
>
> test_perf_default_repo
>
> +test_lazy_prereq PERF_REFFILES '
> + test "$(git rev-parse --show-ref-format)" = files
> +'
> +
> +ref_count=10000
Let's not declare this variable outside of tests.
> @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
> git describe --match=new HEAD
> '
>
> +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> + git tag -m tip tip HEAD &&
> + for i in $(test_seq $ref_count)
> + do
> + printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> + return 1
> + done >instructions &&
> + git update-ref --stdin <instructions
> +'
Why is this limited to the "files" backend, only? The logic should work
for both backends as-is.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags
2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein
2026-06-08 6:59 ` Patrick Steinhardt
@ 2026-06-08 12:36 ` Junio C Hamano
2026-06-08 15:53 ` Tamir Duberstein
1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-06-08 12:36 UTC (permalink / raw)
To: Tamir Duberstein; +Cc: git, Jeff King, Patrick Steinhardt
Tamir Duberstein <tamird@gmail.com> writes:
[jc: Removing Shawn from CC who passed away quite a while ago, RIP].
> Unless --all is given, get_name() rejects every ref outside refs/tags/.
> The rejection happens only after the ref backend has enumerated the ref,
> so repositories with many other refs spend most of a simple describe
> invocation visiting refs which cannot affect its result.
> ...
> Both revisions were built with -O3, -mcpu=native, and ThinLTO using
> Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
> (Mac16,6) with a 16-core Apple M4 Max (12 performance and four
> efficiency cores) and 128 GB RAM.
>
> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
> builtin/describe.c | 3 +++
> t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
> 2 files changed, 23 insertions(+)
Interesting. How would this relate to and work well with
<20260601233727.43558-1-jacob.e.keller@intel.com>?
> +test_lazy_prereq PERF_REFFILES '
> + test "$(git rev-parse --show-ref-format)" = files
> +'
> +
> +ref_count=10000
> +
> # clear out old tags and give us a known state
> test_expect_success 'set up tags' '
> git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
> @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
> git describe --match=new HEAD
> '
>
> +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> + git tag -m tip tip HEAD &&
> + for i in $(test_seq $ref_count)
> + do
> + printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> + return 1
> + done >instructions &&
> + git update-ref --stdin <instructions
> +'
> +
> +test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
> + git describe --exact-match HEAD
> +'
> +
Is there a strong reason to guard this new test behind
`PERF_REFFILES`?
Even though the penalty of enumerating 10,000 unrelated loose
references may be most pronounced in the `files` backend, skipping
unnecessary reference enumeration is an architectural win for other
backends (like `reftable` or a fully packed repository) as well.
If we drop `PERF_REFFILES` and retitle the test to "describe exact
tag with many unrelated refs", we could run it unconditionally to
benchmark the improvement across all storage formats.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags
2026-06-08 6:59 ` Patrick Steinhardt
@ 2026-06-08 15:46 ` Tamir Duberstein
0 siblings, 0 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-08 15:46 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Shawn O. Pearce, Junio C Hamano, Jeff King
On Sun, Jun 7, 2026 at 11:59 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Sun, Jun 07, 2026 at 04:51:53PM -0400, Tamir Duberstein wrote:
> > Unless --all is given, get_name() rejects every ref outside refs/tags/.
> > The rejection happens only after the ref backend has enumerated the ref,
> > so repositories with many other refs spend most of a simple describe
> > invocation visiting refs which cannot affect its result.
>
> Right. The relevant block is this one:
>
> if (skip_prefix(ref->name, "refs/tags/", &path_to_match)) {
> is_tag = 1;
> } else if (all) {
> if ((exclude_patterns.nr || patterns.nr) &&
> !skip_prefix(ref->name, "refs/heads/", &path_to_match) &&
> !skip_prefix(ref->name, "refs/remotes/", &path_to_match)) {
> /* Only accept reference of known type if there are match/exclude patterns */
> return 0;
> }
> } else {
> /* Reject anything outside refs/tags/ unless --all */
> return 0;
> }
>
> So we really only use tags unless "--all" is given.
>
> > Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
> > --all is requested, 2008-02-24) moved this rejection before object
> > lookup, but left iteration unscoped. Pass the existing refs/tags/
> > restriction to the iterator unless --all is given so the backend can
> > avoid unrelated refs.
> >
> > On a checkout with 124,357 refs, of which 330 were tags, I ran the
> > following command with the parent and patched binaries:
> >
> > hyperfine --warmup 3 --runs 15 \
> > 'git describe --always --long --abbrev=40 HEAD'
> >
> > The results were:
> >
> > parent this commit
> > elapsed 196.2 ms 63.3 ms
> > user 69.5 ms 48.0 ms
> > system 123.0 ms 12.0 ms
>
> It's a bit curious that you don't post the hyperfine(1) results as-is
> here.
Agreed, will include that in v2. For reference:
Benchmark 1: parent
Time (mean ± σ): 171.7 ms ± 18.5 ms [User: 23.9 ms,
System: 133.6 ms]
Range (min … max): 142.3 ms … 198.3 ms 15 runs
Benchmark 2: this commit
Time (mean ± σ): 9.9 ms ± 1.1 ms [User: 3.3 ms,
System: 4.7 ms]
Range (min … max): 8.8 ms … 13.1 ms 15 runs
>
> > The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
> > for a 3.10x speedup.
>
> Makes sense that this would result in a sizeable speedup, depending of
> course on the shape of the existing refs in the repository.
>
> > diff --git a/builtin/describe.c b/builtin/describe.c
> > index 1c47d7c0b7..3532c8ff22 100644
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -740,6 +740,9 @@ int cmd_describe(int argc,
> > return ret;
> > }
> >
> > + if (!all)
> > + for_each_ref_opts.prefix = "refs/tags/";
> > +
> > hashmap_init(&names, commit_name_neq, NULL, 0);
> > refs_for_each_ref_ext(get_main_ref_store(the_repository),
> > get_name, NULL, &for_each_ref_opts);
>
> Another performance optimization that we could do here is to wire up the
> exclude patterns via `for_each_ref_opts.exclude_patterns`. But that's
> outside the scope of this patch series, and also much less likely to
> help many use cases out there.
I tried this and have a separate patch prepared.
The patterns cannot be passed through verbatim: `git describe
--exclude=foo` excludes the exact name `foo`, while the refs API would
treat `foo` as a directory prefix and also skip `foo/*`. The patch
therefore passes only patterns consisting of a literal prefix followed
by trailing asterisks, adds back the applicable ref namespace, and
retains the existing callback filtering.
With 30,000 packed remote-tracking refs under an excluded prefix, the
perf test invokes `git describe` ten times per run:
```
master patched
describe excluding many refs 0.16(0.07+0.05) 0.12(0.04+0.05)
```
That is a 25% wall-time reduction, with user CPU falling from 0.07 to
0.04 seconds.
I also tested a larger checkout with 62,170 refs under
`refs/remotes/origin/`:
```
git describe --all --exact-match --exclude='origin/*' HEAD
```
This improved from 176.7 ms to 161.3 ms, or about 10%. Startup work
unrelated to ref iteration dominates more of that repository's runtime.
>
> > diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
> > index 069f91ce49..dfcaf59e90 100755
> > --- a/t/perf/p6100-describe.sh
> > +++ b/t/perf/p6100-describe.sh
> > @@ -5,6 +5,12 @@ test_description='performance of git-describe'
> >
> > test_perf_default_repo
> >
> > +test_lazy_prereq PERF_REFFILES '
> > + test "$(git rev-parse --show-ref-format)" = files
> > +'
> > +
> > +ref_count=10000
>
> Let's not declare this variable outside of tests.
Done in v2.
>
> > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
> > git describe --match=new HEAD
> > '
> >
> > +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> > + git tag -m tip tip HEAD &&
> > + for i in $(test_seq $ref_count)
> > + do
> > + printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> > + return 1
> > + done >instructions &&
> > + git update-ref --stdin <instructions
> > +'
>
> Why is this limited to the "files" backend, only? The logic should work
> for both backends as-is.
You're right, fixed in v2.
>
> Thanks!
Thanks for the quick review!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags
2026-06-08 12:36 ` Junio C Hamano
@ 2026-06-08 15:53 ` Tamir Duberstein
0 siblings, 0 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-08 15:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Patrick Steinhardt
On Mon, Jun 8, 2026 at 5:36 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> [jc: Removing Shawn from CC who passed away quite a while ago, RIP].
>
> > Unless --all is given, get_name() rejects every ref outside refs/tags/.
> > The rejection happens only after the ref backend has enumerated the ref,
> > so repositories with many other refs spend most of a simple describe
> > invocation visiting refs which cannot affect its result.
> > ...
> > Both revisions were built with -O3, -mcpu=native, and ThinLTO using
> > Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
> > (Mac16,6) with a 16-core Apple M4 Max (12 performance and four
> > efficiency cores) and 128 GB RAM.
> >
> > Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> > ---
> > builtin/describe.c | 3 +++
> > t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
> > 2 files changed, 23 insertions(+)
>
> Interesting. How would this relate to and work well with
> <20260601233727.43558-1-jacob.e.keller@intel.com>?
They are orthogonal. That patch changes the argument construction
inside the `contains` block, which invokes `cmd_name_rev()` and
returns. This patch changes the ref iterator used after that block, so
it only affects the ordinary, non-`--contains` path.
>
> > +test_lazy_prereq PERF_REFFILES '
> > + test "$(git rev-parse --show-ref-format)" = files
> > +'
> > +
> > +ref_count=10000
> > +
> > # clear out old tags and give us a known state
> > test_expect_success 'set up tags' '
> > git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
> > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
> > git describe --match=new HEAD
> > '
> >
> > +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> > + git tag -m tip tip HEAD &&
> > + for i in $(test_seq $ref_count)
> > + do
> > + printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> > + return 1
> > + done >instructions &&
> > + git update-ref --stdin <instructions
> > +'
> > +
> > +test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
> > + git describe --exact-match HEAD
> > +'
> > +
>
> Is there a strong reason to guard this new test behind
> `PERF_REFFILES`?
>
> Even though the penalty of enumerating 10,000 unrelated loose
> references may be most pronounced in the `files` backend, skipping
> unnecessary reference enumeration is an architectural win for other
> backends (like `reftable` or a fully packed repository) as well.
>
> If we drop `PERF_REFFILES` and retitle the test to "describe exact
> tag with many unrelated refs", we could run it unconditionally to
> benchmark the improvement across all storage formats.
Yeah, there's no good reason - and Patrick made the same observation.
In v2 I will remove the prerequisite and rename the case to refer to
unrelated rather than loose refs.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-08 15:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein
2026-06-08 6:59 ` Patrick Steinhardt
2026-06-08 15:46 ` Tamir Duberstein
2026-06-08 12:36 ` Junio C Hamano
2026-06-08 15:53 ` Tamir Duberstein
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox