* [PATCH] describe: limit default ref iteration to tags
@ 2026-06-07 20:51 Tamir Duberstein
2026-06-08 6:59 ` Patrick Steinhardt
2026-06-08 12:36 ` Junio C Hamano
0 siblings, 2 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-07 20:51 UTC (permalink / raw)
To: git
Cc: Shawn O. Pearce, Junio C Hamano, Jeff King, Patrick Steinhardt,
Tamir Duberstein
Unless --all is given, get_name() rejects every ref outside refs/tags/.
The rejection happens only after the ref backend has enumerated the ref,
so repositories with many other refs spend most of a simple describe
invocation visiting refs which cannot affect its result.
Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
--all is requested, 2008-02-24) moved this rejection before object
lookup, but left iteration unscoped. Pass the existing refs/tags/
restriction to the iterator unless --all is given so the backend can
avoid unrelated refs.
On a checkout with 124,357 refs, of which 330 were tags, I ran the
following command with the parent and patched binaries:
hyperfine --warmup 3 --runs 15 \
'git describe --always --long --abbrev=40 HEAD'
The results were:
parent this commit
elapsed 196.2 ms 63.3 ms
user 69.5 ms 48.0 ms
system 123.0 ms 12.0 ms
The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
for a 3.10x speedup.
Both revisions were built with -O3, -mcpu=native, and ThinLTO using
Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
(Mac16,6) with a 16-core Apple M4 Max (12 performance and four
efficiency cores) and 128 GB RAM.
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
builtin/describe.c | 3 +++
t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/builtin/describe.c b/builtin/describe.c
index 1c47d7c0b7..3532c8ff22 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -740,6 +740,9 @@ int cmd_describe(int argc,
return ret;
}
+ if (!all)
+ for_each_ref_opts.prefix = "refs/tags/";
+
hashmap_init(&names, commit_name_neq, NULL, 0);
refs_for_each_ref_ext(get_main_ref_store(the_repository),
get_name, NULL, &for_each_ref_opts);
diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
index 069f91ce49..dfcaf59e90 100755
--- a/t/perf/p6100-describe.sh
+++ b/t/perf/p6100-describe.sh
@@ -5,6 +5,12 @@ test_description='performance of git-describe'
test_perf_default_repo
+test_lazy_prereq PERF_REFFILES '
+ test "$(git rev-parse --show-ref-format)" = files
+'
+
+ref_count=10000
+
# clear out old tags and give us a known state
test_expect_success 'set up tags' '
git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
@@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
git describe --match=new HEAD
'
+test_expect_success PERF_REFFILES 'set up many unrelated refs' '
+ git tag -m tip tip HEAD &&
+ for i in $(test_seq $ref_count)
+ do
+ printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
+ return 1
+ done >instructions &&
+ git update-ref --stdin <instructions
+'
+
+test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
+ git describe --exact-match HEAD
+'
+
test_done
---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-describe-tag-ref-scope-7d00ae140a58
Best regards,
--
Tamir Duberstein <tamird@gmail.com>
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] describe: limit default ref iteration to tags 2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein @ 2026-06-08 6:59 ` Patrick Steinhardt 2026-06-08 15:46 ` Tamir Duberstein 2026-06-08 12:36 ` Junio C Hamano 1 sibling, 1 reply; 5+ messages in thread From: Patrick Steinhardt @ 2026-06-08 6:59 UTC (permalink / raw) To: Tamir Duberstein; +Cc: git, Shawn O. Pearce, Junio C Hamano, Jeff King On Sun, Jun 07, 2026 at 04:51:53PM -0400, Tamir Duberstein wrote: > Unless --all is given, get_name() rejects every ref outside refs/tags/. > The rejection happens only after the ref backend has enumerated the ref, > so repositories with many other refs spend most of a simple describe > invocation visiting refs which cannot affect its result. Right. The relevant block is this one: if (skip_prefix(ref->name, "refs/tags/", &path_to_match)) { is_tag = 1; } else if (all) { if ((exclude_patterns.nr || patterns.nr) && !skip_prefix(ref->name, "refs/heads/", &path_to_match) && !skip_prefix(ref->name, "refs/remotes/", &path_to_match)) { /* Only accept reference of known type if there are match/exclude patterns */ return 0; } } else { /* Reject anything outside refs/tags/ unless --all */ return 0; } So we really only use tags unless "--all" is given. > Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless > --all is requested, 2008-02-24) moved this rejection before object > lookup, but left iteration unscoped. Pass the existing refs/tags/ > restriction to the iterator unless --all is given so the backend can > avoid unrelated refs. > > On a checkout with 124,357 refs, of which 330 were tags, I ran the > following command with the parent and patched binaries: > > hyperfine --warmup 3 --runs 15 \ > 'git describe --always --long --abbrev=40 HEAD' > > The results were: > > parent this commit > elapsed 196.2 ms 63.3 ms > user 69.5 ms 48.0 ms > system 123.0 ms 12.0 ms It's a bit curious that you don't post the hyperfine(1) results as-is here. > The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively, > for a 3.10x speedup. Makes sense that this would result in a sizeable speedup, depending of course on the shape of the existing refs in the repository. > diff --git a/builtin/describe.c b/builtin/describe.c > index 1c47d7c0b7..3532c8ff22 100644 > --- a/builtin/describe.c > +++ b/builtin/describe.c > @@ -740,6 +740,9 @@ int cmd_describe(int argc, > return ret; > } > > + if (!all) > + for_each_ref_opts.prefix = "refs/tags/"; > + > hashmap_init(&names, commit_name_neq, NULL, 0); > refs_for_each_ref_ext(get_main_ref_store(the_repository), > get_name, NULL, &for_each_ref_opts); Another performance optimization that we could do here is to wire up the exclude patterns via `for_each_ref_opts.exclude_patterns`. But that's outside the scope of this patch series, and also much less likely to help many use cases out there. > diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh > index 069f91ce49..dfcaf59e90 100755 > --- a/t/perf/p6100-describe.sh > +++ b/t/perf/p6100-describe.sh > @@ -5,6 +5,12 @@ test_description='performance of git-describe' > > test_perf_default_repo > > +test_lazy_prereq PERF_REFFILES ' > + test "$(git rev-parse --show-ref-format)" = files > +' > + > +ref_count=10000 Let's not declare this variable outside of tests. > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' ' > git describe --match=new HEAD > ' > > +test_expect_success PERF_REFFILES 'set up many unrelated refs' ' > + git tag -m tip tip HEAD && > + for i in $(test_seq $ref_count) > + do > + printf "create refs/heads/describe-perf/%05d HEAD\n" $i || > + return 1 > + done >instructions && > + git update-ref --stdin <instructions > +' Why is this limited to the "files" backend, only? The logic should work for both backends as-is. Thanks! Patrick ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags 2026-06-08 6:59 ` Patrick Steinhardt @ 2026-06-08 15:46 ` Tamir Duberstein 0 siblings, 0 replies; 5+ messages in thread From: Tamir Duberstein @ 2026-06-08 15:46 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Shawn O. Pearce, Junio C Hamano, Jeff King On Sun, Jun 7, 2026 at 11:59 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Sun, Jun 07, 2026 at 04:51:53PM -0400, Tamir Duberstein wrote: > > Unless --all is given, get_name() rejects every ref outside refs/tags/. > > The rejection happens only after the ref backend has enumerated the ref, > > so repositories with many other refs spend most of a simple describe > > invocation visiting refs which cannot affect its result. > > Right. The relevant block is this one: > > if (skip_prefix(ref->name, "refs/tags/", &path_to_match)) { > is_tag = 1; > } else if (all) { > if ((exclude_patterns.nr || patterns.nr) && > !skip_prefix(ref->name, "refs/heads/", &path_to_match) && > !skip_prefix(ref->name, "refs/remotes/", &path_to_match)) { > /* Only accept reference of known type if there are match/exclude patterns */ > return 0; > } > } else { > /* Reject anything outside refs/tags/ unless --all */ > return 0; > } > > So we really only use tags unless "--all" is given. > > > Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless > > --all is requested, 2008-02-24) moved this rejection before object > > lookup, but left iteration unscoped. Pass the existing refs/tags/ > > restriction to the iterator unless --all is given so the backend can > > avoid unrelated refs. > > > > On a checkout with 124,357 refs, of which 330 were tags, I ran the > > following command with the parent and patched binaries: > > > > hyperfine --warmup 3 --runs 15 \ > > 'git describe --always --long --abbrev=40 HEAD' > > > > The results were: > > > > parent this commit > > elapsed 196.2 ms 63.3 ms > > user 69.5 ms 48.0 ms > > system 123.0 ms 12.0 ms > > It's a bit curious that you don't post the hyperfine(1) results as-is > here. Agreed, will include that in v2. For reference: Benchmark 1: parent Time (mean ± σ): 171.7 ms ± 18.5 ms [User: 23.9 ms, System: 133.6 ms] Range (min … max): 142.3 ms … 198.3 ms 15 runs Benchmark 2: this commit Time (mean ± σ): 9.9 ms ± 1.1 ms [User: 3.3 ms, System: 4.7 ms] Range (min … max): 8.8 ms … 13.1 ms 15 runs > > > The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively, > > for a 3.10x speedup. > > Makes sense that this would result in a sizeable speedup, depending of > course on the shape of the existing refs in the repository. > > > diff --git a/builtin/describe.c b/builtin/describe.c > > index 1c47d7c0b7..3532c8ff22 100644 > > --- a/builtin/describe.c > > +++ b/builtin/describe.c > > @@ -740,6 +740,9 @@ int cmd_describe(int argc, > > return ret; > > } > > > > + if (!all) > > + for_each_ref_opts.prefix = "refs/tags/"; > > + > > hashmap_init(&names, commit_name_neq, NULL, 0); > > refs_for_each_ref_ext(get_main_ref_store(the_repository), > > get_name, NULL, &for_each_ref_opts); > > Another performance optimization that we could do here is to wire up the > exclude patterns via `for_each_ref_opts.exclude_patterns`. But that's > outside the scope of this patch series, and also much less likely to > help many use cases out there. I tried this and have a separate patch prepared. The patterns cannot be passed through verbatim: `git describe --exclude=foo` excludes the exact name `foo`, while the refs API would treat `foo` as a directory prefix and also skip `foo/*`. The patch therefore passes only patterns consisting of a literal prefix followed by trailing asterisks, adds back the applicable ref namespace, and retains the existing callback filtering. With 30,000 packed remote-tracking refs under an excluded prefix, the perf test invokes `git describe` ten times per run: ``` master patched describe excluding many refs 0.16(0.07+0.05) 0.12(0.04+0.05) ``` That is a 25% wall-time reduction, with user CPU falling from 0.07 to 0.04 seconds. I also tested a larger checkout with 62,170 refs under `refs/remotes/origin/`: ``` git describe --all --exact-match --exclude='origin/*' HEAD ``` This improved from 176.7 ms to 161.3 ms, or about 10%. Startup work unrelated to ref iteration dominates more of that repository's runtime. > > > diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh > > index 069f91ce49..dfcaf59e90 100755 > > --- a/t/perf/p6100-describe.sh > > +++ b/t/perf/p6100-describe.sh > > @@ -5,6 +5,12 @@ test_description='performance of git-describe' > > > > test_perf_default_repo > > > > +test_lazy_prereq PERF_REFFILES ' > > + test "$(git rev-parse --show-ref-format)" = files > > +' > > + > > +ref_count=10000 > > Let's not declare this variable outside of tests. Done in v2. > > > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' ' > > git describe --match=new HEAD > > ' > > > > +test_expect_success PERF_REFFILES 'set up many unrelated refs' ' > > + git tag -m tip tip HEAD && > > + for i in $(test_seq $ref_count) > > + do > > + printf "create refs/heads/describe-perf/%05d HEAD\n" $i || > > + return 1 > > + done >instructions && > > + git update-ref --stdin <instructions > > +' > > Why is this limited to the "files" backend, only? The logic should work > for both backends as-is. You're right, fixed in v2. > > Thanks! Thanks for the quick review! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags 2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein 2026-06-08 6:59 ` Patrick Steinhardt @ 2026-06-08 12:36 ` Junio C Hamano 2026-06-08 15:53 ` Tamir Duberstein 1 sibling, 1 reply; 5+ messages in thread From: Junio C Hamano @ 2026-06-08 12:36 UTC (permalink / raw) To: Tamir Duberstein; +Cc: git, Jeff King, Patrick Steinhardt Tamir Duberstein <tamird@gmail.com> writes: [jc: Removing Shawn from CC who passed away quite a while ago, RIP]. > Unless --all is given, get_name() rejects every ref outside refs/tags/. > The rejection happens only after the ref backend has enumerated the ref, > so repositories with many other refs spend most of a simple describe > invocation visiting refs which cannot affect its result. > ... > Both revisions were built with -O3, -mcpu=native, and ThinLTO using > Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro > (Mac16,6) with a 16-core Apple M4 Max (12 performance and four > efficiency cores) and 128 GB RAM. > > Signed-off-by: Tamir Duberstein <tamird@gmail.com> > --- > builtin/describe.c | 3 +++ > t/perf/p6100-describe.sh | 20 ++++++++++++++++++++ > 2 files changed, 23 insertions(+) Interesting. How would this relate to and work well with <20260601233727.43558-1-jacob.e.keller@intel.com>? > +test_lazy_prereq PERF_REFFILES ' > + test "$(git rev-parse --show-ref-format)" = files > +' > + > +ref_count=10000 > + > # clear out old tags and give us a known state > test_expect_success 'set up tags' ' > git for-each-ref --format="delete %(refname)" refs/tags >to-delete && > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' ' > git describe --match=new HEAD > ' > > +test_expect_success PERF_REFFILES 'set up many unrelated refs' ' > + git tag -m tip tip HEAD && > + for i in $(test_seq $ref_count) > + do > + printf "create refs/heads/describe-perf/%05d HEAD\n" $i || > + return 1 > + done >instructions && > + git update-ref --stdin <instructions > +' > + > +test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES ' > + git describe --exact-match HEAD > +' > + Is there a strong reason to guard this new test behind `PERF_REFFILES`? Even though the penalty of enumerating 10,000 unrelated loose references may be most pronounced in the `files` backend, skipping unnecessary reference enumeration is an architectural win for other backends (like `reftable` or a fully packed repository) as well. If we drop `PERF_REFFILES` and retitle the test to "describe exact tag with many unrelated refs", we could run it unconditionally to benchmark the improvement across all storage formats. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] describe: limit default ref iteration to tags 2026-06-08 12:36 ` Junio C Hamano @ 2026-06-08 15:53 ` Tamir Duberstein 0 siblings, 0 replies; 5+ messages in thread From: Tamir Duberstein @ 2026-06-08 15:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jeff King, Patrick Steinhardt On Mon, Jun 8, 2026 at 5:36 AM Junio C Hamano <gitster@pobox.com> wrote: > > Tamir Duberstein <tamird@gmail.com> writes: > > [jc: Removing Shawn from CC who passed away quite a while ago, RIP]. > > > Unless --all is given, get_name() rejects every ref outside refs/tags/. > > The rejection happens only after the ref backend has enumerated the ref, > > so repositories with many other refs spend most of a simple describe > > invocation visiting refs which cannot affect its result. > > ... > > Both revisions were built with -O3, -mcpu=native, and ThinLTO using > > Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro > > (Mac16,6) with a 16-core Apple M4 Max (12 performance and four > > efficiency cores) and 128 GB RAM. > > > > Signed-off-by: Tamir Duberstein <tamird@gmail.com> > > --- > > builtin/describe.c | 3 +++ > > t/perf/p6100-describe.sh | 20 ++++++++++++++++++++ > > 2 files changed, 23 insertions(+) > > Interesting. How would this relate to and work well with > <20260601233727.43558-1-jacob.e.keller@intel.com>? They are orthogonal. That patch changes the argument construction inside the `contains` block, which invokes `cmd_name_rev()` and returns. This patch changes the ref iterator used after that block, so it only affects the ordinary, non-`--contains` path. > > > +test_lazy_prereq PERF_REFFILES ' > > + test "$(git rev-parse --show-ref-format)" = files > > +' > > + > > +ref_count=10000 > > + > > # clear out old tags and give us a known state > > test_expect_success 'set up tags' ' > > git for-each-ref --format="delete %(refname)" refs/tags >to-delete && > > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' ' > > git describe --match=new HEAD > > ' > > > > +test_expect_success PERF_REFFILES 'set up many unrelated refs' ' > > + git tag -m tip tip HEAD && > > + for i in $(test_seq $ref_count) > > + do > > + printf "create refs/heads/describe-perf/%05d HEAD\n" $i || > > + return 1 > > + done >instructions && > > + git update-ref --stdin <instructions > > +' > > + > > +test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES ' > > + git describe --exact-match HEAD > > +' > > + > > Is there a strong reason to guard this new test behind > `PERF_REFFILES`? > > Even though the penalty of enumerating 10,000 unrelated loose > references may be most pronounced in the `files` backend, skipping > unnecessary reference enumeration is an architectural win for other > backends (like `reftable` or a fully packed repository) as well. > > If we drop `PERF_REFFILES` and retitle the test to "describe exact > tag with many unrelated refs", we could run it unconditionally to > benchmark the improvement across all storage formats. Yeah, there's no good reason - and Patrick made the same observation. In v2 I will remove the prerequisite and rename the case to refer to unrelated rather than loose refs. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-08 15:54 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein 2026-06-08 6:59 ` Patrick Steinhardt 2026-06-08 15:46 ` Tamir Duberstein 2026-06-08 12:36 ` Junio C Hamano 2026-06-08 15:53 ` Tamir Duberstein
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox