Git development
 help / color / mirror / Atom feed
* [PATCH] describe: limit default ref iteration to tags
@ 2026-06-07 20:51 Tamir Duberstein
  2026-06-08  6:59 ` Patrick Steinhardt
  2026-06-08 12:36 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-07 20:51 UTC (permalink / raw)
  To: git
  Cc: Shawn O. Pearce, Junio C Hamano, Jeff King, Patrick Steinhardt,
	Tamir Duberstein

Unless --all is given, get_name() rejects every ref outside refs/tags/.
The rejection happens only after the ref backend has enumerated the ref,
so repositories with many other refs spend most of a simple describe
invocation visiting refs which cannot affect its result.

Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
--all is requested, 2008-02-24) moved this rejection before object
lookup, but left iteration unscoped. Pass the existing refs/tags/
restriction to the iterator unless --all is given so the backend can
avoid unrelated refs.

On a checkout with 124,357 refs, of which 330 were tags, I ran the
following command with the parent and patched binaries:

    hyperfine --warmup 3 --runs 15 \
        'git describe --always --long --abbrev=40 HEAD'

The results were:

             parent       this commit
  elapsed    196.2 ms      63.3 ms
  user        69.5 ms      48.0 ms
  system     123.0 ms      12.0 ms

The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
for a 3.10x speedup.

Both revisions were built with -O3, -mcpu=native, and ThinLTO using
Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
(Mac16,6) with a 16-core Apple M4 Max (12 performance and four
efficiency cores) and 128 GB RAM.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
 builtin/describe.c       |  3 +++
 t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/builtin/describe.c b/builtin/describe.c
index 1c47d7c0b7..3532c8ff22 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -740,6 +740,9 @@ int cmd_describe(int argc,
 		return ret;
 	}
 
+	if (!all)
+		for_each_ref_opts.prefix = "refs/tags/";
+
 	hashmap_init(&names, commit_name_neq, NULL, 0);
 	refs_for_each_ref_ext(get_main_ref_store(the_repository),
 			      get_name, NULL, &for_each_ref_opts);
diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
index 069f91ce49..dfcaf59e90 100755
--- a/t/perf/p6100-describe.sh
+++ b/t/perf/p6100-describe.sh
@@ -5,6 +5,12 @@ test_description='performance of git-describe'
 
 test_perf_default_repo
 
+test_lazy_prereq PERF_REFFILES '
+	test "$(git rev-parse --show-ref-format)" = files
+'
+
+ref_count=10000
+
 # clear out old tags and give us a known state
 test_expect_success 'set up tags' '
 	git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
@@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
 	git describe --match=new HEAD
 '
 
+test_expect_success PERF_REFFILES 'set up many unrelated refs' '
+	git tag -m tip tip HEAD &&
+	for i in $(test_seq $ref_count)
+	do
+		printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
+		return 1
+	done >instructions &&
+	git update-ref --stdin <instructions
+'
+
+test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
+	git describe --exact-match HEAD
+'
+
 test_done

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-describe-tag-ref-scope-7d00ae140a58

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] describe: limit default ref iteration to tags
  2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein
@ 2026-06-08  6:59 ` Patrick Steinhardt
  2026-06-08 15:46   ` Tamir Duberstein
  2026-06-08 12:36 ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Patrick Steinhardt @ 2026-06-08  6:59 UTC (permalink / raw)
  To: Tamir Duberstein; +Cc: git, Shawn O. Pearce, Junio C Hamano, Jeff King

On Sun, Jun 07, 2026 at 04:51:53PM -0400, Tamir Duberstein wrote:
> Unless --all is given, get_name() rejects every ref outside refs/tags/.
> The rejection happens only after the ref backend has enumerated the ref,
> so repositories with many other refs spend most of a simple describe
> invocation visiting refs which cannot affect its result.

Right. The relevant block is this one:

	if (skip_prefix(ref->name, "refs/tags/", &path_to_match)) {
		is_tag = 1;
	} else if (all) {
		if ((exclude_patterns.nr || patterns.nr) &&
		    !skip_prefix(ref->name, "refs/heads/", &path_to_match) &&
		    !skip_prefix(ref->name, "refs/remotes/", &path_to_match)) {
			/* Only accept reference of known type if there are match/exclude patterns */
			return 0;
		}
	} else {
		/* Reject anything outside refs/tags/ unless --all */
		return 0;
	}

So we really only use tags unless "--all" is given.

> Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
> --all is requested, 2008-02-24) moved this rejection before object
> lookup, but left iteration unscoped. Pass the existing refs/tags/
> restriction to the iterator unless --all is given so the backend can
> avoid unrelated refs.
> 
> On a checkout with 124,357 refs, of which 330 were tags, I ran the
> following command with the parent and patched binaries:
> 
>     hyperfine --warmup 3 --runs 15 \
>         'git describe --always --long --abbrev=40 HEAD'
> 
> The results were:
> 
>              parent       this commit
>   elapsed    196.2 ms      63.3 ms
>   user        69.5 ms      48.0 ms
>   system     123.0 ms      12.0 ms

It's a bit curious that you don't post the hyperfine(1) results as-is
here.

> The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
> for a 3.10x speedup.

Makes sense that this would result in a sizeable speedup, depending of
course on the shape of the existing refs in the repository.

> diff --git a/builtin/describe.c b/builtin/describe.c
> index 1c47d7c0b7..3532c8ff22 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -740,6 +740,9 @@ int cmd_describe(int argc,
>  		return ret;
>  	}
>  
> +	if (!all)
> +		for_each_ref_opts.prefix = "refs/tags/";
> +
>  	hashmap_init(&names, commit_name_neq, NULL, 0);
>  	refs_for_each_ref_ext(get_main_ref_store(the_repository),
>  			      get_name, NULL, &for_each_ref_opts);

Another performance optimization that we could do here is to wire up the
exclude patterns via `for_each_ref_opts.exclude_patterns`. But that's
outside the scope of this patch series, and also much less likely to
help many use cases out there.

> diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
> index 069f91ce49..dfcaf59e90 100755
> --- a/t/perf/p6100-describe.sh
> +++ b/t/perf/p6100-describe.sh
> @@ -5,6 +5,12 @@ test_description='performance of git-describe'
>  
>  test_perf_default_repo
>  
> +test_lazy_prereq PERF_REFFILES '
> +	test "$(git rev-parse --show-ref-format)" = files
> +'
> +
> +ref_count=10000

Let's not declare this variable outside of tests.

> @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
>  	git describe --match=new HEAD
>  '
>  
> +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> +	git tag -m tip tip HEAD &&
> +	for i in $(test_seq $ref_count)
> +	do
> +		printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> +		return 1
> +	done >instructions &&
> +	git update-ref --stdin <instructions
> +'

Why is this limited to the "files" backend, only? The logic should work
for both backends as-is.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] describe: limit default ref iteration to tags
  2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein
  2026-06-08  6:59 ` Patrick Steinhardt
@ 2026-06-08 12:36 ` Junio C Hamano
  2026-06-08 15:53   ` Tamir Duberstein
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-06-08 12:36 UTC (permalink / raw)
  To: Tamir Duberstein; +Cc: git, Jeff King, Patrick Steinhardt

Tamir Duberstein <tamird@gmail.com> writes:

[jc: Removing Shawn from CC who passed away quite a while ago, RIP].

> Unless --all is given, get_name() rejects every ref outside refs/tags/.
> The rejection happens only after the ref backend has enumerated the ref,
> so repositories with many other refs spend most of a simple describe
> invocation visiting refs which cannot affect its result.
> ...
> Both revisions were built with -O3, -mcpu=native, and ThinLTO using
> Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
> (Mac16,6) with a 16-core Apple M4 Max (12 performance and four
> efficiency cores) and 128 GB RAM.
>
> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
>  builtin/describe.c       |  3 +++
>  t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
>  2 files changed, 23 insertions(+)

Interesting.  How would this relate to and work well with
<20260601233727.43558-1-jacob.e.keller@intel.com>?

> +test_lazy_prereq PERF_REFFILES '
> +	test "$(git rev-parse --show-ref-format)" = files
> +'
> +
> +ref_count=10000
> +
>  # clear out old tags and give us a known state
>  test_expect_success 'set up tags' '
>  	git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
> @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
>  	git describe --match=new HEAD
>  '
>  
> +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> +	git tag -m tip tip HEAD &&
> +	for i in $(test_seq $ref_count)
> +	do
> +		printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> +		return 1
> +	done >instructions &&
> +	git update-ref --stdin <instructions
> +'
> +
> +test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
> +	git describe --exact-match HEAD
> +'
> +

Is there a strong reason to guard this new test behind
`PERF_REFFILES`?

Even though the penalty of enumerating 10,000 unrelated loose
references may be most pronounced in the `files` backend, skipping
unnecessary reference enumeration is an architectural win for other
backends (like `reftable` or a fully packed repository) as well.

If we drop `PERF_REFFILES` and retitle the test to "describe exact
tag with many unrelated refs", we could run it unconditionally to
benchmark the improvement across all storage formats.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] describe: limit default ref iteration to tags
  2026-06-08  6:59 ` Patrick Steinhardt
@ 2026-06-08 15:46   ` Tamir Duberstein
  0 siblings, 0 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-08 15:46 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Shawn O. Pearce, Junio C Hamano, Jeff King

On Sun, Jun 7, 2026 at 11:59 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Sun, Jun 07, 2026 at 04:51:53PM -0400, Tamir Duberstein wrote:
> > Unless --all is given, get_name() rejects every ref outside refs/tags/.
> > The rejection happens only after the ref backend has enumerated the ref,
> > so repositories with many other refs spend most of a simple describe
> > invocation visiting refs which cannot affect its result.
>
> Right. The relevant block is this one:
>
>         if (skip_prefix(ref->name, "refs/tags/", &path_to_match)) {
>                 is_tag = 1;
>         } else if (all) {
>                 if ((exclude_patterns.nr || patterns.nr) &&
>                     !skip_prefix(ref->name, "refs/heads/", &path_to_match) &&
>                     !skip_prefix(ref->name, "refs/remotes/", &path_to_match)) {
>                         /* Only accept reference of known type if there are match/exclude patterns */
>                         return 0;
>                 }
>         } else {
>                 /* Reject anything outside refs/tags/ unless --all */
>                 return 0;
>         }
>
> So we really only use tags unless "--all" is given.
>
> > Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
> > --all is requested, 2008-02-24) moved this rejection before object
> > lookup, but left iteration unscoped. Pass the existing refs/tags/
> > restriction to the iterator unless --all is given so the backend can
> > avoid unrelated refs.
> >
> > On a checkout with 124,357 refs, of which 330 were tags, I ran the
> > following command with the parent and patched binaries:
> >
> >     hyperfine --warmup 3 --runs 15 \
> >         'git describe --always --long --abbrev=40 HEAD'
> >
> > The results were:
> >
> >              parent       this commit
> >   elapsed    196.2 ms      63.3 ms
> >   user        69.5 ms      48.0 ms
> >   system     123.0 ms      12.0 ms
>
> It's a bit curious that you don't post the hyperfine(1) results as-is
> here.

Agreed, will include that in v2. For reference:

        Benchmark 1: parent
          Time (mean ± σ):     171.7 ms ±  18.5 ms    [User: 23.9 ms,
System: 133.6 ms]
          Range (min … max):   142.3 ms … 198.3 ms    15 runs

        Benchmark 2: this commit
          Time (mean ± σ):       9.9 ms ±   1.1 ms    [User: 3.3 ms,
System: 4.7 ms]
          Range (min … max):     8.8 ms …  13.1 ms    15 runs

>
> > The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
> > for a 3.10x speedup.
>
> Makes sense that this would result in a sizeable speedup, depending of
> course on the shape of the existing refs in the repository.
>
> > diff --git a/builtin/describe.c b/builtin/describe.c
> > index 1c47d7c0b7..3532c8ff22 100644
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -740,6 +740,9 @@ int cmd_describe(int argc,
> >               return ret;
> >       }
> >
> > +     if (!all)
> > +             for_each_ref_opts.prefix = "refs/tags/";
> > +
> >       hashmap_init(&names, commit_name_neq, NULL, 0);
> >       refs_for_each_ref_ext(get_main_ref_store(the_repository),
> >                             get_name, NULL, &for_each_ref_opts);
>
> Another performance optimization that we could do here is to wire up the
> exclude patterns via `for_each_ref_opts.exclude_patterns`. But that's
> outside the scope of this patch series, and also much less likely to
> help many use cases out there.

I tried this and have a separate patch prepared.

The patterns cannot be passed through verbatim: `git describe
--exclude=foo` excludes the exact name `foo`, while the refs API would
treat `foo` as a directory prefix and also skip `foo/*`. The patch
therefore passes only patterns consisting of a literal prefix followed
by trailing asterisks, adds back the applicable ref namespace, and
retains the existing callback filtering.

With 30,000 packed remote-tracking refs under an excluded prefix, the
perf test invokes `git describe` ten times per run:

```
                                  master           patched
describe excluding many refs   0.16(0.07+0.05)  0.12(0.04+0.05)
```

That is a 25% wall-time reduction, with user CPU falling from 0.07 to
0.04 seconds.

I also tested a larger checkout with 62,170 refs under
`refs/remotes/origin/`:

```
git describe --all --exact-match --exclude='origin/*' HEAD
```

This improved from 176.7 ms to 161.3 ms, or about 10%. Startup work
unrelated to ref iteration dominates more of that repository's runtime.

>
> > diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
> > index 069f91ce49..dfcaf59e90 100755
> > --- a/t/perf/p6100-describe.sh
> > +++ b/t/perf/p6100-describe.sh
> > @@ -5,6 +5,12 @@ test_description='performance of git-describe'
> >
> >  test_perf_default_repo
> >
> > +test_lazy_prereq PERF_REFFILES '
> > +     test "$(git rev-parse --show-ref-format)" = files
> > +'
> > +
> > +ref_count=10000
>
> Let's not declare this variable outside of tests.

Done in v2.

>
> > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
> >       git describe --match=new HEAD
> >  '
> >
> > +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> > +     git tag -m tip tip HEAD &&
> > +     for i in $(test_seq $ref_count)
> > +     do
> > +             printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> > +             return 1
> > +     done >instructions &&
> > +     git update-ref --stdin <instructions
> > +'
>
> Why is this limited to the "files" backend, only? The logic should work
> for both backends as-is.

You're right, fixed in v2.

>
> Thanks!

Thanks for the quick review!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] describe: limit default ref iteration to tags
  2026-06-08 12:36 ` Junio C Hamano
@ 2026-06-08 15:53   ` Tamir Duberstein
  0 siblings, 0 replies; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-08 15:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Patrick Steinhardt

On Mon, Jun 8, 2026 at 5:36 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> [jc: Removing Shawn from CC who passed away quite a while ago, RIP].
>
> > Unless --all is given, get_name() rejects every ref outside refs/tags/.
> > The rejection happens only after the ref backend has enumerated the ref,
> > so repositories with many other refs spend most of a simple describe
> > invocation visiting refs which cannot affect its result.
> > ...
> > Both revisions were built with -O3, -mcpu=native, and ThinLTO using
> > Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
> > (Mac16,6) with a 16-core Apple M4 Max (12 performance and four
> > efficiency cores) and 128 GB RAM.
> >
> > Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> > ---
> >  builtin/describe.c       |  3 +++
> >  t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
> >  2 files changed, 23 insertions(+)
>
> Interesting.  How would this relate to and work well with
> <20260601233727.43558-1-jacob.e.keller@intel.com>?

They are orthogonal. That patch changes the argument construction
inside the `contains` block, which invokes `cmd_name_rev()` and
returns. This patch changes the ref iterator used after that block, so
it only affects the ordinary, non-`--contains` path.

>
> > +test_lazy_prereq PERF_REFFILES '
> > +     test "$(git rev-parse --show-ref-format)" = files
> > +'
> > +
> > +ref_count=10000
> > +
> >  # clear out old tags and give us a known state
> >  test_expect_success 'set up tags' '
> >       git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
> > @@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
> >       git describe --match=new HEAD
> >  '
> >
> > +test_expect_success PERF_REFFILES 'set up many unrelated refs' '
> > +     git tag -m tip tip HEAD &&
> > +     for i in $(test_seq $ref_count)
> > +     do
> > +             printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
> > +             return 1
> > +     done >instructions &&
> > +     git update-ref --stdin <instructions
> > +'
> > +
> > +test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
> > +     git describe --exact-match HEAD
> > +'
> > +
>
> Is there a strong reason to guard this new test behind
> `PERF_REFFILES`?
>
> Even though the penalty of enumerating 10,000 unrelated loose
> references may be most pronounced in the `files` backend, skipping
> unnecessary reference enumeration is an architectural win for other
> backends (like `reftable` or a fully packed repository) as well.
>
> If we drop `PERF_REFFILES` and retitle the test to "describe exact
> tag with many unrelated refs", we could run it unconditionally to
> benchmark the improvement across all storage formats.

Yeah, there's no good reason - and Patrick made the same observation.
In v2 I will remove the prerequisite and rename the case to refer to
unrelated rather than loose refs.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-08 15:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-07 20:51 [PATCH] describe: limit default ref iteration to tags Tamir Duberstein
2026-06-08  6:59 ` Patrick Steinhardt
2026-06-08 15:46   ` Tamir Duberstein
2026-06-08 12:36 ` Junio C Hamano
2026-06-08 15:53   ` Tamir Duberstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox