Git development
 help / color / mirror / Atom feed
* [PATCH] ref-filter: restore prefix-scoped iteration
@ 2026-06-05 16:43 Tamir Duberstein
  2026-06-08 21:36 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Tamir Duberstein @ 2026-06-05 16:43 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, Patrick Steinhardt, Junio C Hamano, Victoria Dye,
	ZheNing Hu, Tamir Duberstein

Commit dabecb9db2 (for-each-ref: introduce a '--start-after' option,
2025-07-15) changed single-kind branch, remote-tracking branch, and tag
enumeration in do_filter_refs() from constructing an iterator with the
namespace prefix to constructing an unscoped iterator and applying the
prefix with ref_iterator_seek().

Before that change, refs_for_each_fullref_in() passed the namespace
prefix during iterator construction. That helper has since been
replaced by refs_for_each_ref_ext().

The files backend primes its loose-ref cache for the construction
prefix before it opens packed refs. An empty construction prefix
therefore reads every loose ref, and a later seek cannot undo that I/O.
Consequently, git branch, git branch --remotes, and git tag scale with
unrelated loose refs.

Patrick Steinhardt observed during review that iterator construction
and seeking accepted similar strings but assigned them different state
semantics. Junio C Hamano then pointed out that no current command can
combine start_after with this single-kind path, but future branch or
tag support would need to keep the namespace while moving the cursor.

Keep the existing start_after path unchanged. The iterator API cannot
currently seek to one string while retaining another as its prefix:
an unflagged seek clears the prefix, while REF_ITERATOR_SEEK_SET_PREFIX
replaces it with the seek string.

For the commands affected by this regression, which do not set
start_after, pass the namespace prefix during iterator construction so
that loose refs are scoped before the packed-refs snapshot is opened.
This fixes the current regression without deleting the ref-filter state
discussed during review or changing its dormant behavior.

Add REFFILES-gated performance cases with one branch, one
remote-tracking branch, one tag, and 10,000 unrelated loose refs. The
benchmarks were run with:

    GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS=-j8 \
        t/perf/run a89346e34a . -- p6300-for-each-ref.sh

The following are the best of five runs, with each run invoking the
command ten times. Times are elapsed seconds with user and system CPU
seconds in parentheses:

                                  a89346e34a       this commit
  branch                       2.74(0.13+2.56)   0.11(0.04+0.04)
  branch --remotes             2.81(0.13+2.62)   0.12(0.04+0.04)
  tag                          3.01(0.14+2.82)   0.11(0.04+0.04)

Both revisions used the default -O2 build flags and a config.mak
containing only "NO_REGEX = NeedsStartEnd". They were built with Apple
clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6)
with a 16-core Apple M4 Max (12 performance and four efficiency cores)
and 128 GB RAM.

Link: https://lore.kernel.org/git/aGZidwwlToWThkn8@pks.im/
Link: https://lore.kernel.org/git/xmqqikjq7s16.fsf@gitster.g/
Fixes: dabecb9db2b2 ("for-each-ref: introduce a '--start-after' option")
Assisted-by: Codex gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
The series is based on a89346e34a (maint) because the regression has
been present in released versions since Git 2.51.0.
---
 ref-filter.c                 | 30 +++++++++++++++++++++---------
 t/perf/p6300-for-each-ref.sh | 39 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1da4c0e60d..2388a57b39 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
 		prefix = "refs/tags/";
 
 	if (prefix) {
-		struct ref_iterator *iter;
+		if (filter->start_after) {
+			struct ref_iterator *iter;
 
-		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
-					       "", NULL, 0, 0);
+			iter = refs_ref_iterator_begin(
+				get_main_ref_store(the_repository), "", NULL, 0,
+				0);
 
-		if (filter->start_after)
 			ret = start_ref_iterator_after(iter, filter->start_after);
-		else
-			ret = ref_iterator_seek(iter, prefix,
-						REF_ITERATOR_SEEK_SET_PREFIX);
+			if (!ret)
+				ret = do_for_each_ref_iterator(iter, fn,
+							       cb_data);
+		} else {
+			/*
+			 * Pass the prefix during construction because the files
+			 * backend primes loose refs before a later seek can
+			 * narrow the iterator.
+			 */
+			struct refs_for_each_ref_options opts = {
+				.prefix = prefix,
+			};
 
-		if (!ret)
-			ret = do_for_each_ref_iterator(iter, fn, cb_data);
+			ret = refs_for_each_ref_ext(
+				get_main_ref_store(the_repository), fn, cb_data,
+				&opts);
+		}
 	} else if (filter->kind & FILTER_REFS_REGULAR) {
 		ret = for_each_fullref_in_pattern(filter, fn, cb_data);
 	}
diff --git a/t/perf/p6300-for-each-ref.sh b/t/perf/p6300-for-each-ref.sh
index fa7289c752..ed9c1c6a19 100755
--- a/t/perf/p6300-for-each-ref.sh
+++ b/t/perf/p6300-for-each-ref.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-test_description='performance of for-each-ref'
+test_description='performance of ref-filter users'
 . ./perf-lib.sh
 
 test_perf_fresh_repo
@@ -84,4 +84,41 @@ test_expect_success 'pack refs' '
 '
 run_tests "packed"
 
+test_expect_success REFFILES 'setup many unrelated loose refs' '
+	git init scoped &&
+	test_commit -C scoped --no-tag base &&
+	test_seq $ref_count_per_type |
+		sed "s,.*,update refs/custom/unrelated_& HEAD," |
+		git -C scoped update-ref --stdin &&
+	git -C scoped update-ref refs/remotes/origin/main HEAD &&
+	git -C scoped update-ref refs/tags/only HEAD
+'
+
+test_perf "branch (many unrelated loose refs)" --prereq REFFILES "
+	(
+		cd scoped &&
+		for i in \$(test_seq $test_iteration_count); do
+			git branch --format='%(refname)' >/dev/null
+		done
+	)
+"
+
+test_perf "branch --remotes (many unrelated loose refs)" --prereq REFFILES "
+	(
+		cd scoped &&
+		for i in \$(test_seq $test_iteration_count); do
+			git branch --remotes --format='%(refname)' >/dev/null
+		done
+	)
+"
+
+test_perf "tag (many unrelated loose refs)" --prereq REFFILES "
+	(
+		cd scoped &&
+		for i in \$(test_seq $test_iteration_count); do
+			git tag --format='%(refname)' >/dev/null
+		done
+	)
+"
+
 test_done

---
base-commit: a89346e34a937f001e5d397ee62224e3e9852040
change-id: 20260605-fix-git-branch-regression-9e4236f18091

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] ref-filter: restore prefix-scoped iteration
  2026-06-05 16:43 [PATCH] ref-filter: restore prefix-scoped iteration Tamir Duberstein
@ 2026-06-08 21:36 ` Junio C Hamano
  2026-06-08 22:39   ` Tamir Duberstein
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2026-06-08 21:36 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu

Tamir Duberstein <tamird@gmail.com> writes:

> diff --git a/ref-filter.c b/ref-filter.c
> index 1da4c0e60d..2388a57b39 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
>  		prefix = "refs/tags/";
>  
>  	if (prefix) {

Below, adding an extra call to get_main_ref_store(the_repository)
makes one line unnecessarily split and harder to read.  How about
doing

		struct ref_store *store = get_main_ref_store(the_repository);

upfront here, and then use that to replace these two calls of
get_main_ref_store(the_repository)?

> +		if (filter->start_after) {
> +			struct ref_iterator *iter;
>  
> +			iter = refs_ref_iterator_begin(
> +				get_main_ref_store(the_repository), "", NULL, 0,
> +				0);
>  
>  			ret = start_ref_iterator_after(iter, filter->start_after);
> +			if (!ret)
> +				ret = do_for_each_ref_iterator(iter, fn,
> +							       cb_data);
> +		} else {
> +			/*
> +			 * Pass the prefix during construction because the files
> +			 * backend primes loose refs before a later seek can
> +			 * narrow the iterator.
> +			 */
> +			struct refs_for_each_ref_options opts = {
> +				.prefix = prefix,
> +			};
>  
> +			ret = refs_for_each_ref_ext(
> +				get_main_ref_store(the_repository), fn, cb_data,
> +				&opts);
> +		}
>  	} else if (filter->kind & FILTER_REFS_REGULAR) {

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] ref-filter: restore prefix-scoped iteration
  2026-06-08 21:36 ` Junio C Hamano
@ 2026-06-08 22:39   ` Tamir Duberstein
  0 siblings, 0 replies; 3+ messages in thread
From: Tamir Duberstein @ 2026-06-08 22:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Karthik Nayak, Patrick Steinhardt, Victoria Dye, ZheNing Hu

On Mon, Jun 8, 2026 at 2:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tamir Duberstein <tamird@gmail.com> writes:
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 1da4c0e60d..2388a57b39 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -3315,19 +3315,31 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, refs_for
> >               prefix = "refs/tags/";
> >
> >       if (prefix) {
>
> Below, adding an extra call to get_main_ref_store(the_repository)
> makes one line unnecessarily split and harder to read.  How about
> doing
>
>                 struct ref_store *store = get_main_ref_store(the_repository);
>
> upfront here, and then use that to replace these two calls of
> get_main_ref_store(the_repository)?

Yep, done in v2.

Thanks for the review!

By the way, how long should I wait before sending new versions of my
patches? I have 4 outstanding at the moment.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-08 22:40 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-05 16:43 [PATCH] ref-filter: restore prefix-scoped iteration Tamir Duberstein
2026-06-08 21:36 ` Junio C Hamano
2026-06-08 22:39   ` Tamir Duberstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox