Git development
 help / color / mirror / Atom feed
* [PATCH] ref-filter: reuse --contains traversal results
@ 2026-06-08  3:33 Tamir Duberstein
  2026-06-08 21:18 ` Karthik Nayak
  2026-06-08 22:34 ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Tamir Duberstein @ 2026-06-08  3:33 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Karthik Nayak, Junio C Hamano, Victoria Dye,
	Derrick Stolee, Elijah Newren, Tamir Duberstein

git branch and git for-each-ref call repo_is_descendant_of() for each
candidate selected by --contains or --no-contains. Each call starts a
new graph walk, so refs with shared history repeatedly traverse the same
commits.

ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
the tag traversal that caches positive and negative answers across
candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
2015-07-07) preserved the branch and tag implementations when ref-filter
learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
the '--contains' option, 2015-10-18) noted that they should be unified.

Use the memoized traversal for every ref-filter contains check and
remove the implementation selector. The cache records answers for one
fixed target list, so document that callers must clear it before
changing the list.

The memoized depth-first walk assumes acyclic ancestry, but replacement
refs can create cycles. Track commits while they are on the walk. If a
cycle is found, discard partial cache entries and use
repo_is_descendant_of() for that candidate.

The branch and for-each-ref path passed repo_is_descendant_of() through
a Boolean interface. In configurations where it returned -1 for missing
ancestry, ref-filter treated the error as "contains". The memoized path
instead fails when ancestry cannot be parsed, as git tag already did.
During review of the 2018 reachability series, making parse failures
fatal was explicitly deferred because that series was intended to
preserve behavior. Unifying the implementations now makes all callers
fail consistently instead of preserving that accidental Boolean
interpretation.

The added p1500 case uses up to 8,192 packed refs along one first-parent
history. It improves from 0.68 to 0.03 seconds.

On a checkout with 62,174 remote-tracking refs, I ran:

    hyperfine --warmup 0 --runs 3 \
        --command-name parent \
        '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
        --command-name this-commit \
        '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'

The results were:

             parent       this commit
  elapsed    104.365 s     467.7 ms
  user        93.702 s     220.2 ms
  system       0.723 s     182.7 ms

The wall-time standard deviations were 11.356 seconds and 133.8
milliseconds, respectively, for a 223x speedup. Both commands produced
output with SHA-256
2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.

Both revisions were rebuilt with the default -O2 flags using Apple clang
21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
16-core Apple M4 Max (12 performance and four efficiency cores) and 128
GB RAM.

Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
 builtin/tag.c                  |  1 -
 commit-reach.c                 | 45 +++++++++++++++++++++++++++++++-----------
 commit-reach.h                 | 15 ++++++++++----
 ref-filter.c                   |  6 ++++--
 ref-filter.h                   |  7 +++----
 t/helper/test-reach.c          | 10 ++--------
 t/perf/p1500-graph-walks.sh    | 24 +++++++++++++++++++++-
 t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
 t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
 t/t6600-test-reach.sh          |  6 ++----
 10 files changed, 117 insertions(+), 36 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index d51c2e3349..9f34d948d4 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
-	filter->with_commit_tag_algo = 1;
 	filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
 
 	free(to_free);
diff --git a/commit-reach.c b/commit-reach.c
index 9b3ea46d6f..6e599a3670 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -6,7 +6,6 @@
 #include "decorate.h"
 #include "hex.h"
 #include "prio-queue.h"
-#include "ref-filter.h"
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
@@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
 
 /*
  * Test whether the candidate is contained in the list.
- * Do not recurse to find out, though, but return -1 if inconclusive.
+ * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
+ * inconclusive.
  */
 static enum contains_result contains_test(struct commit *candidate,
 					  const struct commit_list *want,
@@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
 	contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
 }
 
-static enum contains_result contains_tag_algo(struct commit *candidate,
-					      const struct commit_list *want,
-					      struct contains_cache *cache)
+static enum contains_result contains_algo(struct commit *candidate,
+					  struct commit_list *want,
+					  struct contains_cache *cache)
 {
 	struct contains_stack contains_stack = { 0, 0, NULL };
 	enum contains_result result;
@@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	if (result != CONTAINS_UNKNOWN)
 		return result;
 
+	*contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
 	push_to_contains_stack(candidate, &contains_stack);
 	while (contains_stack.nr) {
 		struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
@@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 			contains_stack.nr--;
 		}
 		/*
-		 * If we just popped the stack, parents->item has been marked,
-		 * therefore contains_test will return a meaningful yes/no.
+		 * A parent may have just been popped and marked, or may still
+		 * be active when replacement refs create a cycle.
 		 */
 		else switch (contains_test(parents->item, want, cache, cutoff)) {
 		case CONTAINS_YES:
@@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 		case CONTAINS_NO:
 			entry->parents = parents->next;
 			break;
+		case CONTAINS_IN_PROGRESS:
+			/*
+			 * Partial negative answers are not safe across a cycle.
+			 * Discard them and use the cycle-safe reachability walk.
+			 */
+			goto cycle;
 		case CONTAINS_UNKNOWN:
+			*contains_cache_at(cache, parents->item) =
+				CONTAINS_IN_PROGRESS;
 			push_to_contains_stack(parents->item, &contains_stack);
 			break;
 		}
 	}
 	free(contains_stack.contains_stack);
 	return contains_test(candidate, want, cache, cutoff);
+
+cycle:
+	free(contains_stack.contains_stack);
+	clear_contains_cache(cache);
+	init_contains_cache(cache);
+
+	result = repo_is_descendant_of(the_repository, candidate, want);
+	if (result < 0)
+		exit(128);
+	*contains_cache_at(cache, candidate) =
+		result ? CONTAINS_YES : CONTAINS_NO;
+	return result ? CONTAINS_YES : CONTAINS_NO;
 }
 
-int commit_contains(struct ref_filter *filter, struct commit *commit,
-		    struct commit_list *list, struct contains_cache *cache)
+int commit_contains(struct commit *commit, struct commit_list *list,
+		    struct contains_cache *cache)
 {
-	if (filter->with_commit_tag_algo)
-		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
-	return repo_is_descendant_of(the_repository, commit, list);
+	if (!list)
+		return 1;
+	return contains_algo(commit, list, cache) == CONTAINS_YES;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
diff --git a/commit-reach.h b/commit-reach.h
index 3f3a563d8a..144dc56275 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -5,7 +5,6 @@
 #include "commit-slab.h"
 
 struct commit_list;
-struct ref_filter;
 struct object_id;
 struct object_array;
 
@@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
 enum contains_result {
 	CONTAINS_UNKNOWN = 0,
 	CONTAINS_NO,
-	CONTAINS_YES
+	CONTAINS_YES,
+	CONTAINS_IN_PROGRESS
 };
 
 define_commit_slab(contains_cache, enum contains_result);
 
-int commit_contains(struct ref_filter *filter, struct commit *commit,
-		    struct commit_list *list, struct contains_cache *cache);
+/*
+ * Return whether "commit" is a descendant of any commit in "list". An empty
+ * list matches.
+ *
+ * "cache" records answers for one fixed "list". Clear it before changing the
+ * list.
+ */
+int commit_contains(struct commit *commit, struct commit_list *list,
+		    struct contains_cache *cache);
 
 /*
  * Determine if every commit in 'from' can reach at least one commit
diff --git a/ref-filter.c b/ref-filter.c
index 1da4c0e60d..7788147959 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
 			return NULL;
 		/* We perform the filtering for the '--contains' option... */
 		if (filter->with_commit &&
-		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
+		    !commit_contains(commit, filter->with_commit,
+				     &filter->internal.contains_cache))
 			return NULL;
 		/* ...or for the `--no-contains' option */
 		if (filter->no_commit &&
-		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
+		    commit_contains(commit, filter->no_commit,
+				    &filter->internal.no_contains_cache))
 			return NULL;
 	}
 
diff --git a/ref-filter.h b/ref-filter.h
index 120221b47f..9e14afca9c 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -73,10 +73,9 @@ struct ref_filter {
 	struct commit_list *reachable_from;
 	struct commit_list *unreachable_from;
 
-	unsigned int with_commit_tag_algo : 1,
-		match_as_path : 1,
-		ignore_case : 1,
-		detached : 1;
+	unsigned int match_as_path : 1,
+		     ignore_case : 1,
+		     detached : 1;
 	unsigned int kind,
 		lines;
 	int abbrev,
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5d86a96c17..82235f713e 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -6,7 +6,6 @@
 #include "gettext.h"
 #include "hex.h"
 #include "object-name.h"
-#include "ref-filter.h"
 #include "setup.h"
 #include "string-list.h"
 #include "tag.h"
@@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-		if (ac > 2 && !strcmp(av[2], "--tag"))
-			filter.with_commit_tag_algo = 1;
-		else
-			filter.with_commit_tag_algo = 0;
-
-		printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
+		printf("%s(_,A,X,_):%d\n", av[1],
+		       commit_contains(A, X, &cache));
 		clear_contains_cache(&cache);
 	} else if (!strcmp(av[1], "get_reachable_subset")) {
 		const int reachable_flag = 1;
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index 5b23ce5db9..ac68fdbacd 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
 
 test_perf_large_repo
 
+contains_ref_limit=8192
+
 test_expect_success 'setup' '
 	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
 	sort -r allrefs | head -n 50 >refs &&
@@ -32,10 +34,25 @@ test_expect_success 'setup' '
 		echo "X:$line" >>test-tool-tags || return 1
 	done &&
 
+	git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
+	contains_ref_count=$(wc -l <contains-commits) &&
+	test "$contains_ref_count" -gt 0 &&
+	contains_base=$(tail -n 1 contains-commits) &&
+	export contains_base &&
+	awk "{
+		printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
+	}" contains-commits |
+		git update-ref --stdin &&
+	git pack-refs --include "refs/contains-perf/*" &&
+
 	commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
 	git update-ref refs/heads/disjoint-base $commit &&
 
-	git commit-graph write --reachable
+	git commit-graph write --reachable &&
+
+	git for-each-ref --contains="$contains_base" \
+		refs/contains-perf/ >actual &&
+	test_line_count = $contains_ref_count actual
 '
 
 test_perf 'ahead-behind counts: git for-each-ref' '
@@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
 	xargs git tag --merged=HEAD <tags
 '
 
+test_perf 'contains: git for-each-ref --contains' '
+	git for-each-ref --contains="$contains_base" \
+		refs/contains-perf/ >/dev/null
+'
+
 test_perf 'is-base check: test-tool reach (refs)' '
 	test-tool reach get_branch_base_for_tip <test-tool-refs
 '
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index e06feb06e9..169cc70c23 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
 	test_must_be_empty brief-err
 '
 
+test_expect_success 'missing ancestors are reported by contains filters' '
+	test_when_finished "git update-ref -d refs/heads/missing-parent" &&
+	{
+		echo "tree $(git rev-parse HEAD^{tree})" &&
+		echo "parent $MISSING" &&
+		git cat-file commit HEAD |
+			sed -n -e "/^author /p" -e "/^committer /p" &&
+		echo &&
+		echo "missing parent"
+	} >commit &&
+	broken=$(git hash-object -t commit -w commit) &&
+	git update-ref refs/heads/missing-parent "$broken" &&
+	test_must_fail git for-each-ref --contains=HEAD \
+		refs/heads/missing-parent >out 2>err &&
+	test_must_be_empty out &&
+	test_grep "unable to parse commit $MISSING" err
+'
+
 test_expect_success 'ahead-behind requires an argument' '
 	test_must_fail git for-each-ref \
 		--format="%(ahead-behind)" 2>err &&
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 7f060d97bf..423505d1fb 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
 	test_cmp expect actual
 '
 
+test_expect_success 'contains handles cyclic replacement histories' '
+	one=$(git rev-parse one) &&
+	three=$(git rev-parse three) &&
+	test_when_finished "
+		git replace -d $one
+		git replace -d $three
+		git tag -d cycle-a cycle-b
+	" &&
+	git tag cycle-a "$one" &&
+	git tag cycle-b "$three" &&
+	git replace --graft "$one" "$three" two &&
+	git replace --graft "$three" "$one" &&
+	cat >expect <<-\EOF &&
+	refs/tags/cycle-a
+	refs/tags/cycle-b
+	EOF
+	git for-each-ref --format="%(refname)" --contains=two \
+		"refs/tags/cycle-*" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '%(color) must fail' '
 	test_must_fail git for-each-ref --format="%(color)%(refname)"
 '
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..1ecc2571c2 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
 	X:commit-9-3
 	EOF
 	echo "commit_contains(_,A,X,_):1" >expect &&
-	test_all_modes commit_contains &&
-	test_all_modes commit_contains --tag
+	test_all_modes commit_contains
 '
 
 test_expect_success 'commit_contains:miss' '
@@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
 	X:commit-9-3
 	EOF
 	echo "commit_contains(_,A,X,_):0" >expect &&
-	test_all_modes commit_contains &&
-	test_all_modes commit_contains --tag
+	test_all_modes commit_contains
 '
 
 test_expect_success 'rev-list: basic topo-order' '

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-08 23:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08  3:33 [PATCH] ref-filter: reuse --contains traversal results Tamir Duberstein
2026-06-08 21:18 ` Karthik Nayak
2026-06-08 22:30   ` Tamir Duberstein
2026-06-08 22:34 ` Jeff King
2026-06-08 23:35   ` Tamir Duberstein
2026-06-08 23:52     ` Jeff King
2026-06-08 23:56       ` Tamir Duberstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox