* [PATCH] commit-reach: remove get_reachable_subset()
@ 2026-06-09 19:28 Kristofer Karlsson via GitGitGadget
2026-06-10 15:48 ` Junio C Hamano
2026-06-11 11:49 ` [PATCH v2] " Kristofer Karlsson via GitGitGadget
0 siblings, 2 replies; 8+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-09 19:28 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Kristofer Karlsson, Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
get_reachable_subset() and tips_reachable_from_bases() answer the
same question: given a set of bases and a set of tips, which tips
are reachable from at least one base?
get_reachable_subset() was introduced in fcb2c0769d (2018-11-02)
for add_missing_tags() in remote.c. tips_reachable_from_bases()
was added in cbfe360b14 (2023-03-20) as part of the ahead-behind
series. The two were never consolidated.
With a commit-graph, tips_reachable_from_bases() can have an edge:
its DFS raises the generation floor as lower targets are found,
pruning more aggressively than the static floor in
get_reachable_subset(). Without generation numbers, some edge cases
may be slower with DFS instead of BFS since the date-ordered
prio_queue naturally stays near the top of the graph, but this
should not matter in practice -- worst case both visit the full
graph down from the bases.
The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4)
because tips_reachable_from_bases() uses SEEN (bit 0) internally.
TMP_MARK is already used for deduplication earlier in the same
function and is cleared before the reachability check.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach: remove get_reachable_subset()
This removes get_reachable_subset() and converts its only caller to use
tips_reachable_from_bases() directly. Both answer the same category-2
reachability question ("which tips are reachable from these bases?") but
were introduced years apart and never consolidated.
On the no-commit-graph tradeoff: without generation numbers, the
date-ordered BFS in get_reachable_subset() can be more disciplined than
DFS since it naturally stays near the top of the graph. But this only
matters for repositories that are both large enough for the difference
to be measurable and missing a commit-graph -- a combination that would
already struggle for many other reasons. The fix there is to enable the
commit-graph, not to maintain two implementations of the same
reachability query.
Notes for reviewers:
* The flag in remote.c changes from 1 to TMP_MARK because
tips_reachable_from_bases() uses SEEN (bit 0) internally. TMP_MARK is
already used earlier in the same function and is cleared before the
reachability block.
* The sent_tips array is converted to a commit_list to match the
tips_reachable_from_bases() API. This is O(n) list-node allocations,
negligible compared to the graph walk.
* Test helper and test names rename from get_reachable_subset to
tips_reachable_from_bases to match the function being exercised.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2144%2Fspkrka%2Fkrka%2Fremove-get-reachable-subset-v2-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2144/spkrka/krka/remove-get-reachable-subset-v2-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2144
commit-reach.c | 73 -------------------------------------------
commit-reach.h | 13 --------
remote.c | 19 ++++++-----
t/helper/test-reach.c | 39 +++++++++++------------
t/t6600-test-reach.sh | 18 +++++------
5 files changed, 36 insertions(+), 126 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..e78752eb87 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1013,79 +1013,6 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
return result;
}
-struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
- struct commit **to, size_t nr_to,
- unsigned int reachable_flag)
-{
- struct commit **item;
- struct commit *current;
- struct commit_list *found_commits = NULL;
- struct commit **to_last = to + nr_to;
- struct commit **from_last = from + nr_from;
- timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
- int num_to_find = 0;
-
- struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
-
- for (item = to; item < to_last; item++) {
- timestamp_t generation;
- struct commit *c = *item;
-
- repo_parse_commit(the_repository, c);
- generation = commit_graph_generation(c);
- if (generation < min_generation)
- min_generation = generation;
-
- if (!(c->object.flags & PARENT1)) {
- c->object.flags |= PARENT1;
- num_to_find++;
- }
- }
-
- for (item = from; item < from_last; item++) {
- struct commit *c = *item;
- if (!(c->object.flags & PARENT2)) {
- c->object.flags |= PARENT2;
- repo_parse_commit(the_repository, c);
-
- prio_queue_put(&queue, *item);
- }
- }
-
- while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
- struct commit_list *parents;
-
- if (current->object.flags & PARENT1) {
- current->object.flags &= ~PARENT1;
- current->object.flags |= reachable_flag;
- commit_list_insert(current, &found_commits);
- num_to_find--;
- }
-
- for (parents = current->parents; parents; parents = parents->next) {
- struct commit *p = parents->item;
-
- repo_parse_commit(the_repository, p);
-
- if (commit_graph_generation(p) < min_generation)
- continue;
-
- if (p->object.flags & PARENT2)
- continue;
-
- p->object.flags |= PARENT2;
- prio_queue_put(&queue, p);
- }
- }
-
- clear_prio_queue(&queue);
-
- clear_commit_marks_many(nr_to, to, PARENT1);
- clear_commit_marks_many(nr_from, from, PARENT2);
-
- return found_commits;
-}
-
define_commit_slab(bit_arrays, struct bitmap *);
static struct bit_arrays bit_arrays;
diff --git a/commit-reach.h b/commit-reach.h
index 3f3a563d8a..b3e7051738 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -96,19 +96,6 @@ int can_all_from_reach_with_flag(struct object_array *from,
int can_all_from_reach(struct commit_list *from, struct commit_list *to,
int commit_date_cutoff);
-
-/*
- * Return a list of commits containing the commits in the 'to' array
- * that are reachable from at least one commit in the 'from' array.
- * Also add the given 'flag' to each of the commits in the returned list.
- *
- * This method uses the PARENT1 and PARENT2 flags during its operation,
- * so be sure these flags are not set before calling the method.
- */
-struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
- struct commit **to, size_t nr_to,
- unsigned int reachable_flag);
-
struct ahead_behind_count {
/**
* As input, the *_index members indicate which positions in
diff --git a/remote.c b/remote.c
index f1a3681b7c..7cdb59ed87 100644
--- a/remote.c
+++ b/remote.c
@@ -1459,9 +1459,8 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
* sent to the other side.
*/
if (sent_tips.nr) {
- const int reachable_flag = 1;
- struct commit_list *found_commits;
struct commit_stack src_commits = COMMIT_STACK_INIT;
+ struct commit_list *bases = NULL;
for_each_string_list_item(item, &src_tag) {
struct ref *ref = item->util;
@@ -1479,11 +1478,12 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
commit_stack_push(&src_commits, commit);
}
- found_commits = get_reachable_subset(sent_tips.items,
- sent_tips.nr,
- src_commits.items,
- src_commits.nr,
- reachable_flag);
+ for (size_t i = 0; i < sent_tips.nr; i++)
+ commit_list_insert(sent_tips.items[i], &bases);
+ tips_reachable_from_bases(the_repository,
+ bases, src_commits.items,
+ src_commits.nr, TMP_MARK);
+ commit_list_free(bases);
for_each_string_list_item(item, &src_tag) {
struct ref *dst_ref;
@@ -1503,7 +1503,7 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
* Is this tag, which they do not have, reachable from
* any of the commits we are sending?
*/
- if (!(commit->object.flags & reachable_flag))
+ if (!(commit->object.flags & TMP_MARK))
continue;
/* Add it in */
@@ -1513,9 +1513,8 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
}
clear_commit_marks_many(src_commits.nr, src_commits.items,
- reachable_flag);
+ TMP_MARK);
commit_stack_clear(&src_commits);
- commit_list_free(found_commits);
}
string_list_clear(&src_tag, 0);
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5d86a96c17..eb44a64f50 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -7,6 +7,7 @@
#include "hex.h"
#include "object-name.h"
#include "ref-filter.h"
+#include "revision.h"
#include "setup.h"
#include "string-list.h"
#include "tag.h"
@@ -149,30 +150,26 @@ int cmd__reach(int ac, const char **av)
printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
clear_contains_cache(&cache);
- } else if (!strcmp(av[1], "get_reachable_subset")) {
- const int reachable_flag = 1;
- int count = 0;
- struct commit_list *current;
- struct commit_list *list = get_reachable_subset(X_stack.items, X_stack.nr,
- Y_stack.items, Y_stack.nr,
- reachable_flag);
- printf("get_reachable_subset(X,Y)\n");
- for (current = list; current; current = current->next) {
- if (!(list->item->object.flags & reachable_flag))
- die(_("commit %s is not marked reachable"),
- oid_to_hex(&list->item->object.oid));
- count++;
- }
+ } else if (!strcmp(av[1], "tips_reachable_from_bases")) {
+ struct commit_list *bases = NULL;
+ struct commit_list *result = NULL;
+
+ for (size_t i = 0; i < X_stack.nr; i++)
+ commit_list_insert(X_stack.items[i], &bases);
+ tips_reachable_from_bases(the_repository,
+ bases, Y_stack.items,
+ Y_stack.nr, TMP_MARK);
+ commit_list_free(bases);
+
+ printf("tips_reachable_from_bases(X,Y)\n");
for (size_t i = 0; i < Y_stack.nr; i++) {
- if (Y_stack.items[i]->object.flags & reachable_flag)
- count--;
+ if (Y_stack.items[i]->object.flags & TMP_MARK)
+ commit_list_insert(Y_stack.items[i], &result);
}
+ print_sorted_commit_ids(result);
- if (count < 0)
- die(_("too many commits marked reachable"));
-
- print_sorted_commit_ids(list);
- commit_list_free(list);
+ clear_commit_marks_many(Y_stack.nr, Y_stack.items, TMP_MARK);
+ commit_list_free(result);
}
object_array_clear(&X_obj);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..51b140a539 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -391,7 +391,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' '
run_all_modes git rev-list --topo-order commit-3-8...commit-6-6
'
-test_expect_success 'get_reachable_subset:all' '
+test_expect_success 'tips_reachable_from_bases:all' '
cat >input <<-\EOF &&
X:commit-9-1
X:commit-8-3
@@ -403,15 +403,15 @@ test_expect_success 'get_reachable_subset:all' '
Y:commit-5-6
EOF
(
- echo "get_reachable_subset(X,Y)" &&
+ echo "tips_reachable_from_bases(X,Y)" &&
git rev-parse commit-3-3 \
commit-1-7 \
commit-5-6 | sort
) >expect &&
- test_all_modes get_reachable_subset
+ test_all_modes tips_reachable_from_bases
'
-test_expect_success 'get_reachable_subset:some' '
+test_expect_success 'tips_reachable_from_bases:some' '
cat >input <<-\EOF &&
X:commit-9-1
X:commit-8-3
@@ -422,14 +422,14 @@ test_expect_success 'get_reachable_subset:some' '
Y:commit-5-6
EOF
(
- echo "get_reachable_subset(X,Y)" &&
+ echo "tips_reachable_from_bases(X,Y)" &&
git rev-parse commit-3-3 \
commit-1-7 | sort
) >expect &&
- test_all_modes get_reachable_subset
+ test_all_modes tips_reachable_from_bases
'
-test_expect_success 'get_reachable_subset:none' '
+test_expect_success 'tips_reachable_from_bases:none' '
cat >input <<-\EOF &&
X:commit-9-1
X:commit-8-3
@@ -439,8 +439,8 @@ test_expect_success 'get_reachable_subset:none' '
Y:commit-7-6
Y:commit-2-8
EOF
- echo "get_reachable_subset(X,Y)" >expect &&
- test_all_modes get_reachable_subset
+ echo "tips_reachable_from_bases(X,Y)" >expect &&
+ test_all_modes tips_reachable_from_bases
'
test_expect_success 'for-each-ref ahead-behind:linear' '
base-commit: 600fe743028cbfb640855f659e9851522214bc0b
--
gitgitgadget
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-reach: remove get_reachable_subset()
2026-06-09 19:28 [PATCH] commit-reach: remove get_reachable_subset() Kristofer Karlsson via GitGitGadget
@ 2026-06-10 15:48 ` Junio C Hamano
2026-06-10 18:25 ` Kristofer Karlsson
2026-06-10 19:29 ` Derrick Stolee
2026-06-11 11:49 ` [PATCH v2] " Kristofer Karlsson via GitGitGadget
1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2026-06-10 15:48 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget
Cc: git, Derrick Stolee, Kristofer Karlsson
"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> get_reachable_subset() was introduced in fcb2c0769d (2018-11-02)
> for add_missing_tags() in remote.c. tips_reachable_from_bases()
> was added in cbfe360b14 (2023-03-20) as part of the ahead-behind
> series. The two were never consolidated.
Good finding. It is curious to see that these were from the same
author.
> ... Without generation numbers, some edge cases
> may be slower with DFS instead of BFS since the date-ordered
> prio_queue naturally stays near the top of the graph, but this
> should not matter in practice
"should not matter in practice" because...?
> -- worst case both visit the full
> graph down from the bases.
And of course the worst case scenario is by definition not a typical
case that appear in practice, so it does not make a good explanation
for "should not matter in practice".
> The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4)
> because tips_reachable_from_bases() uses SEEN (bit 0) internally.
> TMP_MARK is already used for deduplication earlier in the same
> function and is cleared before the reachability check.
And tips_reachable_from_bases() clears SEEN at the end as expected.
> commit-reach.c | 73 -------------------------------------------
> commit-reach.h | 13 --------
> remote.c | 19 ++++++-----
> t/helper/test-reach.c | 39 +++++++++++------------
> t/t6600-test-reach.sh | 18 +++++------
> 5 files changed, 36 insertions(+), 126 deletions(-)
Yay, a lot of deletions ;-)
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index b5b314e570..51b140a539 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -391,7 +391,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' '
> run_all_modes git rev-list --topo-order commit-3-8...commit-6-6
> '
>
> -test_expect_success 'get_reachable_subset:all' '
> +test_expect_success 'tips_reachable_from_bases:all' '
> cat >input <<-\EOF &&
> X:commit-9-1
> X:commit-8-3
> @@ -403,15 +403,15 @@ test_expect_success 'get_reachable_subset:all' '
> Y:commit-5-6
> EOF
> (
> - echo "get_reachable_subset(X,Y)" &&
> + echo "tips_reachable_from_bases(X,Y)" &&
> git rev-parse commit-3-3 \
> commit-1-7 \
> commit-5-6 | sort
> ) >expect &&
> - test_all_modes get_reachable_subset
> + test_all_modes tips_reachable_from_bases
> '
>
> -test_expect_success 'get_reachable_subset:some' '
> +test_expect_success 'tips_reachable_from_bases:some' '
> cat >input <<-\EOF &&
> X:commit-9-1
> X:commit-8-3
> @@ -422,14 +422,14 @@ test_expect_success 'get_reachable_subset:some' '
> Y:commit-5-6
> EOF
> (
> - echo "get_reachable_subset(X,Y)" &&
> + echo "tips_reachable_from_bases(X,Y)" &&
> git rev-parse commit-3-3 \
> commit-1-7 | sort
> ) >expect &&
> - test_all_modes get_reachable_subset
> + test_all_modes tips_reachable_from_bases
> '
>
> -test_expect_success 'get_reachable_subset:none' '
> +test_expect_success 'tips_reachable_from_bases:none' '
> cat >input <<-\EOF &&
> X:commit-9-1
> X:commit-8-3
> @@ -439,8 +439,8 @@ test_expect_success 'get_reachable_subset:none' '
> Y:commit-7-6
> Y:commit-2-8
> EOF
> - echo "get_reachable_subset(X,Y)" >expect &&
> - test_all_modes get_reachable_subset
> + echo "tips_reachable_from_bases(X,Y)" >expect &&
> + test_all_modes tips_reachable_from_bases
> '
>
> test_expect_success 'for-each-ref ahead-behind:linear' '
>
> base-commit: 600fe743028cbfb640855f659e9851522214bc0b
Initially I feared that changes to the test script were a sign of
need to adjuist to behaviour changes, but as the proposed log
message explained, all of the above changes are about the name of
the function being used and tested, which makes sense.
Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-reach: remove get_reachable_subset()
2026-06-10 15:48 ` Junio C Hamano
@ 2026-06-10 18:25 ` Kristofer Karlsson
2026-06-10 19:29 ` Derrick Stolee
1 sibling, 0 replies; 8+ messages in thread
From: Kristofer Karlsson @ 2026-06-10 18:25 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> "should not matter in practice" because...?
>
> And of course the worst case scenario is by definition not a typical
> case that appear in practice, so it does not make a good explanation
> for "should not matter in practice".
You are right, that was hand-wavy. I started with writing a somewhat
long analysis but after finding a clean way of supporting both DFS
and priority queue modes it feels somewhat unnecessary - I will
still include it here, but the short summary is that it's fixable.
Since the prio_queue struct supports both LIFO and heap mode,
it's actually quite easy to plug this into
tips_reachable_from_bases. I just need to switch away from using
the stack structure and pass a mode to choose between LIFO
and heap. This preserves the old behavior while still reducing
the code size and unifying the code more.
I will submit a v2 of the patch shortly.
I will also include the original analysis I wrote before finding
the simple fix.
Thanks for the review!
Kristofer
---
I will refer to the prio_queue approach in get_reachable_subset
as PQ for brevity, and the DFS in tips_reachable_from_bases as
simply DFS.
tips_reachable_from_bases() was designed for --merged queries where
the targets (branches/tags) can be deep ancestors of the base. DFS
is a natural fit there: it dives deep along first-parent quickly,
and with generation numbers the dynamic floor raising prunes
aggressively.
add_missing_tags() has the opposite shape: the bases are branch
tips being pushed (near the top) and the targets are tag commits
the remote does not have yet, which tend to be relatively close
to those tips. PQ ordered by commit date is a better fit here
because it sweeps down from the top and finds nearby targets early,
while DFS might take a long detour down a side branch before
coming back.
With a commit-graph this difference mostly disappears since the
generation floor keeps DFS from going too far off track. Without a
commit-graph, neither approach prunes anything (generation is
GENERATION_NUMBER_INFINITY for all commits) so the traversal order
is the only thing that matters, and PQ has the edge for shallow
targets.
So the current code actually has each caller matched to the
traversal strategy that fits its typical workload. My patch traded
that away for code reduction.
That said, in practice the difference is limited: repositories
large enough for this to matter typically have a commit-graph,
and small repositories are fast either way.
---
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] commit-reach: remove get_reachable_subset()
2026-06-10 15:48 ` Junio C Hamano
2026-06-10 18:25 ` Kristofer Karlsson
@ 2026-06-10 19:29 ` Derrick Stolee
1 sibling, 0 replies; 8+ messages in thread
From: Derrick Stolee @ 2026-06-10 19:29 UTC (permalink / raw)
To: Junio C Hamano, Kristofer Karlsson via GitGitGadget
Cc: git, Kristofer Karlsson
On 6/10/2026 11:48 AM, Junio C Hamano wrote:
> "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> get_reachable_subset() was introduced in fcb2c0769d (2018-11-02)
>> for add_missing_tags() in remote.c. tips_reachable_from_bases()
>> was added in cbfe360b14 (2023-03-20) as part of the ahead-behind
>> series. The two were never consolidated.
>
> Good finding. It is curious to see that these were from the same
> author.
I agree. In my defense, these commits are five years apart. I still
should have looked for similar code that could be reused instead of
rolling new code. (But the new code is better when a commit-graph
exists.)
The other thing that I should have done in the later commit was add
the method to the test-tool, which you do here.
>> ... Without generation numbers, some edge cases
>> may be slower with DFS instead of BFS since the date-ordered
>> prio_queue naturally stays near the top of the graph, but this
>> should not matter in practice
>
> "should not matter in practice" because...?
>
>> -- worst case both visit the full
>> graph down from the bases.
>
> And of course the worst case scenario is by definition not a typical
> case that appear in practice, so it does not make a good explanation
> for "should not matter in practice".
It's important to recognize the use cases that call each method and
to understand if it is appropriate to take these performance changes.
Both methods terminate in the case that all potential targets are
found. And that's the only case that matters, as we will walk all
reachable commits in the case of any one commit not being reachable.
Both methods avoid walking below the "minimum generation" among the
target commits.
The key opportunity here is that tips_reachable_from_bases() will
"increase" the minimum generation when it finds the current-minimum
target commit. That's a big reason why the DFS approach wins: it
has the opportunity to find those lower commits without needing to
walk _every_ commit with higher generation.
The one downside to this approach is that the DFS approach does not
take into account the commit date as a fallback when there is no
commit-graph file with computed generation numbers. When there is
no commit-graph file, then the fallback to commit date to break ties
among "generation number infinity" commits can't be used to help the
BFS search in get_reachable_subset().
And perhaps that is the critical reason for the different algorithms:
in 2018 we didn't have the commit-graph for very long so it wasn't a
reasonable expectation that we'd have one, even for large repositories.
Now? The feature is quite stable and it's easy for users to create
and maintain one. All servers are expected to use it for performance
needs. It's probably reasonable to expect that any repos where this
would matter would have one.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2] commit-reach: remove get_reachable_subset()
2026-06-09 19:28 [PATCH] commit-reach: remove get_reachable_subset() Kristofer Karlsson via GitGitGadget
2026-06-10 15:48 ` Junio C Hamano
@ 2026-06-11 11:49 ` Kristofer Karlsson via GitGitGadget
2026-06-11 12:57 ` Derrick Stolee
1 sibling, 1 reply; 8+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-11 11:49 UTC (permalink / raw)
To: git
Cc: Derrick Stolee, Kristofer Karlsson, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
get_reachable_subset() and tips_reachable_from_bases() both answer
the same reachability question but use different traversal
strategies: priority queue vs depth-first search. Consolidate them
into tips_reachable_from_bases() with a mode parameter to select
between DFS and PQ traversal, preserving the preferred strategy for
each caller.
This works cleanly because prio_queue already supports LIFO mode
(when compare is NULL), so a single prio_queue acts as either a
stack or a heap depending on the mode.
The unified traversal pushes all unseen parents at once rather than
peeking and pushing one parent at a time. This eliminates merge
commit revisits entirely: a 2-parent merge now requires 1 visit
instead of 3. For DFS (LIFO) mode, the first parent is pushed
last so it ends up on top of the stack, preserving first-parent
traversal order.
Parsing is deferred to pop time for DFS since parent objects carry
valid flags without a full repo_parse_commit() call. PQ mode
parses before push so the heap can order by generation number.
Add exhaustive reachability tests that use every commit in the
grid as a tip, protecting against subtle traversal bugs such as
wrong parent ordering or premature pruning. The existing tests
are also extended to exercise both DFS and PQ modes.
The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4)
because tips_reachable_from_bases() uses SEEN (bit 0) internally.
TMP_MARK is already used for deduplication earlier in the same
function and is cleared before the reachability check.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach: remove get_reachable_subset()
This removes get_reachable_subset() and consolidates its only caller
into tips_reachable_from_bases() with a mode parameter to select between
DFS and priority queue traversal. Both functions answer the same
reachability query and use the same generation-number pruning strategy,
differing primarily in traversal order (DFS vs generation-ordered PQ).
They were introduced years apart and never consolidated.
The unified function uses prio_queue for both modes: LIFO (stack) when
compare is NULL, min-heap when given a comparator.
Changes since v1:
* Replaced the commit_stack + reverse-push pattern with a simpler
approach: push all unseen parents directly, with the first parent
pushed last so it lands on top of the LIFO stack. This preserves
first-parent DFS order without needing a temporary stack to reverse
parent order.
* Deferred repo_parse_commit() to pop time for DFS mode. Parent objects
carry valid flags without a full parse, so we avoid parsing
already-SEEN parents in the inner loop. PQ mode still parses before
push so the heap can order by generation number.
* Moved the generation floor check from the parent loop to pop time,
simplifying the hot path at the cost of occasionally enqueueing
commits that will later be discarded.
* Added SEEN flag on base commits before enqueueing, preventing
duplicate processing if bases overlap with the traversal.
* Added PQ mode to the existing test-reach tests so both DFS and PQ
paths are exercised by the test suite.
* Added two new reachability tests that use all 100 commits in the grid
as tips - the idea is to better detect subtle traversal bugs.
Notes for reviewers:
* The flag in remote.c changes from 1 to TMP_MARK because
tips_reachable_from_bases() uses SEEN (bit 0) internally. TMP_MARK is
already used earlier in the same function and is cleared before the
reachability block.
* Test helper and test names rename from get_reachable_subset to
tips_reachable_from_bases to match the function being exercised.
As Stolee noted in the v1 review, commit-graph is now stable and
expected for repositories where this matters, making the DFS approach
with dynamic floor raising an attractive default. The mode parameter
could be dropped in a future patch if we decide to use DFS everywhere,
but preserving each caller's existing strategy keeps this change
conservative and avoids any behavior change for add_missing_tags().
Performance was not a goal of this refactoring, but the simplified DFS
traversal turned out to be a pleasant surprise -- eliminating merge
commit revisits and deferring parse to pop time gives a consistent
speedup across repositories:
Benchmark results (median, 30 runs, pre-built binaries):
Repository Command Speedup
-----------------------------------------------------------
linux branch --merged=HEAD 1.09x faster
linux for-each-ref --merged=HEAD 1.14x faster
git branch --merged=HEAD neutral
git for-each-ref --merged=HEAD 1.12x faster
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2144%2Fspkrka%2Fkrka%2Fremove-get-reachable-subset-v2-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2144/spkrka/krka/remove-get-reachable-subset-v2-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2144
Range-diff vs v1:
1: df8e2de11b ! 1: da01032a32 commit-reach: remove get_reachable_subset()
@@ Metadata
## Commit message ##
commit-reach: remove get_reachable_subset()
- get_reachable_subset() and tips_reachable_from_bases() answer the
- same question: given a set of bases and a set of tips, which tips
- are reachable from at least one base?
+ get_reachable_subset() and tips_reachable_from_bases() both answer
+ the same reachability question but use different traversal
+ strategies: priority queue vs depth-first search. Consolidate them
+ into tips_reachable_from_bases() with a mode parameter to select
+ between DFS and PQ traversal, preserving the preferred strategy for
+ each caller.
- get_reachable_subset() was introduced in fcb2c0769d (2018-11-02)
- for add_missing_tags() in remote.c. tips_reachable_from_bases()
- was added in cbfe360b14 (2023-03-20) as part of the ahead-behind
- series. The two were never consolidated.
+ This works cleanly because prio_queue already supports LIFO mode
+ (when compare is NULL), so a single prio_queue acts as either a
+ stack or a heap depending on the mode.
- With a commit-graph, tips_reachable_from_bases() can have an edge:
- its DFS raises the generation floor as lower targets are found,
- pruning more aggressively than the static floor in
- get_reachable_subset(). Without generation numbers, some edge cases
- may be slower with DFS instead of BFS since the date-ordered
- prio_queue naturally stays near the top of the graph, but this
- should not matter in practice -- worst case both visit the full
- graph down from the bases.
+ The unified traversal pushes all unseen parents at once rather than
+ peeking and pushing one parent at a time. This eliminates merge
+ commit revisits entirely: a 2-parent merge now requires 1 visit
+ instead of 3. For DFS (LIFO) mode, the first parent is pushed
+ last so it ends up on top of the stack, preserving first-parent
+ traversal order.
+
+ Parsing is deferred to pop time for DFS since parent objects carry
+ valid flags without a full repo_parse_commit() call. PQ mode
+ parses before push so the heap can order by generation number.
+
+ Add exhaustive reachability tests that use every commit in the
+ grid as a tip, protecting against subtle traversal bugs such as
+ wrong parent ordering or premature pruning. The existing tests
+ are also extended to exercise both DFS and PQ modes.
The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4)
because tips_reachable_from_bases() uses SEEN (bit 0) internally.
@@ commit-reach.c: int can_all_from_reach(struct commit_list *from, struct commit_l
define_commit_slab(bit_arrays, struct bitmap *);
static struct bit_arrays bit_arrays;
+@@ commit-reach.c: static int compare_commit_and_index_by_generation(const void *va, const void *vb
+ void tips_reachable_from_bases(struct repository *r,
+ struct commit_list *bases,
+ struct commit **tips, size_t tips_nr,
+- int mark)
++ int mark, enum tips_reachable_mode mode)
+ {
+ struct commit_and_index *commits;
++ struct commit_list *p;
++ struct commit *c;
+ size_t min_generation_index = 0;
+ timestamp_t min_generation;
+- struct commit_list *stack = NULL;
++ struct prio_queue queue = { NULL };
+
+ if (!bases || !tips || !tips_nr)
+ return;
+
+ /*
+- * Do a depth-first search starting at 'bases' to search for the
+- * tips. Stop at the lowest (un-found) generation number. When
+- * finding the lowest commit, increase the minimum generation
+- * number to the next lowest (un-found) generation number.
++ * Search starting at 'bases' looking for the tips. Stop at the
++ * lowest un-found generation number, raising the floor as tips
++ * are found. Use DFS by default; with TIPS_REACHABLE_PQ,
++ * use a priority queue ordered by generation then commit date.
+ */
++ if (mode == TIPS_REACHABLE_PQ)
++ queue.compare = compare_commits_by_gen_then_commit_date;
+
+ CALLOC_ARRAY(commits, tips_nr);
+
+@@ commit-reach.c: void tips_reachable_from_bases(struct repository *r,
+
+ while (bases) {
+ repo_parse_commit(r, bases->item);
+- commit_list_insert(bases->item, &stack);
++ bases->item->object.flags |= SEEN;
++ prio_queue_put(&queue, bases->item);
+ bases = bases->next;
+ }
+
+- while (stack) {
+- int explored_all_parents = 1;
+- struct commit_list *p;
+- struct commit *c = stack->item;
++ while ((c = prio_queue_get(&queue))) {
++ struct commit *first_parent = NULL;
++
++ repo_parse_commit(r, c);
++
++ /* Skip if below the current generation floor. */
++ if (commit_graph_generation(c) < min_generation)
++ continue;
+
+ /* Does it match any of our tips? */
+ {
+@@ commit-reach.c: void tips_reachable_from_bases(struct repository *r,
+ }
+
+ for (p = c->parents; p; p = p->next) {
+- repo_parse_commit(r, p->item);
+-
+ /* Have we already explored this parent? */
+ if (p->item->object.flags & SEEN)
+ continue;
+
+- /* Is it below the current minimum generation? */
+- if (commit_graph_generation(p->item) < min_generation)
+- continue;
+-
+ /* Ok, we will explore from here on. */
+ p->item->object.flags |= SEEN;
+- explored_all_parents = 0;
+- commit_list_insert(p->item, &stack);
+- break;
++ /* Parse before pushing in PQ mode for ordering. */
++ if (mode == TIPS_REACHABLE_PQ)
++ repo_parse_commit(r, p->item);
++ if (!first_parent)
++ first_parent = p->item;
++ else
++ prio_queue_put(&queue, p->item);
+ }
+-
+- if (explored_all_parents)
+- pop_commit(&stack);
++ /*
++ * Add the first parent last so that it is on top of
++ * the LIFO queue, maintaining first-parent DFS order.
++ */
++ if (first_parent)
++ prio_queue_put(&queue, first_parent);
+ }
+
+ done:
+@@ commit-reach.c: done:
+ commits[i].commit->object.flags &= ~RESULT;
+ free(commits);
+ repo_clear_commit_marks(r, SEEN);
+- commit_list_free(stack);
++ clear_prio_queue(&queue);
+ }
+
+ /*
## commit-reach.h ##
@@ commit-reach.h: int can_all_from_reach_with_flag(struct object_array *from,
@@ commit-reach.h: int can_all_from_reach_with_flag(struct object_array *from,
struct ahead_behind_count {
/**
* As input, the *_index members indicate which positions in
+@@ commit-reach.h: void ahead_behind(struct repository *r,
+ * For all tip commits, add 'mark' to their flags if and only if they
+ * are reachable from one of the commits in 'bases'.
+ */
++enum tips_reachable_mode {
++ TIPS_REACHABLE_DFS,
++ TIPS_REACHABLE_PQ,
++};
+ void tips_reachable_from_bases(struct repository *r,
+ struct commit_list *bases,
+ struct commit **tips, size_t tips_nr,
+- int mark);
++ int mark, enum tips_reachable_mode mode);
+
+ /*
+ * Given a 'tip' commit and a list potential 'bases', return the index 'i' that
+
+ ## ref-filter.c ##
+@@ ref-filter.c: static void reach_filter(struct ref_array *array,
+ tips_reachable_from_bases(the_repository,
+ *check_reachable,
+ to_clear, array->nr,
+- UNINTERESTING);
++ UNINTERESTING, TIPS_REACHABLE_DFS);
+
+ old_nr = array->nr;
+ array->nr = 0;
## remote.c ##
@@ remote.c: static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
@@ remote.c: static void add_missing_tags(struct ref *src, struct ref **dst, struct
+ commit_list_insert(sent_tips.items[i], &bases);
+ tips_reachable_from_bases(the_repository,
+ bases, src_commits.items,
-+ src_commits.nr, TMP_MARK);
++ src_commits.nr, TMP_MARK,
++ TIPS_REACHABLE_PQ);
+ commit_list_free(bases);
for_each_string_list_item(item, &src_tag) {
@@ t/helper/test-reach.c: int cmd__reach(int ac, const char **av)
- oid_to_hex(&list->item->object.oid));
- count++;
- }
-+ } else if (!strcmp(av[1], "tips_reachable_from_bases")) {
++ } else if (!strcmp(av[1], "tips_reachable_from_bases") ||
++ !strcmp(av[1], "tips_reachable_from_bases_pq")) {
++ enum tips_reachable_mode mode =
++ !strcmp(av[1], "tips_reachable_from_bases_pq")
++ ? TIPS_REACHABLE_PQ : TIPS_REACHABLE_DFS;
+ struct commit_list *bases = NULL;
+ struct commit_list *result = NULL;
+
@@ t/helper/test-reach.c: int cmd__reach(int ac, const char **av)
+ commit_list_insert(X_stack.items[i], &bases);
+ tips_reachable_from_bases(the_repository,
+ bases, Y_stack.items,
-+ Y_stack.nr, TMP_MARK);
++ Y_stack.nr, TMP_MARK,
++ mode);
+ commit_list_free(bases);
+
+ printf("tips_reachable_from_bases(X,Y)\n");
@@ t/t6600-test-reach.sh: test_expect_success 'get_reachable_subset:all' '
commit-5-6 | sort
) >expect &&
- test_all_modes get_reachable_subset
-+ test_all_modes tips_reachable_from_bases
++ test_all_modes tips_reachable_from_bases &&
++ test_all_modes tips_reachable_from_bases_pq
'
-test_expect_success 'get_reachable_subset:some' '
@@ t/t6600-test-reach.sh: test_expect_success 'get_reachable_subset:some' '
commit-1-7 | sort
) >expect &&
- test_all_modes get_reachable_subset
-+ test_all_modes tips_reachable_from_bases
++ test_all_modes tips_reachable_from_bases &&
++ test_all_modes tips_reachable_from_bases_pq
'
-test_expect_success 'get_reachable_subset:none' '
@@ t/t6600-test-reach.sh: test_expect_success 'get_reachable_subset:none' '
- echo "get_reachable_subset(X,Y)" >expect &&
- test_all_modes get_reachable_subset
+ echo "tips_reachable_from_bases(X,Y)" >expect &&
-+ test_all_modes tips_reachable_from_bases
++ test_all_modes tips_reachable_from_bases &&
++ test_all_modes tips_reachable_from_bases_pq
'
test_expect_success 'for-each-ref ahead-behind:linear' '
+@@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref merged:duplicate at min generation' '
+ --format="%(refname)" --stdin
+ '
+
++test_expect_success 'for-each-ref merged:all reachable commits' '
++ for x in $(test_seq 1 10)
++ do
++ for y in $(test_seq 1 10)
++ do
++ echo "refs/heads/commit-$x-$y" || return 1
++ done
++ done >input &&
++ for x in $(test_seq 1 5)
++ do
++ for y in $(test_seq 1 5)
++ do
++ echo "refs/heads/commit-$x-$y" || return 1
++ done
++ done | sort >expect &&
++ run_all_modes git for-each-ref --merged=commit-5-5 \
++ --format="%(refname)" --stdin
++'
++
++test_expect_success 'for-each-ref merged:all reachable, multibase' '
++ for x in $(test_seq 1 10)
++ do
++ for y in $(test_seq 1 10)
++ do
++ echo "refs/heads/commit-$x-$y" || return 1
++ done
++ done >input &&
++ for x in $(test_seq 1 10)
++ do
++ for y in $(test_seq 1 10)
++ do
++ if { test $x -le 3 && test $y -le 7; } ||
++ { test $x -le 7 && test $y -le 3; }
++ then
++ echo "refs/heads/commit-$x-$y" || return 1
++ fi
++ done
++ done | sort >expect &&
++ run_all_modes git for-each-ref \
++ --merged=commit-3-7 \
++ --merged=commit-7-3 \
++ --format="%(refname)" --stdin
++'
++
+ # For get_branch_base_for_tip, we only care about
+ # first-parent history. Here is the test graph with
+ # second parents removed:
commit-reach.c | 131 +++++++++++-------------------------------
commit-reach.h | 19 ++----
ref-filter.c | 2 +-
remote.c | 20 +++----
t/helper/test-reach.c | 44 +++++++-------
t/t6600-test-reach.sh | 65 ++++++++++++++++++---
6 files changed, 129 insertions(+), 152 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..1cad7b211e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1013,79 +1013,6 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
return result;
}
-struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
- struct commit **to, size_t nr_to,
- unsigned int reachable_flag)
-{
- struct commit **item;
- struct commit *current;
- struct commit_list *found_commits = NULL;
- struct commit **to_last = to + nr_to;
- struct commit **from_last = from + nr_from;
- timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
- int num_to_find = 0;
-
- struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
-
- for (item = to; item < to_last; item++) {
- timestamp_t generation;
- struct commit *c = *item;
-
- repo_parse_commit(the_repository, c);
- generation = commit_graph_generation(c);
- if (generation < min_generation)
- min_generation = generation;
-
- if (!(c->object.flags & PARENT1)) {
- c->object.flags |= PARENT1;
- num_to_find++;
- }
- }
-
- for (item = from; item < from_last; item++) {
- struct commit *c = *item;
- if (!(c->object.flags & PARENT2)) {
- c->object.flags |= PARENT2;
- repo_parse_commit(the_repository, c);
-
- prio_queue_put(&queue, *item);
- }
- }
-
- while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
- struct commit_list *parents;
-
- if (current->object.flags & PARENT1) {
- current->object.flags &= ~PARENT1;
- current->object.flags |= reachable_flag;
- commit_list_insert(current, &found_commits);
- num_to_find--;
- }
-
- for (parents = current->parents; parents; parents = parents->next) {
- struct commit *p = parents->item;
-
- repo_parse_commit(the_repository, p);
-
- if (commit_graph_generation(p) < min_generation)
- continue;
-
- if (p->object.flags & PARENT2)
- continue;
-
- p->object.flags |= PARENT2;
- prio_queue_put(&queue, p);
- }
- }
-
- clear_prio_queue(&queue);
-
- clear_commit_marks_many(nr_to, to, PARENT1);
- clear_commit_marks_many(nr_from, from, PARENT2);
-
- return found_commits;
-}
-
define_commit_slab(bit_arrays, struct bitmap *);
static struct bit_arrays bit_arrays;
@@ -1212,22 +1139,26 @@ static int compare_commit_and_index_by_generation(const void *va, const void *vb
void tips_reachable_from_bases(struct repository *r,
struct commit_list *bases,
struct commit **tips, size_t tips_nr,
- int mark)
+ int mark, enum tips_reachable_mode mode)
{
struct commit_and_index *commits;
+ struct commit_list *p;
+ struct commit *c;
size_t min_generation_index = 0;
timestamp_t min_generation;
- struct commit_list *stack = NULL;
+ struct prio_queue queue = { NULL };
if (!bases || !tips || !tips_nr)
return;
/*
- * Do a depth-first search starting at 'bases' to search for the
- * tips. Stop at the lowest (un-found) generation number. When
- * finding the lowest commit, increase the minimum generation
- * number to the next lowest (un-found) generation number.
+ * Search starting at 'bases' looking for the tips. Stop at the
+ * lowest un-found generation number, raising the floor as tips
+ * are found. Use DFS by default; with TIPS_REACHABLE_PQ,
+ * use a priority queue ordered by generation then commit date.
*/
+ if (mode == TIPS_REACHABLE_PQ)
+ queue.compare = compare_commits_by_gen_then_commit_date;
CALLOC_ARRAY(commits, tips_nr);
@@ -1245,14 +1176,19 @@ void tips_reachable_from_bases(struct repository *r,
while (bases) {
repo_parse_commit(r, bases->item);
- commit_list_insert(bases->item, &stack);
+ bases->item->object.flags |= SEEN;
+ prio_queue_put(&queue, bases->item);
bases = bases->next;
}
- while (stack) {
- int explored_all_parents = 1;
- struct commit_list *p;
- struct commit *c = stack->item;
+ while ((c = prio_queue_get(&queue))) {
+ struct commit *first_parent = NULL;
+
+ repo_parse_commit(r, c);
+
+ /* Skip if below the current generation floor. */
+ if (commit_graph_generation(c) < min_generation)
+ continue;
/* Does it match any of our tips? */
{
@@ -1276,25 +1212,26 @@ void tips_reachable_from_bases(struct repository *r,
}
for (p = c->parents; p; p = p->next) {
- repo_parse_commit(r, p->item);
-
/* Have we already explored this parent? */
if (p->item->object.flags & SEEN)
continue;
- /* Is it below the current minimum generation? */
- if (commit_graph_generation(p->item) < min_generation)
- continue;
-
/* Ok, we will explore from here on. */
p->item->object.flags |= SEEN;
- explored_all_parents = 0;
- commit_list_insert(p->item, &stack);
- break;
+ /* Parse before pushing in PQ mode for ordering. */
+ if (mode == TIPS_REACHABLE_PQ)
+ repo_parse_commit(r, p->item);
+ if (!first_parent)
+ first_parent = p->item;
+ else
+ prio_queue_put(&queue, p->item);
}
-
- if (explored_all_parents)
- pop_commit(&stack);
+ /*
+ * Add the first parent last so that it is on top of
+ * the LIFO queue, maintaining first-parent DFS order.
+ */
+ if (first_parent)
+ prio_queue_put(&queue, first_parent);
}
done:
@@ -1302,7 +1239,7 @@ done:
commits[i].commit->object.flags &= ~RESULT;
free(commits);
repo_clear_commit_marks(r, SEEN);
- commit_list_free(stack);
+ clear_prio_queue(&queue);
}
/*
diff --git a/commit-reach.h b/commit-reach.h
index 3f3a563d8a..71e60d727a 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -96,19 +96,6 @@ int can_all_from_reach_with_flag(struct object_array *from,
int can_all_from_reach(struct commit_list *from, struct commit_list *to,
int commit_date_cutoff);
-
-/*
- * Return a list of commits containing the commits in the 'to' array
- * that are reachable from at least one commit in the 'from' array.
- * Also add the given 'flag' to each of the commits in the returned list.
- *
- * This method uses the PARENT1 and PARENT2 flags during its operation,
- * so be sure these flags are not set before calling the method.
- */
-struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
- struct commit **to, size_t nr_to,
- unsigned int reachable_flag);
-
struct ahead_behind_count {
/**
* As input, the *_index members indicate which positions in
@@ -144,10 +131,14 @@ void ahead_behind(struct repository *r,
* For all tip commits, add 'mark' to their flags if and only if they
* are reachable from one of the commits in 'bases'.
*/
+enum tips_reachable_mode {
+ TIPS_REACHABLE_DFS,
+ TIPS_REACHABLE_PQ,
+};
void tips_reachable_from_bases(struct repository *r,
struct commit_list *bases,
struct commit **tips, size_t tips_nr,
- int mark);
+ int mark, enum tips_reachable_mode mode);
/*
* Given a 'tip' commit and a list potential 'bases', return the index 'i' that
diff --git a/ref-filter.c b/ref-filter.c
index 1da4c0e60d..9c8896d347 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3157,7 +3157,7 @@ static void reach_filter(struct ref_array *array,
tips_reachable_from_bases(the_repository,
*check_reachable,
to_clear, array->nr,
- UNINTERESTING);
+ UNINTERESTING, TIPS_REACHABLE_DFS);
old_nr = array->nr;
array->nr = 0;
diff --git a/remote.c b/remote.c
index 00723b385e..0324c25743 100644
--- a/remote.c
+++ b/remote.c
@@ -1459,9 +1459,8 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
* sent to the other side.
*/
if (sent_tips.nr) {
- const int reachable_flag = 1;
- struct commit_list *found_commits;
struct commit_stack src_commits = COMMIT_STACK_INIT;
+ struct commit_list *bases = NULL;
for_each_string_list_item(item, &src_tag) {
struct ref *ref = item->util;
@@ -1479,11 +1478,13 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
commit_stack_push(&src_commits, commit);
}
- found_commits = get_reachable_subset(sent_tips.items,
- sent_tips.nr,
- src_commits.items,
- src_commits.nr,
- reachable_flag);
+ for (size_t i = 0; i < sent_tips.nr; i++)
+ commit_list_insert(sent_tips.items[i], &bases);
+ tips_reachable_from_bases(the_repository,
+ bases, src_commits.items,
+ src_commits.nr, TMP_MARK,
+ TIPS_REACHABLE_PQ);
+ commit_list_free(bases);
for_each_string_list_item(item, &src_tag) {
struct ref *dst_ref;
@@ -1503,7 +1504,7 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
* Is this tag, which they do not have, reachable from
* any of the commits we are sending?
*/
- if (!(commit->object.flags & reachable_flag))
+ if (!(commit->object.flags & TMP_MARK))
continue;
/* Add it in */
@@ -1513,9 +1514,8 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds
}
clear_commit_marks_many(src_commits.nr, src_commits.items,
- reachable_flag);
+ TMP_MARK);
commit_stack_clear(&src_commits);
- commit_list_free(found_commits);
}
string_list_clear(&src_tag, 0);
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5d86a96c17..66ee35e70d 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -7,6 +7,7 @@
#include "hex.h"
#include "object-name.h"
#include "ref-filter.h"
+#include "revision.h"
#include "setup.h"
#include "string-list.h"
#include "tag.h"
@@ -149,30 +150,31 @@ int cmd__reach(int ac, const char **av)
printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
clear_contains_cache(&cache);
- } else if (!strcmp(av[1], "get_reachable_subset")) {
- const int reachable_flag = 1;
- int count = 0;
- struct commit_list *current;
- struct commit_list *list = get_reachable_subset(X_stack.items, X_stack.nr,
- Y_stack.items, Y_stack.nr,
- reachable_flag);
- printf("get_reachable_subset(X,Y)\n");
- for (current = list; current; current = current->next) {
- if (!(list->item->object.flags & reachable_flag))
- die(_("commit %s is not marked reachable"),
- oid_to_hex(&list->item->object.oid));
- count++;
- }
+ } else if (!strcmp(av[1], "tips_reachable_from_bases") ||
+ !strcmp(av[1], "tips_reachable_from_bases_pq")) {
+ enum tips_reachable_mode mode =
+ !strcmp(av[1], "tips_reachable_from_bases_pq")
+ ? TIPS_REACHABLE_PQ : TIPS_REACHABLE_DFS;
+ struct commit_list *bases = NULL;
+ struct commit_list *result = NULL;
+
+ for (size_t i = 0; i < X_stack.nr; i++)
+ commit_list_insert(X_stack.items[i], &bases);
+ tips_reachable_from_bases(the_repository,
+ bases, Y_stack.items,
+ Y_stack.nr, TMP_MARK,
+ mode);
+ commit_list_free(bases);
+
+ printf("tips_reachable_from_bases(X,Y)\n");
for (size_t i = 0; i < Y_stack.nr; i++) {
- if (Y_stack.items[i]->object.flags & reachable_flag)
- count--;
+ if (Y_stack.items[i]->object.flags & TMP_MARK)
+ commit_list_insert(Y_stack.items[i], &result);
}
+ print_sorted_commit_ids(result);
- if (count < 0)
- die(_("too many commits marked reachable"));
-
- print_sorted_commit_ids(list);
- commit_list_free(list);
+ clear_commit_marks_many(Y_stack.nr, Y_stack.items, TMP_MARK);
+ commit_list_free(result);
}
object_array_clear(&X_obj);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..b736d893d5 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -391,7 +391,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' '
run_all_modes git rev-list --topo-order commit-3-8...commit-6-6
'
-test_expect_success 'get_reachable_subset:all' '
+test_expect_success 'tips_reachable_from_bases:all' '
cat >input <<-\EOF &&
X:commit-9-1
X:commit-8-3
@@ -403,15 +403,16 @@ test_expect_success 'get_reachable_subset:all' '
Y:commit-5-6
EOF
(
- echo "get_reachable_subset(X,Y)" &&
+ echo "tips_reachable_from_bases(X,Y)" &&
git rev-parse commit-3-3 \
commit-1-7 \
commit-5-6 | sort
) >expect &&
- test_all_modes get_reachable_subset
+ test_all_modes tips_reachable_from_bases &&
+ test_all_modes tips_reachable_from_bases_pq
'
-test_expect_success 'get_reachable_subset:some' '
+test_expect_success 'tips_reachable_from_bases:some' '
cat >input <<-\EOF &&
X:commit-9-1
X:commit-8-3
@@ -422,14 +423,15 @@ test_expect_success 'get_reachable_subset:some' '
Y:commit-5-6
EOF
(
- echo "get_reachable_subset(X,Y)" &&
+ echo "tips_reachable_from_bases(X,Y)" &&
git rev-parse commit-3-3 \
commit-1-7 | sort
) >expect &&
- test_all_modes get_reachable_subset
+ test_all_modes tips_reachable_from_bases &&
+ test_all_modes tips_reachable_from_bases_pq
'
-test_expect_success 'get_reachable_subset:none' '
+test_expect_success 'tips_reachable_from_bases:none' '
cat >input <<-\EOF &&
X:commit-9-1
X:commit-8-3
@@ -439,8 +441,9 @@ test_expect_success 'get_reachable_subset:none' '
Y:commit-7-6
Y:commit-2-8
EOF
- echo "get_reachable_subset(X,Y)" >expect &&
- test_all_modes get_reachable_subset
+ echo "tips_reachable_from_bases(X,Y)" >expect &&
+ test_all_modes tips_reachable_from_bases &&
+ test_all_modes tips_reachable_from_bases_pq
'
test_expect_success 'for-each-ref ahead-behind:linear' '
@@ -657,6 +660,50 @@ test_expect_success 'for-each-ref merged:duplicate at min generation' '
--format="%(refname)" --stdin
'
+test_expect_success 'for-each-ref merged:all reachable commits' '
+ for x in $(test_seq 1 10)
+ do
+ for y in $(test_seq 1 10)
+ do
+ echo "refs/heads/commit-$x-$y" || return 1
+ done
+ done >input &&
+ for x in $(test_seq 1 5)
+ do
+ for y in $(test_seq 1 5)
+ do
+ echo "refs/heads/commit-$x-$y" || return 1
+ done
+ done | sort >expect &&
+ run_all_modes git for-each-ref --merged=commit-5-5 \
+ --format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:all reachable, multibase' '
+ for x in $(test_seq 1 10)
+ do
+ for y in $(test_seq 1 10)
+ do
+ echo "refs/heads/commit-$x-$y" || return 1
+ done
+ done >input &&
+ for x in $(test_seq 1 10)
+ do
+ for y in $(test_seq 1 10)
+ do
+ if { test $x -le 3 && test $y -le 7; } ||
+ { test $x -le 7 && test $y -le 3; }
+ then
+ echo "refs/heads/commit-$x-$y" || return 1
+ fi
+ done
+ done | sort >expect &&
+ run_all_modes git for-each-ref \
+ --merged=commit-3-7 \
+ --merged=commit-7-3 \
+ --format="%(refname)" --stdin
+'
+
# For get_branch_base_for_tip, we only care about
# first-parent history. Here is the test graph with
# second parents removed:
base-commit: 1ff279f3404a482a83fb04c7457e41ab26884aea
--
gitgitgadget
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] commit-reach: remove get_reachable_subset()
2026-06-11 11:49 ` [PATCH v2] " Kristofer Karlsson via GitGitGadget
@ 2026-06-11 12:57 ` Derrick Stolee
2026-06-11 13:52 ` Kristofer Karlsson
0 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee @ 2026-06-11 12:57 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git; +Cc: Kristofer Karlsson
On 6/11/2026 7:49 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> * Added PQ mode to the existing test-reach tests so both DFS and PQ
> paths are exercised by the test suite.
This is a substantial change that I don't think is merited. I
think that this makes the point of your change moot: we essentially
have two implementations in one complicated method instead of two
implementations that have different performance characteristics.
I'd rather leave the code as-is than take this complication. I don't
think your commit message justifies the merging of these
implementations, either.
Moreover, I thought the previous patch was fine, it just needed better
awareness of the performance implications of the change. Specifically,
it could be a regression for large repos without a commit-graph file
while simultaneously potentially being a performance boost for large
repos _with_ a commit-graph file.
_If_ we were to go this direction, then it should be two patches, with
the first introducing the new mode and testing it. The second patch
could change the callers of get_reachable_subset() and delete that
code.
Finally, a commentary: You seem to have a habit of responding to
review feedback only through new patch versions, but I'd rather see
some thoughts in the discussion thread as direct replies to the review,
especially if you think you will change direction like this. Saying
something like "Maybe I should update the method to have two walk modes"
in a reply would have given me an opportunity to respond and perhaps
avoided a new version that went in this direction.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] commit-reach: remove get_reachable_subset()
2026-06-11 12:57 ` Derrick Stolee
@ 2026-06-11 13:52 ` Kristofer Karlsson
2026-06-11 14:51 ` Derrick Stolee
0 siblings, 1 reply; 8+ messages in thread
From: Kristofer Karlsson @ 2026-06-11 13:52 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git
On Thu, 11 Jun 2026 at 14:57, Derrick Stolee <stolee@gmail.com> wrote:
> Finally, a commentary: You seem to have a habit of responding to
> review feedback only through new patch versions, but I'd rather see
> some thoughts in the discussion thread as direct replies to the review,
> especially if you think you will change direction like this. Saying
> something like "Maybe I should update the method to have two walk modes"
> in a reply would have given me an opportunity to respond and perhaps
> avoided a new version that went in this direction.
That's fair, I apologize both for jumping ahead too quickly with a new
patch and also for evolving it into multiple logical changes
(both code removal and complex refactoring).
I will be more mindful going forward about letting the discussion
settle more before submitting followup patches.
I have no strong opinion on how to proceed - either park/abandon this
or go with v1 as-is. I think you're right that having two modes within
tips_reachable_from_bases is reducing the win here unless the mode is
truly seamless but the abstraction does leak through a bit.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] commit-reach: remove get_reachable_subset()
2026-06-11 13:52 ` Kristofer Karlsson
@ 2026-06-11 14:51 ` Derrick Stolee
0 siblings, 0 replies; 8+ messages in thread
From: Derrick Stolee @ 2026-06-11 14:51 UTC (permalink / raw)
To: Kristofer Karlsson; +Cc: Kristofer Karlsson via GitGitGadget, git
On 6/11/2026 9:52 AM, Kristofer Karlsson wrote:
> On Thu, 11 Jun 2026 at 14:57, Derrick Stolee <stolee@gmail.com> wrote:
>> Finally, a commentary: You seem to have a habit of responding to
>> review feedback only through new patch versions, but I'd rather see
>> some thoughts in the discussion thread as direct replies to the review,
>> especially if you think you will change direction like this. Saying
>> something like "Maybe I should update the method to have two walk modes"
>> in a reply would have given me an opportunity to respond and perhaps
>> avoided a new version that went in this direction.
>
> That's fair, I apologize both for jumping ahead too quickly with a new
> patch and also for evolving it into multiple logical changes
> (both code removal and complex refactoring).
>
> I will be more mindful going forward about letting the discussion
> settle more before submitting followup patches.
>
> I have no strong opinion on how to proceed - either park/abandon this
> or go with v1 as-is. I think you're right that having two modes within
> tips_reachable_from_bases is reducing the win here unless the mode is
> truly seamless but the abstraction does leak through a bit.
I think that my biggest issue is that callers are needing to know
something about the performance characteristics of each solution. It
may be better to keep the behavior completely internal: have the
method decide which approach is better based on the information it
has. For instance: is the minimum generation number "infinite"? Then
the BFS is going to be better than the DFS approach. Such an approach
would make it clear why there is the complexity _and_ would improve
both callers in the right scenarios.
You were correct to find two methods that claimed to do the same
thing, but we need to take time to consider the solutions based on
all factors.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-06-11 14:51 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09 19:28 [PATCH] commit-reach: remove get_reachable_subset() Kristofer Karlsson via GitGitGadget
2026-06-10 15:48 ` Junio C Hamano
2026-06-10 18:25 ` Kristofer Karlsson
2026-06-10 19:29 ` Derrick Stolee
2026-06-11 11:49 ` [PATCH v2] " Kristofer Karlsson via GitGitGadget
2026-06-11 12:57 ` Derrick Stolee
2026-06-11 13:52 ` Kristofer Karlsson
2026-06-11 14:51 ` Derrick Stolee
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox