Git development
* [PATCH] commit-reach: early exit paint_down_to_common for single merge-base
@ 2026-05-08 15:07 Kristofer Karlsson via GitGitGadget
  2026-05-11  2:08 ` Junio C Hamano
  2026-05-11  6:19 ` [PATCH v2] " Kristofer Karlsson via GitGitGadget
From: Kristofer Karlsson via GitGitGadget @ 2026-05-08 15:07 UTC
  To: git; +Cc: Derrick Stolee, Kristofer Karlsson, Kristofer Karlsson

From: Kristofer Karlsson <krka@spotify.com>

When find_all is false and generation numbers are available, the
priority queue pops in non-increasing generation order.  The first
doubly-painted commit is a valid best merge-base; no later commit
can dominate it.  Skip the expensive STALE drain in this case.

The early exit is guarded by three conditions: find_all must be
false, the commit-graph must provide generation numbers, and the
merge-base commit itself must have a finite generation (not
GENERATION_NUMBER_INFINITY from being outside the commit-graph).

Add a find_all parameter to repo_get_merge_bases_many_dirty() and
thread it through to paint_down_to_common().  git merge-base
(without --all) passes show_all=0 as find_all, which triggers the
early exit.

On a 2.2M-commit, merge-heavy monorepo with a commit-graph, the
runtime of git merge-base changes as follows (before -> after):

  HEAD vs ~500:   5,229ms -> 24ms
  HEAD vs ~1000:  4,214ms -> 39ms
  HEAD vs ~5000:  3,799ms -> 46ms
  HEAD vs ~10000: 3,827ms -> 61ms

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    [RFC] commit-reach: skip STALE drain when only one merge-base needed
    
    Context for what this is all about.
    
    I am working with a very large git monorepo and have been investigating
    a performance issue. After some digging I ended up looking more deeply
    into git merge-base. It has an --all option, but by default it returns
    only a single merge-base. Reading the code and adding debug timing, I
    found that although the total time to compute the merge-base was high,
    only a very small part of it was spent finding the merge-base that was
    eventually returned.
    
    The optimization is quite dramatic in a large repo: runtime went down
    from about 5000ms to about 50ms, roughly a 100x speedup. The time saved
    is the cost of draining an exploding frontier of STALE commits.
    
    Thus, my idea is simply to return early from the function once we know
    what will be returned. This only works if we find a candidate that we
    know will not be pruned later. Fortunately, with a commit-graph that
    provides generation numbers, commits are visited in an order that
    guarantees the first candidate will not be pruned.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2109%2Fspkrka%2Fmerge-base-early-exit-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2109/spkrka/merge-base-early-exit-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2109

 builtin/merge-base.c  |   3 +-
 commit-reach.c        |  26 ++++++---
 commit-reach.h        |   5 +-
 t/t6010-merge-base.sh | 119 ++++++++++++++++++++++++++++++++++++++++++
 t/t6600-test-reach.sh |  40 ++++++++++++++
 5 files changed, 183 insertions(+), 10 deletions(-)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index c7ee97fa6a..6b9d42f596 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -14,7 +14,8 @@ static int show_merge_base(struct commit **rev, size_t rev_nr, int show_all)
 	struct commit_list *result = NULL, *r;
 
 	if (repo_get_merge_bases_many_dirty(the_repository, rev[0],
-					    rev_nr - 1, rev + 1, &result) < 0) {
+					    rev_nr - 1, rev + 1,
+					    show_all, &result) < 0) {
 		commit_list_free(result);
 		return -1;
 	}
diff --git a/commit-reach.c b/commit-reach.c
index d3a9b3ed6f..c9d2d594de 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -55,14 +55,16 @@ static int paint_down_to_common(struct repository *r,
 				struct commit **twos,
 				timestamp_t min_generation,
 				int ignore_missing_commits,
+				int find_all,
 				struct commit_list **result)
 {
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	int i;
+	int has_gens = min_generation || corrected_commit_dates_enabled(r);
 	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
 	struct commit_list **tail = result;
 
-	if (!min_generation && !corrected_commit_dates_enabled(r))
+	if (!has_gens)
 		queue.compare = compare_commits_by_commit_date;
 
 	one->object.flags |= PARENT1;
@@ -97,6 +99,11 @@ static int paint_down_to_common(struct repository *r,
 			if (!(commit->object.flags & RESULT)) {
 				commit->object.flags |= RESULT;
 				tail = commit_list_append(commit, tail);
+				/* Generation-ordered queue: no later
+				 * commit can dominate this one. */
+				if (!find_all && has_gens &&
+				    generation < GENERATION_NUMBER_INFINITY)
+					break;
 			}
 			/* Mark parents of a found merge stale */
 			flags |= STALE;
@@ -136,6 +143,7 @@ static int paint_down_to_common(struct repository *r,
 static int merge_bases_many(struct repository *r,
 			    struct commit *one, int n,
 			    struct commit **twos,
+			    int find_all,
 			    struct commit_list **result)
 {
 	struct commit_list *list = NULL, **tail = result;
@@ -165,7 +173,7 @@ static int merge_bases_many(struct repository *r,
 				     oid_to_hex(&twos[i]->object.oid));
 	}
 
-	if (paint_down_to_common(r, one, n, twos, 0, 0, &list)) {
+	if (paint_down_to_common(r, one, n, twos, 0, 0, find_all, &list)) {
 		commit_list_free(list);
 		return -1;
 	}
@@ -246,7 +254,7 @@ static int remove_redundant_no_gen(struct repository *r,
 				min_generation = curr_generation;
 		}
 		if (paint_down_to_common(r, array[i], filled,
-					 work, min_generation, 0, &common)) {
+					 work, min_generation, 0, 1, &common)) {
 			clear_commit_marks(array[i], all_flags);
 			clear_commit_marks_many(filled, work, all_flags);
 			commit_list_free(common);
@@ -425,6 +433,7 @@ static int get_merge_bases_many_0(struct repository *r,
 				  size_t n,
 				  struct commit **twos,
 				  int cleanup,
+				  int find_all,
 				  struct commit_list **result)
 {
 	struct commit_list *list, **tail = result;
@@ -432,7 +441,7 @@ static int get_merge_bases_many_0(struct repository *r,
 	size_t cnt, i;
 	int ret;
 
-	if (merge_bases_many(r, one, n, twos, result) < 0)
+	if (merge_bases_many(r, one, n, twos, find_all, result) < 0)
 		return -1;
 	for (i = 0; i < n; i++) {
 		if (one == twos[i])
@@ -475,16 +484,17 @@ int repo_get_merge_bases_many(struct repository *r,
 			      struct commit **twos,
 			      struct commit_list **result)
 {
-	return get_merge_bases_many_0(r, one, n, twos, 1, result);
+	return get_merge_bases_many_0(r, one, n, twos, 1, 1, result);
 }
 
 int repo_get_merge_bases_many_dirty(struct repository *r,
 				    struct commit *one,
 				    size_t n,
 				    struct commit **twos,
+				    int find_all,
 				    struct commit_list **result)
 {
-	return get_merge_bases_many_0(r, one, n, twos, 0, result);
+	return get_merge_bases_many_0(r, one, n, twos, 0, find_all, result);
 }
 
 int repo_get_merge_bases(struct repository *r,
@@ -492,7 +502,7 @@ int repo_get_merge_bases(struct repository *r,
 			 struct commit *two,
 			 struct commit_list **result)
 {
-	return get_merge_bases_many_0(r, one, 1, &two, 1, result);
+	return get_merge_bases_many_0(r, one, 1, &two, 1, 1, result);
 }
 
 /*
@@ -555,7 +565,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 
 	if (paint_down_to_common(r, commit,
 				 nr_reference, reference,
-				 generation, ignore_missing_commits, &bases))
+				 generation, ignore_missing_commits, 1, &bases))
 		ret = -1;
 	else if (commit->object.flags & PARENT2)
 		ret = 1;
diff --git a/commit-reach.h b/commit-reach.h
index 6012402dfc..908b9539c5 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -17,10 +17,13 @@ int repo_get_merge_bases_many(struct repository *r,
 			      struct commit *one, size_t n,
 			      struct commit **twos,
 			      struct commit_list **result);
-/* To be used only when object flags after this call no longer matter */
+/* To be used only when object flags after this call no longer matter.
+ * When find_all is false and generation numbers are available, returns
+ * after finding the first merge-base, skipping the STALE drain. */
 int repo_get_merge_bases_many_dirty(struct repository *r,
 				    struct commit *one, size_t n,
 				    struct commit **twos,
+				    int find_all,
 				    struct commit_list **result);
 
 int get_octopus_merge_bases(struct commit_list *in, struct commit_list **result);
diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
index 44c726ea39..f6c85d4f53 100755
--- a/t/t6010-merge-base.sh
+++ b/t/t6010-merge-base.sh
@@ -305,4 +305,123 @@ test_expect_success 'merge-base --octopus --all for complex tree' '
 	test_cmp expected actual
 '
 
+# The following tests verify that "git merge-base" (without --all)
+# returns the same result with and without a commit-graph.
+# This exercises the early-exit optimisation in paint_down_to_common
+# that skips the STALE drain when generation numbers are available.
+
+test_expect_success 'setup for commit-graph tests' '
+	git init graph-repo &&
+	(
+		cd graph-repo &&
+
+		# Build a forked DAG:
+		#
+		#      L1---L2    (left)
+		#     /
+		#   GS
+		#     \
+		#      GR1---GR2  (right)
+		#
+		test_commit GS &&
+		git checkout -b left &&
+		test_commit L1 &&
+		test_commit L2 &&
+		git checkout GS &&
+		git checkout -b right &&
+		test_commit GR1 &&
+		test_commit GR2
+	)
+'
+
+test_expect_success 'merge-base without commit-graph' '
+	(
+		cd graph-repo &&
+		rm -f .git/objects/info/commit-graph &&
+		git merge-base left right >actual &&
+		git rev-parse GS >expected &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'merge-base with commit-graph' '
+	(
+		cd graph-repo &&
+		git commit-graph write --reachable &&
+		git merge-base left right >actual &&
+		git rev-parse GS >expected &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'merge-base --all with commit-graph' '
+	(
+		cd graph-repo &&
+		git merge-base --all left right >actual &&
+		git rev-parse GS >expected &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'merge-base agrees with --all for single result' '
+	(
+		cd graph-repo &&
+		git commit-graph write --reachable &&
+		git merge-base left right >actual.single &&
+		git merge-base --all left right >actual.all &&
+		test_cmp actual.all actual.single
+	)
+'
+
+test_expect_success 'setup for deep chain commit-graph test' '
+	git init deep-repo &&
+	(
+		cd deep-repo &&
+
+		# Build a deep forked DAG:
+		#
+		#    DL1--DL2--...--DL20  (left)
+		#   /
+		# DS
+		#   \
+		#    DR1--DR2--...--DR20  (right)
+		#
+		test_commit DS &&
+		git checkout -b left &&
+		for i in $(test_seq 1 20)
+		do
+			test_commit DL$i || return 1
+		done &&
+		git checkout DS &&
+		git checkout -b right &&
+		for i in $(test_seq 1 20)
+		do
+			test_commit DR$i || return 1
+		done
+	)
+'
+
+test_expect_success 'deep chain: merge-base matches with and without commit-graph' '
+	(
+		cd deep-repo &&
+		rm -f .git/objects/info/commit-graph &&
+		git merge-base left right >actual.no-graph &&
+		git rev-parse DS >expected &&
+		test_cmp expected actual.no-graph &&
+		git commit-graph write --reachable &&
+		git merge-base left right >actual.graph &&
+		test_cmp expected actual.graph
+	)
+'
+
+test_expect_success 'deep chain: --all and non---all agree with commit-graph' '
+	(
+		cd deep-repo &&
+		git commit-graph write --reachable &&
+		git merge-base left right >actual.single &&
+		git merge-base --all left right >actual.all &&
+		test_cmp actual.all actual.single
+	)
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index dc0421ed2f..51c23b7683 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -882,4 +882,44 @@ test_expect_success 'rev-list --maximal-only matches merge-base --independent' '
 	test_cmp expect.sorted actual.sorted
 '
 
+# The following tests verify the early-exit optimisation in
+# paint_down_to_common when merge-base is invoked without --all.
+# The merge_base_all_modes helper checks all four commit-graph configurations.
+
+merge_base_all_modes () {
+	test_when_finished rm -rf .git/objects/info/commit-graph &&
+	git merge-base "$@" >actual &&
+	test_cmp expect actual &&
+	cp commit-graph-full .git/objects/info/commit-graph &&
+	git merge-base "$@" >actual &&
+	test_cmp expect actual &&
+	cp commit-graph-half .git/objects/info/commit-graph &&
+	git merge-base "$@" >actual &&
+	test_cmp expect actual &&
+	cp commit-graph-no-gdat .git/objects/info/commit-graph &&
+	git merge-base "$@" >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'merge-base without --all (unique base)' '
+	git rev-parse commit-5-3 >expect &&
+	merge_base_all_modes commit-5-7 commit-8-3
+'
+
+test_expect_success 'merge-base without --all is one of --all results' '
+	test_when_finished rm -rf .git/objects/info/commit-graph &&
+
+	cp commit-graph-full .git/objects/info/commit-graph &&
+	git merge-base --all commit-5-7 commit-4-8 commit-6-6 commit-8-3 >all &&
+	git merge-base commit-5-7 commit-4-8 commit-6-6 commit-8-3 >single &&
+	test_line_count = 1 single &&
+	grep -F -f single all &&
+
+	cp commit-graph-half .git/objects/info/commit-graph &&
+	git merge-base --all commit-5-7 commit-4-8 commit-6-6 commit-8-3 >all &&
+	git merge-base commit-5-7 commit-4-8 commit-6-6 commit-8-3 >single &&
+	test_line_count = 1 single &&
+	grep -F -f single all
+'
+
 test_done

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
-- 
gitgitgadget


Thread overview: 11+ messages
2026-05-08 15:07 [PATCH] commit-reach: early exit paint_down_to_common for single merge-base Kristofer Karlsson via GitGitGadget
2026-05-11  2:08 ` Junio C Hamano
2026-05-11  6:19 ` [PATCH v2] " Kristofer Karlsson via GitGitGadget
2026-05-11  7:22   ` Patrick Steinhardt
2026-05-11 11:22   ` [PATCH v3] " Kristofer Karlsson via GitGitGadget
2026-05-11 12:04     ` Patrick Steinhardt
2026-05-11 12:59     ` [PATCH v4 0/2] [RFC] commit-reach: skip STALE drain when only one merge-base needed Kristofer Karlsson via GitGitGadget
2026-05-11 12:59       ` [PATCH v4 1/2] commit-reach: introduce merge_base_flags enum Kristofer Karlsson via GitGitGadget
2026-05-11 12:59       ` [PATCH v4 2/2] commit-reach: early exit paint_down_to_common for single merge-base Kristofer Karlsson via GitGitGadget
2026-05-12  0:40         ` Junio C Hamano
2026-05-12  5:16           ` Kristofer Karlsson