All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
Date: Wed, 24 Jun 2026 12:14:06 +0000	[thread overview]
Message-ID: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2149.git.1781951820.gitgitgadget@gmail.com>

commit-reach: terminate merge-base walk when one side is exhausted

Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.

When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.

A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.

  origin/HEAD  o   o  PR HEAD
               |   |
     (import)  o   :
              / \ /
             |   o  merge-base
             |   |
             :   :  (~2.5M commits)
             |   |
  import root   main root


In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patches 2-3 incorporate test cases from Elijah's branch.

This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.

Patch layout:

1/7 Documentation/technical: add paint-down-to-common doc 2/7 t6600: add
test cases for side-exhaustion edge cases 3/7 t6099, t6600: add
side-exhaustion regression tests 4/7 commit-reach: add trace2
instrumentation to paint_down_to_common() 5/7 commit-reach: introduce struct
paint_state with per-side counters 6/7 commit-reach: remove unused
nonstale_queue dedup wrappers 7/7 commit-reach: terminate merge-base walk
when one paint side is exhausted

Benchmarks

Step counts are deterministic (measured via trace2_data_intmax added in
patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
to performance.

2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):

                                        steps              wall-clock
merge-base --all  (across import)    2682391 ->  53521     7.26s ->   88ms
merge-base --all  (1000 apart)       2659607 ->   1106     6.98s ->    8ms
merge-tree        (across import)          -               8.11s ->  100ms


git.git (88k commits, commit-graph):

                                        steps              wall-clock
merge-base --all v2.0.0 v2.55.0-rc1   72264 ->  44589      82ms ->   49ms
merge-base --all HEAD HEAD~1000         9873 ->   3817      19ms ->    9ms
merge-base --all HEAD HEAD~10000       72285 ->  41523      80ms ->   48ms
merge-base HEAD HEAD~1000                  -                 9ms ->    9ms
merge-base --is-ancestor HEAD~1000 HEAD    -                 6ms ->    6ms


Changes since v1:

 * Reordered patches: documentation first (describing the existing
   algorithm), tests before code changes, so they demonstrate passing with
   old logic first.

 * Dropped the ahead_behind decoupling patch. paint_state is now a NEW
   struct alongside nonstale_queue instead of replacing it. ahead_behind()
   is completely untouched.

 * Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead
   code after the conversion) in a separate commit.

 * Renamed: struct paint_queue -> paint_state, field pq -> queue,
   paint_count_add/remove -> paint_count_update (single function with signed
   delta parameter).

 * Split the old paint_count_transition (which handled both old and new
   flags in one call) into separate remove/add calls with a signed delta.
   This eliminates the need for the case 0 handler (which tracked "not in
   the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 |
   STALE) that documents all valid flag combinations, with BUG() in default.

 * Moved all termination conditions into paint_queue_get(). The all-zero
   check and the side-exhaustion check are merged under a shared
   !pending_merge_bases guard. paint_queue_get() derives the generation from
   the dequeued commit itself, so no extra parameter is needed.

 * Added trace2_data_intmax() instrumentation to report the number of
   commits visited per paint walk (separate commit), with deterministic
   step-count assertions in t6600.

 * Expanded switch statements to multi-line format per .clang-format.

 * Used !count style throughout instead of count == 0.

 * Updated technical documentation alongside code changes.

 * Added benchmark data (both git-bench wall-clock and trace2 step counts)
   to commit messages.

[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150

Elijah Newren (1):
  t6600: add test cases for side-exhaustion edge cases

Kristofer Karlsson (6):
  Documentation/technical: add paint-down-to-common doc
  t6099, t6600: add side-exhaustion regression tests
  commit-reach: add trace2 instrumentation to paint_down_to_common()
  commit-reach: introduce struct paint_state with per-side counters
  commit-reach: remove unused nonstale_queue dedup wrappers
  commit-reach: terminate merge-base walk when one paint side is
    exhausted

 Documentation/Makefile                        |   1 +
 Documentation/technical/meson.build           |   1 +
 .../technical/paint-down-to-common.adoc       | 128 ++++++++++++++
 commit-reach.c                                | 119 ++++++++++---
 t/meson.build                                 |   1 +
 t/t6099-merge-base-side-exhaustion.sh         |  82 +++++++++
 t/t6600-test-reach.sh                         | 157 ++++++++++++++++++
 7 files changed, 464 insertions(+), 25 deletions(-)
 create mode 100644 Documentation/technical/paint-down-to-common.adoc
 create mode 100755 t/t6099-merge-base-side-exhaustion.sh


base-commit: ab776a62a78576513ee121424adb19597fbb7613
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2149

Range-diff vs v1:

 1:  5492acda0a < -:  ---------- commit-reach: decouple ahead_behind from nonstale_queue
 6:  9cbfc67d72 ! 1:  19ed743bd1 Documentation/technical: add paint-down-to-common doc
     @@ Commit message
          Documentation/technical: add paint-down-to-common doc
      
          Add a technical document describing the paint_down_to_common()
     -    algorithm used for merge-base computation.
     +    algorithm used for merge-base computation, covering the paint
     +    walk, generation number regions, and termination conditions.
      
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
     @@ Documentation/technical/paint-down-to-common.adoc (new)
      +Termination
      +-----------
      +
     -+Termination happens when we can prove that no extra progress is
     -+possible. We are done with the main loop when one of the following
     -+conditions holds:
     ++The walk uses a `nonstale_queue` wrapper around `prio_queue` that
     ++tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
     ++so far. Once that commit is dequeued, every remaining entry is known
     ++to be STALE and the loop terminates. Specifically, the main loop
     ++ends when one of the following conditions holds:
      +
      +  1. The queue is empty.
     -+  2. The queue only contains STALE entries.
     -+  3. Side-exhaustion: the walk has reached the finite region and one
     -+     of the sides is fully exhausted.
     -+
     -+The loop waits for all pending merge-base candidates to be popped
     -+and recorded before any early exit fires, so no separate drain phase
     -+is needed after termination.
     ++  2. `max_nonstale` has been dequeued, meaning the queue only contains
     ++     STALE entries.
      +
      +Stale entry condition
      +~~~~~~~~~~~~~~~~~~~~~
     -+If all entries are stale we cannot find any new merge bases since
     -+that requires at least one enqueued side node meeting the other side.
     -+However, we could still invalidate merge bases (if there are more
     -+than one). This is unnecessary since `remove_redundant()` will clean
     -+that up as a post-process step.
     -+
     -+Side-exhaustion
     -+~~~~~~~~~~~~~~~
     -+A commit is *exclusive* to one side if it carries that side's paint
     -+but not the other (e.g. PARENT1 without PARENT2).
     -+
     -+If we have reached the finite region of the graph, no future
     -+traversal step can add paint to an already-visited commit. Thus if
     -+there are no exclusive PARENT2 commits in the queue, no additional
     -+PARENT2 paint can be introduced into the walk. Even if exclusive
     -+PARENT1 commits remain, no new merge-base candidates can be
     -+discovered. The same holds symmetrically for PARENT1.
     -+
     -+This invariant is only valid in the finite region of the graph.
     ++Once all queued entries are stale, no new merge-base candidates can
     ++be discovered -- that requires at least one non-stale commit from
     ++each side meeting. Continuing the walk could still invalidate
     ++existing candidates by proving one is an ancestor of another, but
     ++`remove_redundant()` handles that as a post-processing step, so it
     ++is safe to exit early.
      +
      +Related documentation
      +---------------------
      +
      +  - `Documentation/technical/commit-graph.adoc` -- generation numbers
      +    and the reachability closure property.
     +
     + ## commit-reach.c ##
     +@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
     + 	return commit;
     + }
     + 
     +-/* all input commits in one and twos[] must have been parsed! */
     ++/*
     ++ * See Documentation/technical/paint-down-to-common.adoc
     ++ *
     ++ * All input commits in one and twos[] must have been parsed!
     ++ */
     + static int paint_down_to_common(struct repository *r,
     + 				struct commit *one, int n,
     + 				struct commit **twos,
 4:  91372b975f ! 2:  6151b8e0a3 t6600: add test cases for side-exhaustion edge cases
     @@ t/t6600-test-reach.sh: test_expect_success 'setup' '
      +	#   ps-T1   ps-T2
      +	#
      +	# where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so
     -+	# merge-base(ps-T1,ps-T2) = ps-B.  During the walk, ps-X transitions
     ++	# merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
      +	# to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued;
      +	# then the STALE-walk from ps-B transitions ps-X to
      +	# (PARENT1|PARENT2|STALE).
     @@ t/t6600-test-reach.sh: test_expect_success 'setup' '
      +
      +	# Build a side topology that lives entirely outside the half
      +	# commit-graph and has non-monotonic commit dates, to exercise the
     -+	# INFINITY-gate in paint_down_to_common.  With both tips outside
     ++	# INFINITY-gate in paint_down_to_common. With both tips outside
      +	# the graph, generation is INFINITY and the queue falls back to
      +	# commit-date order, which here is non-monotonic.
      +	#
     @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' '
      +
      +test_expect_success 'get_merge_bases_many:pending-stale' '
      +	# Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in
     -+	# paint_down_to_common().  See the topology comment in the setup test.
     ++	# paint_down_to_common(). See the topology comment in the setup test.
      +	cat >input <<-\EOF &&
      +	A:ps-T1
      +	X:ps-T2
     @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' '
      +'
      +
      +test_expect_success 'get_merge_bases_many:infinity-both-sides' '
     -+	# Exercises the push-time INFINITY-gate in paint_down_to_common().  See
     ++	# Exercises the push-time INFINITY-gate in paint_down_to_common(). See
      +	# the pi-* topology comment in the setup test.
      +	cat >input <<-\EOF &&
      +	A:pi-X
 5:  faf5bc98ed ! 3:  90f09ecb5c t6099, t6600: add side-exhaustion regression tests
     @@ t/t6099-merge-base-side-exhaustion.sh (new)
      +
      +Test that merge-base --all correctly handles cases where
      +multiple merge-base candidates exist and one is an ancestor
     -+of another.  The side-exhaustion optimization in
     ++of another. The side-exhaustion optimization in
      +paint_down_to_common may exit before STALE propagation
      +removes the ancestor, but remove_redundant catches it.
      +
     @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s
      +test_expect_success 'setup mixed finite/INFINITY topology' '
      +	# Create a commit outside all saved commit-graph files so it always
      +	# has INFINITY generation, while its parent (ps-X) is in the graph
     -+	# with a finite generation.  Use the ps-* orphan topology so we do
     ++	# with a finite generation. Use the ps-* orphan topology so we do
      +	# not pollute the grid-based rev-list tests.
      +	git checkout ps-X &&
      +	test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF
     @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s
      +test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
      +	# One tip (pm-INF) is outside the commit-graph with INFINITY
      +	# generation; the other (ps-B) is in the graph with finite
     -+	# generation.  The walk starts in the INFINITY region and crosses
     ++	# generation. The walk starts in the INFINITY region and crosses
      +	# into the finite region where side-exhaustion can fire.
      +	cat >input <<-\EOF &&
      +	A:pm-INF
 -:  ---------- > 4:  6ade4df2ed commit-reach: add trace2 instrumentation to paint_down_to_common()
 2:  316e4dfe26 ! 5:  f24edd45f0 commit-reach: introduce struct paint_queue with per-side counters
     @@ Metadata
      Author: Kristofer Karlsson <krka@spotify.com>
      
       ## Commit message ##
     -    commit-reach: introduce struct paint_queue with per-side counters
     +    commit-reach: introduce struct paint_state with per-side counters
      
     -    Replace the nonstale_queue abstraction in paint_down_to_common() with
     -    a new paint_queue struct that tracks per-side commit counts. Each
     -    non-stale queued commit occupies exactly one counter bucket based on
     -    its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
     +    Add a paint_state struct for use by paint_down_to_common() that
     +    wraps a prio_queue with per-side commit counters. Each non-stale
     +    queued commit occupies exactly one counter bucket based on its
     +    paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
          merge-base candidate).
      
     -    The counters are maintained by paint_count_transition() which handles
     -    all flag changes as bucket transfers: remove from the old bucket, add
     -    to the new one. Either step is a no-op when the respective state has
     -    no bucket (stale or zero).
     +    The counters are maintained by paint_count_update() which adjusts
     +    the appropriate bucket by a signed delta. An exhaustive switch on
     +    the paint+stale bits documents all valid flag combinations in one
     +    place.
      
     -    The loop now drains the queue via paint_queue_get() and breaks when
     -    all counters reach zero, replacing the old pointer-based termination
     -    (max_nonstale). This is equivalent behavior.
     +    Convert paint_down_to_common() to use paint_state. The loop now
     +    drains the queue via paint_queue_get() which returns NULL when all
     +    counters reach zero, replacing the old pointer-based termination
     +    (max_nonstale). This is equivalent behavior -- both conditions
     +    detect that no non-stale entries remain.
     +
     +    The existing nonstale_queue is left in place for ahead_behind().
     +
     +    Step counts (via trace2 from the previous commit) are identical
     +    before and after this refactoring, confirming no behavioral change.
      
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
     + ## Documentation/technical/paint-down-to-common.adoc ##
     +@@ Documentation/technical/paint-down-to-common.adoc: re-enqueued is bounded by the number of flag transitions.
     + Termination
     + -----------
     + 
     +-The walk uses a `nonstale_queue` wrapper around `prio_queue` that
     +-tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
     +-so far. Once that commit is dequeued, every remaining entry is known
     +-to be STALE and the loop terminates. Specifically, the main loop
     ++The walk tracks the number of commits of each type in the queue
     ++(PARENT1-only, PARENT2-only, pending merge-base). The main loop
     + ends when one of the following conditions holds:
     + 
     +   1. The queue is empty.
     +-  2. `max_nonstale` has been dequeued, meaning the queue only contains
     +-     STALE entries.
     ++  2. The queue contains only stale entries.
     + 
     + Stale entry condition
     + ~~~~~~~~~~~~~~~~~~~~~
     +
       ## commit-reach.c ##
     -@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b)
     +@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
     + 	return commit;
       }
       
     - /*
     -- * A prio_queue with O(1) termination check.  'max_nonstale' tracks
     -- * the lowest-priority non-stale commit enqueued so far; once it is
     -- * popped, every remaining entry is known to be STALE.
     ++/*
      + * Priority queue with per-side commit counters for paint_down_to_common().
      + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
      + * PARENT2-only, or both (a pending merge-base candidate).
     -  */
     --struct nonstale_queue {
     -+struct paint_queue {
     - 	struct prio_queue pq;
     --	struct commit *max_nonstale;
     ++ */
     ++struct paint_state {
     ++	struct prio_queue queue;
      +	int p1_count;
      +	int p2_count;
      +	int pending_merge_bases;
     - };
     - 
     --static void nonstale_queue_put(struct nonstale_queue *queue,
     --			       struct commit *c)
     -+/*
     -+ * Adjust per-side counters for a paint-state transition.  Non-stale
     -+ * commits are counted in one of three counters: PARENT1-only,
     -+ * PARENT2-only, or both.  Zero means "not in the queue" (used on
     -+ * enqueue/dequeue); stale commits are not counted at all.
     -+ */
     -+static void paint_count_transition(struct paint_queue *queue,
     -+				   unsigned old_flags, unsigned new_flags)
     - {
     --	struct commit *old = queue->max_nonstale;
     -+	unsigned old_paint = old_flags & (PARENT1 | PARENT2 | STALE);
     -+	unsigned new_paint = new_flags & (PARENT1 | PARENT2 | STALE);
     - 
     --	prio_queue_put(&queue->pq, c);
     --	if (c->object.flags & STALE)
     -+	if (old_paint == new_paint)
     - 		return;
     --	if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0)
     --		queue->max_nonstale = c;
     --}
     --
     --static struct commit *nonstale_queue_get(struct nonstale_queue *queue)
     --{
     --	struct commit *commit = prio_queue_get(&queue->pq);
     - 
     --	if (commit == queue->max_nonstale)
     --		queue->max_nonstale = NULL;
     --
     --	return commit;
     -+	if (!(old_paint & STALE)) {
     -+		switch (old_paint & (PARENT1 | PARENT2)) {
     -+		case 0:                  break;
     -+		case PARENT1:            queue->p1_count--; break;
     -+		case PARENT2:            queue->p2_count--; break;
     -+		case PARENT1 | PARENT2:  queue->pending_merge_bases--; break;
     -+		default:                 BUG("unexpected paint state");
     -+		}
     -+	}
     -+	if (!(new_paint & STALE)) {
     -+		switch (new_paint & (PARENT1 | PARENT2)) {
     -+		case 0:                  break;
     -+		case PARENT1:            queue->p1_count++; break;
     -+		case PARENT2:            queue->p2_count++; break;
     -+		case PARENT1 | PARENT2:  queue->pending_merge_bases++; break;
     -+		default:                 BUG("unexpected paint state");
     -+		}
     ++};
     ++
     ++static void paint_count_update(struct paint_state *state,
     ++			       unsigned flags, int delta)
     ++{
     ++	switch (flags & (PARENT1 | PARENT2 | STALE)) {
     ++	case PARENT1:
     ++		state->p1_count += delta;
     ++		break;
     ++
     ++	case PARENT2:
     ++		state->p2_count += delta;
     ++		break;
     ++
     ++	case PARENT1 | PARENT2:
     ++		state->pending_merge_bases += delta;
     ++		break;
     ++
     ++	case PARENT1 | PARENT2 | STALE:
     ++		break;
     ++
     ++	default:
     ++		BUG("unexpected paint state");
      +	}
     - }
     - 
     --static void clear_nonstale_queue(struct nonstale_queue *queue)
     -+static void paint_queue_put(struct paint_queue *queue,
     ++}
     ++
     ++static void paint_queue_put(struct paint_state *state,
      +			    struct commit *c, unsigned add_flags)
     - {
     --	clear_prio_queue(&queue->pq);
     --	queue->max_nonstale = NULL;
     --}
     ++{
      +	unsigned old_flags = c->object.flags;
      +	c->object.flags |= add_flags;
     - 
     --static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
     --				     struct commit *c)
     --{
     --	if (c->object.flags & ENQUEUED)
     --		return;
     --	c->object.flags |= ENQUEUED;
     --	nonstale_queue_put(queue, c);
     ++
      +	if (old_flags & ENQUEUED) {
     -+		paint_count_transition(queue, old_flags, c->object.flags);
     ++		paint_count_update(state, old_flags, -1);
     ++		paint_count_update(state, c->object.flags, 1);
      +	} else {
      +		c->object.flags |= ENQUEUED;
     -+		prio_queue_put(&queue->pq, c);
     -+		paint_count_transition(queue, 0, c->object.flags);
     ++		prio_queue_put(&state->queue, c);
     ++		paint_count_update(state, c->object.flags, 1);
      +	}
     - }
     - 
     --static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
     -+static struct commit *paint_queue_get(struct paint_queue *queue)
     - {
     --	struct commit *commit = nonstale_queue_get(queue);
     -+	struct commit *commit = prio_queue_get(&queue->pq);
     - 
     --	if (commit)
     ++}
     ++
     ++static struct commit *paint_queue_get(struct paint_state *state)
     ++{
     ++	struct commit *commit;
     ++
     ++	if (!state->p1_count && !state->p2_count &&
     ++	    !state->pending_merge_bases)
     ++		return NULL;
     ++
     ++	commit = prio_queue_get(&state->queue);
      +	if (commit) {
     - 		commit->object.flags &= ~ENQUEUED;
     -+		paint_count_transition(queue, commit->object.flags, 0);
     ++		commit->object.flags &= ~ENQUEUED;
     ++		paint_count_update(state, commit->object.flags, -1);
      +	}
     - 	return commit;
     - }
     - 
     ++	return commit;
     ++}
     ++
     + /*
     +  * See Documentation/technical/paint-down-to-common.adoc
     +  *
      @@ commit-reach.c: static int paint_down_to_common(struct repository *r,
       				enum merge_base_flags mb_flags,
       				struct commit_list **result)
       {
      -	struct nonstale_queue queue = {
      -		{ compare_commits_by_gen_then_commit_date }
     -+	struct paint_queue queue = {
     -+		.pq = { compare_commits_by_gen_then_commit_date }
     ++	struct paint_state state = {
     ++		.queue = { compare_commits_by_gen_then_commit_date }
       	};
      +	struct commit *commit;
       	int i;
     + 	int steps = 0;
       	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
       	struct commit_list **tail = result;
     -@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
     + 
     + 	if (!min_generation && !corrected_commit_dates_enabled(r))
     +-		queue.pq.compare = compare_commits_by_commit_date;
     ++		state.queue.compare = compare_commits_by_commit_date;
     + 
     + 	one->object.flags |= PARENT1;
     + 	if (!n) {
       		commit_list_append(one, result);
       		return 0;
       	}
      -	nonstale_queue_put_dedup(&queue, one);
     -+	paint_queue_put(&queue, one, 0);
     ++	paint_queue_put(&state, one, 0);
       
      -	for (i = 0; i < n; i++) {
      -		twos[i]->object.flags |= PARENT2;
      -		nonstale_queue_put_dedup(&queue, twos[i]);
      -	}
      +	for (i = 0; i < n; i++)
     -+		paint_queue_put(&queue, twos[i], PARENT2);
     ++		paint_queue_put(&state, twos[i], PARENT2);
       
      -	while (queue.max_nonstale) {
      -		struct commit *commit = nonstale_queue_get_dedup(&queue);
     -+	while ((commit = paint_queue_get(&queue))) {
     ++	while ((commit = paint_queue_get(&state))) {
       		struct commit_list *parents;
       		int flags;
       		timestamp_t generation = commit_graph_generation(commit);
     @@ commit-reach.c: static int paint_down_to_common(struct repository *r,
       				continue;
       			if (repo_parse_commit(r, p)) {
      -				clear_nonstale_queue(&queue);
     -+				clear_prio_queue(&queue.pq);
     ++				clear_prio_queue(&state.queue);
       				commit_list_free(*result);
       				*result = NULL;
       				/*
     @@ commit-reach.c: static int paint_down_to_common(struct repository *r,
       			}
      -			p->object.flags |= flags;
      -			nonstale_queue_put_dedup(&queue, p);
     -+			paint_queue_put(&queue, p, flags);
     ++			paint_queue_put(&state, p, flags);
       		}
     -+
     -+		if (queue.p1_count + queue.p2_count +
     -+		    queue.pending_merge_bases == 0)
     -+			break;
       	}
       
      -	clear_nonstale_queue(&queue);
     -+	clear_prio_queue(&queue.pq);
     ++	clear_prio_queue(&state.queue);
     + 	trace2_data_intmax("paint_down_to_common", r,
     + 			   "steps", steps);
       	commit_list_sort_by_date(result);
     - 	return 0;
     - }
 -:  ---------- > 6:  8c72f01083 commit-reach: remove unused nonstale_queue dedup wrappers
 3:  ed12a5cb5b ! 7:  d84b932e5b commit-reach: terminate merge-base walk when one paint side is exhausted
     @@ Commit message
          commit-reach: terminate merge-base walk when one paint side is exhausted
      
          Add an early termination check to paint_down_to_common() using the
     -    per-side counters introduced in the previous commit. Once the walk
     -    enters the finite-generation region, terminate early when one side's
     -    exclusive count drops to zero -- no new merge-base can form without
     -    both paint sides meeting.
     +    per-side counters introduced earlier. Once the walk enters the
     +    finite-generation region, terminate early when one side's exclusive
     +    count drops to zero -- no new merge-base can form without both paint
     +    sides meeting.
      
          The check also waits for pending_merge_bases to reach zero, ensuring
     -    all merge-base candidates have been popped and recorded before
     +    all merge-base candidates have been dequeued and recorded before
          exiting.
      
          The INFINITY gate ensures correctness: commits without a commit-graph
     @@ Commit message
          once the walk enters the finite-generation region where ordering
          guarantees hold.
      
     -    On large repositories with commit-graph, this yields 100-1000x
     -    speedups for merge-base queries where one side (e.g. a PR branch) is
     -    much smaller than the other.
     +    Step counts measured with trace2 on git.git with commit-graph:
     +
     +      merge-base --all v2.0.0 v2.55.0-rc1:
     +        before: 72264 steps    after: 44589 steps
     +
     +      merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
     +        before:   110 steps    after:     7 steps
      
          Helped-by: Derrick Stolee <stolee@gmail.com>
          Helped-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
     + ## Documentation/technical/paint-down-to-common.adoc ##
     +@@ Documentation/technical/paint-down-to-common.adoc: ends when one of the following conditions holds:
     + 
     +   1. The queue is empty.
     +   2. The queue contains only stale entries.
     ++  3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
     ++     remain in the queue, no pending merge-base candidates exist,
     ++     and the walk has entered the finite-generation region.
     + 
     + Stale entry condition
     + ~~~~~~~~~~~~~~~~~~~~~
     +@@ Documentation/technical/paint-down-to-common.adoc: existing candidates by proving one is an ancestor of another, but
     + `remove_redundant()` handles that as a post-processing step, so it
     + is safe to exit early.
     + 
     ++Side-exhaustion condition
     ++~~~~~~~~~~~~~~~~~~~~~~~~~
     ++A new merge-base requires commits from both sides to meet. When one
     ++side's exclusive counter reaches zero and there are no pending
     ++merge-base candidates, no future traversal step can produce a new
     ++candidate.
     ++
     ++This optimization only activates in the finite-generation region
     ++where topological ordering holds. In that region, children are
     ++always visited before parents, so paint flags are final at visit
     ++time and an exhausted side cannot reappear. In the INFINITY region,
     ++commit-date ordering can violate this guarantee, so the check is
     ++skipped.
     ++
     + Related documentation
     + ---------------------
     + 
     +
       ## commit-reach.c ##
     -@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
     - 		if (queue.p1_count + queue.p2_count +
     - 		    queue.pending_merge_bases == 0)
     - 			break;
     +@@ commit-reach.c: static void paint_queue_put(struct paint_state *state,
     + 
     + static struct commit *paint_queue_get(struct paint_state *state)
     + {
     +-	struct commit *commit;
     ++	struct commit *commit = prio_queue_get(&state->queue);
     + 
     +-	if (!state->p1_count && !state->p2_count &&
     +-	    !state->pending_merge_bases)
     ++	if (!commit)
     + 		return NULL;
     + 
     +-	commit = prio_queue_get(&state->queue);
     +-	if (commit) {
     +-		commit->object.flags &= ~ENQUEUED;
     +-		paint_count_update(state, commit->object.flags, -1);
     ++	commit->object.flags &= ~ENQUEUED;
      +
     ++	if (!state->pending_merge_bases) {
     ++		if (!state->p1_count && !state->p2_count)
     ++			return NULL;
      +		/*
      +		 * Side exhaustion: a new merge-base can only form
      +		 * when both PARENT1-only and PARENT2-only commits
     -+		 * remain in the queue.  In the finite-generation
     ++		 * remain in the queue. In the finite-generation
      +		 * region the queue is ordered topologically, so
      +		 * no future step can add paint to visited commits
      +		 * and an exhausted side cannot reappear.
      +		 */
     -+		if (generation < GENERATION_NUMBER_INFINITY &&
     -+		    queue.pending_merge_bases == 0 &&
     -+		    (queue.p1_count == 0 || queue.p2_count == 0))
     -+			break;
     ++		if ((!state->p1_count || !state->p2_count) &&
     ++		    commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
     ++			return NULL;
       	}
     ++
     ++	paint_count_update(state, commit->object.flags, -1);
     + 	return commit;
     + }
     + 
     +
     + ## t/t6600-test-reach.sh ##
     +@@ t/t6600-test-reach.sh: test_expect_success 'merge-base --all commit-walk steps' '
     + 	cp commit-graph-full .git/objects/info/commit-graph &&
     + 	GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
     + 		git merge-base --all commit-9-9 commit-9-1 >actual &&
     +-	test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
     ++	test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
     + 
     + 	cp commit-graph-half .git/objects/info/commit-graph &&
     + 	GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
     + 		git merge-base --all commit-9-9 commit-9-1 >actual &&
     +-	test_trace2_data paint_down_to_common steps 81 <trace-half.txt
     ++	test_trace2_data paint_down_to_common steps 57 <trace-half.txt
     + '
       
     - 	clear_prio_queue(&queue.pq);
     + test_expect_success 'reduce_heads' '

-- 
gitgitgadget

  parent reply	other threads:[~2026-06-24 12:14 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
2026-06-22 18:00   ` Derrick Stolee
2026-06-22 18:53     ` Kristofer Karlsson
2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget
2026-06-22 18:10   ` Derrick Stolee
2026-06-22 19:14     ` Kristofer Karlsson
2026-06-22 20:23       ` Derrick Stolee
2026-06-23 10:13         ` Kristofer Karlsson
2026-06-23 13:50           ` Derrick Stolee
2026-06-23 14:09             ` Kristofer Karlsson
2026-06-23 14:17               ` Derrick Stolee
2026-06-24 11:25                 ` Kristofer Karlsson
2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-22 18:12   ` Derrick Stolee
2026-06-22 19:19     ` Kristofer Karlsson
2026-06-22 20:26       ` Derrick Stolee
2026-06-22 21:03         ` Kristofer Karlsson
2026-06-23 13:40           ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
2026-06-22 18:15   ` Derrick Stolee
2026-06-22 19:25     ` Kristofer Karlsson
2026-06-22 20:28       ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
2026-06-22 18:16   ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
2026-06-22 18:21   ` Derrick Stolee
2026-06-22 19:30     ` Kristofer Karlsson
2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee
2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget [this message]
2026-06-24 12:14   ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
2026-06-24 17:09     ` Junio C Hamano
2026-06-24 12:14   ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
2026-06-24 13:43     ` Derrick Stolee
2026-06-24 14:33       ` Kristofer Karlsson
2026-06-24 12:14   ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
2026-06-24 12:14   ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget
2026-06-24 13:41     ` Derrick Stolee
2026-06-24 14:31       ` Kristofer Karlsson
2026-06-24 12:14   ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget
2026-06-24 13:54     ` Derrick Stolee
2026-06-24 14:38       ` Kristofer Karlsson
2026-06-24 12:14   ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget
2026-06-24 13:55     ` Derrick Stolee
2026-06-24 12:14   ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-24 14:02     ` Derrick Stolee
2026-06-24 14:47       ` Kristofer Karlsson
2026-06-24 15:07         ` Derrick Stolee
2026-06-24 13:34   ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee
2026-06-24 14:25     ` Kristofer Karlsson
2026-06-24 14:09   ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2149.v2.git.1782303254.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=krka@spotify.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.