* [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted
@ 2026-06-20 10:36 Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
` (7 more replies)
0 siblings, 8 replies; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson
Hi,
This follows up on my RFC [1] with a concrete proposal. I expect the design
to still be scrutinized, but that may be easier with actual code to look at.
I tried to make this easier to review by splitting into atomic patches. The
first two patches are the meatiest parts, though they are pure refactoring.
The behavior change is in patch 3 and is in itself quite small. The last
patch adds technical documentation to support future development.
----------------------------------------------------------------------------
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.
When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.
origin/HEAD o o PR HEAD
| |
(import) o :
/ \ /
| o merge-base
| |
: : (~2.5M commits)
| |
import root main root
In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patch 4 incorporates test cases from Elijah's branch.
This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.
Patch layout:
1/6 commit-reach: decouple ahead_behind from nonstale_queue 2/6
commit-reach: introduce paint_queue and per-side counters 3/6 commit-reach:
stop the walk when one side is exhausted 4/6 t6600: add side-exhaustion
edge-case tests 5/6 t6099, t6600: add side-exhaustion regression tests 6/6
Documentation/technical: document paint_down_to_common()
Benchmarks
Measured on a 2.6M-commit monorepo with commit-graph (baseline v2.55-rc1):
merge-base --all (across import) 4.293s -> 8ms (537x)
merge-tree (across import) 5.345s -> 13ms (411x)
merge-base --all (1000 commits apart) 5.404s -> 7ms (772x)
No regression on linux.git (1.4M commits, commit-graph):
merge-base HEAD HEAD~1000 38ms -> 40ms
merge-base --all HEAD HEAD~1000 87ms -> 36ms
merge-base --is-ancestor HEAD~1000 HEAD 11ms -> 11ms
merge-base --all HEAD HEAD~10000 626ms -> 428ms
[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150
Elijah Newren (1):
t6600: add test cases for side-exhaustion edge cases
Kristofer Karlsson (5):
commit-reach: decouple ahead_behind from nonstale_queue
commit-reach: introduce struct paint_queue with per-side counters
commit-reach: terminate merge-base walk when one paint side is
exhausted
t6099, t6600: add side-exhaustion regression tests
Documentation/technical: add paint-down-to-common doc
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 130 ++++++++++++++
commit-reach.c | 159 +++++++++++-------
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++
t/t6600-test-reach.sh | 136 +++++++++++++++
7 files changed, 451 insertions(+), 59 deletions(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
base-commit: 8d96f09e9245ddf80c1981476fcbac8c4bb4125f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2149
--
gitgitgadget
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
@ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget
2026-06-22 18:00 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget
` (6 subsequent siblings)
7 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Move ahead_behind() off the shared nonstale_queue abstraction to use
a plain prio_queue with a local max_nonstale pointer. The nonstale
tracking is inlined into insert_no_dup().
This prepares for replacing nonstale_queue with a paint_queue struct
that tracks per-side commit counts, which ahead_behind() does not
need. No behavior change.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..377a5cc42a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1089,12 +1089,18 @@ struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
define_commit_slab(bit_arrays, struct bitmap *);
static struct bit_arrays bit_arrays;
-static void insert_no_dup(struct nonstale_queue *queue, struct commit *c)
+static void insert_no_dup(struct prio_queue *queue,
+ struct commit **max_nonstale,
+ struct commit *c)
{
if (c->object.flags & PARENT2)
return;
- nonstale_queue_put(queue, c);
c->object.flags |= PARENT2;
+ prio_queue_put(queue, c);
+ if (!(c->object.flags & STALE) &&
+ (!*max_nonstale ||
+ queue->compare(*max_nonstale, c, queue->cb_data) <= 0))
+ *max_nonstale = c;
}
static struct bitmap *get_bit_array(struct commit *c, int width)
@@ -1118,9 +1124,10 @@ void ahead_behind(struct repository *r,
struct commit **commits, size_t commits_nr,
struct ahead_behind_count *counts, size_t counts_nr)
{
- struct nonstale_queue queue = {
- { .compare = compare_commits_by_gen_then_commit_date }
+ struct prio_queue queue = {
+ .compare = compare_commits_by_gen_then_commit_date
};
+ struct commit *max_nonstale = NULL;
size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
if (!commits_nr || !counts_nr)
@@ -1140,14 +1147,17 @@ void ahead_behind(struct repository *r,
struct bitmap *bitmap = get_bit_array(c, width);
bitmap_set(bitmap, i);
- insert_no_dup(&queue, c);
+ insert_no_dup(&queue, &max_nonstale, c);
}
- while (queue.max_nonstale) {
- struct commit *c = nonstale_queue_get(&queue);
+ while (max_nonstale) {
+ struct commit *c = prio_queue_get(&queue);
struct commit_list *p;
struct bitmap *bitmap_c = get_bit_array(c, width);
+ if (c == max_nonstale)
+ max_nonstale = NULL;
+
for (size_t i = 0; i < counts_nr; i++) {
int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
@@ -1178,7 +1188,7 @@ void ahead_behind(struct repository *r,
if (bitmap_popcount(bitmap_p) == commits_nr)
p->item->object.flags |= STALE;
- insert_no_dup(&queue, p->item);
+ insert_no_dup(&queue, &max_nonstale, p->item);
}
free_bit_array(c);
@@ -1186,10 +1196,10 @@ void ahead_behind(struct repository *r,
/* STALE is used here, PARENT2 is used by insert_no_dup(). */
repo_clear_commit_marks(r, PARENT2 | STALE);
- for (size_t i = 0; i < queue.pq.nr; i++)
- free_bit_array(queue.pq.array[i].data);
+ for (size_t i = 0; i < queue.nr; i++)
+ free_bit_array(queue.array[i].data);
clear_bit_arrays(&bit_arrays);
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&queue);
}
struct commit_and_index {
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
@ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget
2026-06-22 18:10 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
` (5 subsequent siblings)
7 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Replace the nonstale_queue abstraction in paint_down_to_common() with
a new paint_queue struct that tracks per-side commit counts. Each
non-stale queued commit occupies exactly one counter bucket based on
its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).
The counters are maintained by paint_count_transition() which handles
all flag changes as bucket transfers: remove from the old bucket, add
to the new one. Either step is a no-op when the respective state has
no bucket (stale or zero).
The loop now drains the queue via paint_queue_get() and breaks when
all counters reach zero, replacing the old pointer-based termination
(max_nonstale). This is equivalent behavior.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 114 ++++++++++++++++++++++++++++---------------------
1 file changed, 66 insertions(+), 48 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index 377a5cc42a..ba1e896f0f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -41,58 +41,75 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
}
/*
- * A prio_queue with O(1) termination check. 'max_nonstale' tracks
- * the lowest-priority non-stale commit enqueued so far; once it is
- * popped, every remaining entry is known to be STALE.
+ * Priority queue with per-side commit counters for paint_down_to_common().
+ * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
+ * PARENT2-only, or both (a pending merge-base candidate).
*/
-struct nonstale_queue {
+struct paint_queue {
struct prio_queue pq;
- struct commit *max_nonstale;
+ int p1_count;
+ int p2_count;
+ int pending_merge_bases;
};
-static void nonstale_queue_put(struct nonstale_queue *queue,
- struct commit *c)
+/*
+ * Adjust per-side counters for a paint-state transition. Non-stale
+ * commits are counted in one of three counters: PARENT1-only,
+ * PARENT2-only, or both. Zero means "not in the queue" (used on
+ * enqueue/dequeue); stale commits are not counted at all.
+ */
+static void paint_count_transition(struct paint_queue *queue,
+ unsigned old_flags, unsigned new_flags)
{
- struct commit *old = queue->max_nonstale;
+ unsigned old_paint = old_flags & (PARENT1 | PARENT2 | STALE);
+ unsigned new_paint = new_flags & (PARENT1 | PARENT2 | STALE);
- prio_queue_put(&queue->pq, c);
- if (c->object.flags & STALE)
+ if (old_paint == new_paint)
return;
- if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0)
- queue->max_nonstale = c;
-}
-
-static struct commit *nonstale_queue_get(struct nonstale_queue *queue)
-{
- struct commit *commit = prio_queue_get(&queue->pq);
- if (commit == queue->max_nonstale)
- queue->max_nonstale = NULL;
-
- return commit;
+ if (!(old_paint & STALE)) {
+ switch (old_paint & (PARENT1 | PARENT2)) {
+ case 0: break;
+ case PARENT1: queue->p1_count--; break;
+ case PARENT2: queue->p2_count--; break;
+ case PARENT1 | PARENT2: queue->pending_merge_bases--; break;
+ default: BUG("unexpected paint state");
+ }
+ }
+ if (!(new_paint & STALE)) {
+ switch (new_paint & (PARENT1 | PARENT2)) {
+ case 0: break;
+ case PARENT1: queue->p1_count++; break;
+ case PARENT2: queue->p2_count++; break;
+ case PARENT1 | PARENT2: queue->pending_merge_bases++; break;
+ default: BUG("unexpected paint state");
+ }
+ }
}
-static void clear_nonstale_queue(struct nonstale_queue *queue)
+static void paint_queue_put(struct paint_queue *queue,
+ struct commit *c, unsigned add_flags)
{
- clear_prio_queue(&queue->pq);
- queue->max_nonstale = NULL;
-}
+ unsigned old_flags = c->object.flags;
+ c->object.flags |= add_flags;
-static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
- struct commit *c)
-{
- if (c->object.flags & ENQUEUED)
- return;
- c->object.flags |= ENQUEUED;
- nonstale_queue_put(queue, c);
+ if (old_flags & ENQUEUED) {
+ paint_count_transition(queue, old_flags, c->object.flags);
+ } else {
+ c->object.flags |= ENQUEUED;
+ prio_queue_put(&queue->pq, c);
+ paint_count_transition(queue, 0, c->object.flags);
+ }
}
-static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
+static struct commit *paint_queue_get(struct paint_queue *queue)
{
- struct commit *commit = nonstale_queue_get(queue);
+ struct commit *commit = prio_queue_get(&queue->pq);
- if (commit)
+ if (commit) {
commit->object.flags &= ~ENQUEUED;
+ paint_count_transition(queue, commit->object.flags, 0);
+ }
return commit;
}
@@ -104,9 +121,10 @@ static int paint_down_to_common(struct repository *r,
enum merge_base_flags mb_flags,
struct commit_list **result)
{
- struct nonstale_queue queue = {
- { compare_commits_by_gen_then_commit_date }
+ struct paint_queue queue = {
+ .pq = { compare_commits_by_gen_then_commit_date }
};
+ struct commit *commit;
int i;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
@@ -119,15 +137,12 @@ static int paint_down_to_common(struct repository *r,
commit_list_append(one, result);
return 0;
}
- nonstale_queue_put_dedup(&queue, one);
+ paint_queue_put(&queue, one, 0);
- for (i = 0; i < n; i++) {
- twos[i]->object.flags |= PARENT2;
- nonstale_queue_put_dedup(&queue, twos[i]);
- }
+ for (i = 0; i < n; i++)
+ paint_queue_put(&queue, twos[i], PARENT2);
- while (queue.max_nonstale) {
- struct commit *commit = nonstale_queue_get_dedup(&queue);
+ while ((commit = paint_queue_get(&queue))) {
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
@@ -165,7 +180,7 @@ static int paint_down_to_common(struct repository *r,
if ((p->object.flags & flags) == flags)
continue;
if (repo_parse_commit(r, p)) {
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&queue.pq);
commit_list_free(*result);
*result = NULL;
/*
@@ -180,12 +195,15 @@ static int paint_down_to_common(struct repository *r,
return error(_("could not parse commit %s"),
oid_to_hex(&p->object.oid));
}
- p->object.flags |= flags;
- nonstale_queue_put_dedup(&queue, p);
+ paint_queue_put(&queue, p, flags);
}
+
+ if (queue.p1_count + queue.p2_count +
+ queue.pending_merge_bases == 0)
+ break;
}
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&queue.pq);
commit_list_sort_by_date(result);
return 0;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget
@ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget
2026-06-22 18:12 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
` (4 subsequent siblings)
7 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced in the previous commit. Once the walk
enters the finite-generation region, terminate early when one side's
exclusive count drops to zero -- no new merge-base can form without
both paint sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been popped and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.
On large repositories with commit-graph, this yields 100-1000x
speedups for merge-base queries where one side (e.g. a PR branch) is
much smaller than the other.
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/commit-reach.c b/commit-reach.c
index ba1e896f0f..fcd1ad0167 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -201,6 +201,19 @@ static int paint_down_to_common(struct repository *r,
if (queue.p1_count + queue.p2_count +
queue.pending_merge_bases == 0)
break;
+
+ /*
+ * Side exhaustion: a new merge-base can only form
+ * when both PARENT1-only and PARENT2-only commits
+ * remain in the queue. In the finite-generation
+ * region the queue is ordered topologically, so
+ * no future step can add paint to visited commits
+ * and an exhausted side cannot reappear.
+ */
+ if (generation < GENERATION_NUMBER_INFINITY &&
+ queue.pending_merge_bases == 0 &&
+ (queue.p1_count == 0 || queue.p2_count == 0))
+ break;
}
clear_prio_queue(&queue.pq);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
` (2 preceding siblings ...)
2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
@ 2026-06-20 10:36 ` Elijah Newren via GitGitGadget
2026-06-22 18:15 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
` (3 subsequent siblings)
7 siblings, 1 reply; 51+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren
From: Elijah Newren <newren@gmail.com>
Add test cases to t6600-test-reach.sh that exercise edge cases in the
side-exhaustion optimization for paint_down_to_common():
- in_merge_bases_many:self: commit is both A and one of the X inputs
- get_merge_bases_many:duplicate-twos: duplicate entries in X list
- get_merge_bases_many:pending-stale: STALE transition on an
already-painted commit (ps-* diamond topology)
- get_merge_bases_many:infinity-both-sides: both tips outside the
commit-graph with non-monotonic dates (pi-* topology)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..775c077c87 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -49,6 +49,62 @@ test_expect_success 'setup' '
git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1
done
done &&
+
+ # Build a small side topology to exercise the (PARENT1|PARENT2) ->
+ # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the
+ # 10x10 grid above does not exercise it because no merge-base candidate
+ # there is a descendant of another, so STALE never reaches a
+ # still-pending candidate.
+ #
+ # ps-X
+ # /|\
+ # / | \
+ # ps-Z ps-B ps-W
+ # | / \ |
+ # | / \ |
+ # |/ \|
+ # ps-T1 ps-T2
+ #
+ # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so
+ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
+ # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued;
+ # then the STALE-walk from ps-B transitions ps-X to
+ # (PARENT1|PARENT2|STALE).
+ git checkout --orphan ps-orphan &&
+ test_commit ps-X &&
+ git checkout -b ps-B-br ps-X && test_commit ps-B &&
+ git checkout -b ps-Z-br ps-X && test_commit ps-Z &&
+ git checkout -b ps-W-br ps-X && test_commit ps-W &&
+ git checkout -b ps-T1 ps-Z &&
+ git merge --no-ff -m ps-T1 ps-B &&
+ git checkout -b ps-T2 ps-W &&
+ git merge --no-ff -m ps-T2 ps-B &&
+
+ # Build a side topology that lives entirely outside the half
+ # commit-graph and has non-monotonic commit dates, to exercise the
+ # INFINITY-gate in paint_down_to_common. With both tips outside
+ # the graph, generation is INFINITY and the queue falls back to
+ # commit-date order, which here is non-monotonic.
+ #
+ # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D
+ # pi-D (date 480) --> pi-C
+ # pi-C (date 200) --> pi-B
+ # pi-B (date 100, PARENT2 tip) --> pi-P
+ # pi-P (date 450, root)
+ #
+ # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is
+ # itself one of the queried tips).
+ git checkout --orphan pi-orphan &&
+ test_commit --date "@450 +0000" pi-P &&
+ test_commit --date "@100 +0000" pi-B &&
+ test_commit --date "@200 +0000" pi-C &&
+ test_commit --date "@480 +0000" pi-D &&
+ GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \
+ git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid &&
+ pi_x="$(cat pi-X-oid)" &&
+ git branch -f pi-X-br "$pi_x" &&
+ git tag pi-X "$pi_x" &&
+
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph commit-graph-full &&
chmod u+w commit-graph-full &&
@@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' '
test_all_modes in_merge_bases_many
'
+test_expect_success 'in_merge_bases_many:self' '
+ cat >input <<-\EOF &&
+ A:commit-6-8
+ X:commit-5-9
+ X:commit-6-8
+ EOF
+ echo "in_merge_bases_many(A,X):1" >expect &&
+ test_all_modes in_merge_bases_many
+'
+
test_expect_success 'is_descendant_of:hit' '
cat >input <<-\EOF &&
A:commit-5-7
@@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'get_merge_bases_many:duplicate-twos' '
+ cat >input <<-\EOF &&
+ A:commit-5-7
+ X:commit-4-8
+ X:commit-4-8
+ X:commit-6-6
+ X:commit-6-6
+ X:commit-8-3
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse commit-5-6 \
+ commit-4-7 | sort
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
+test_expect_success 'get_merge_bases_many:pending-stale' '
+ # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in
+ # paint_down_to_common(). See the topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:ps-T1
+ X:ps-T2
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse ps-B
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
+test_expect_success 'get_merge_bases_many:infinity-both-sides' '
+ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
+ # the pi-* topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:pi-X
+ X:pi-B
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse pi-B
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
` (3 preceding siblings ...)
2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
@ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget
2026-06-22 18:16 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
` (2 subsequent siblings)
7 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add t6099 to test the case where multiple merge-base candidates exist
and one is an ancestor of another. This exercises the side-exhaustion
optimization in paint_down_to_common together with the
remove_redundant safety net in get_merge_bases_many_0.
Add a mixed finite/INFINITY test to t6600 where one tip is outside
the commit-graph (INFINITY generation) and the other is inside.
This exercises the region transition: the walk starts in the
INFINITY region where side-exhaustion is disabled, then crosses
into the finite region where it can fire.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++
t/t6600-test-reach.sh | 25 ++++++++
3 files changed, 108 insertions(+)
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
diff --git a/t/meson.build b/t/meson.build
index 3219264fe7..ee6ebdffb9 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -786,6 +786,7 @@ integration_tests = [
't6041-bisect-submodule.sh',
't6050-replace.sh',
't6060-merge-index.sh',
+ 't6099-merge-base-side-exhaustion.sh',
't6100-rev-list-in-order.sh',
't6101-rev-parse-parents.sh',
't6102-rev-list-unexpected-objects.sh',
diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh
new file mode 100755
index 0000000000..bae3ea7f83
--- /dev/null
+++ b/t/t6099-merge-base-side-exhaustion.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description='merge-base with ancestor among merge-base candidates
+
+Test that merge-base --all correctly handles cases where
+multiple merge-base candidates exist and one is an ancestor
+of another. The side-exhaustion optimization in
+paint_down_to_common may exit before STALE propagation
+removes the ancestor, but remove_redundant catches it.
+
+Graph shape (parents are below children):
+
+ A ----------- X
+ |\ /|
+ | B---------/ |
+ | | |
+ e2 \ f2
+ | | |
+ e1 d1 f1
+ \ | /
+ \ | /
+ \| /
+ C
+
+A and X are the two tips.
+B and C are both reachable from A and X.
+B reaches C through d1.
+Only B should appear in merge-base --all output.
+'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup ancestor merge-base candidate' '
+ test_commit C &&
+
+ git checkout -b d-chain HEAD &&
+ test_commit d1 &&
+ test_commit B &&
+
+ git checkout -b e-path C &&
+ test_commit e1 &&
+ test_commit e2 &&
+
+ git checkout -b f-path C &&
+ test_commit f1 &&
+ test_commit f2 &&
+
+ git checkout -b branch-A e-path &&
+ test_merge A B &&
+
+ git checkout -b branch-X f-path &&
+ test_merge X B &&
+
+ git commit-graph write --reachable
+'
+
+test_expect_success 'merge-base --all excludes ancestor candidate' '
+ git rev-parse B >expected &&
+ git merge-base --all A X >actual &&
+ test_cmp expected actual
+'
+
+test_expect_success 'merge-base (single) finds shallowest' '
+ git rev-parse B >expected &&
+ git merge-base A X >actual &&
+ test_cmp expected actual
+'
+
+# Without commit-graph: generation numbers are INFINITY,
+# side-exhaustion optimization does not fire.
+test_expect_success 'merge-base --all without commit-graph' '
+ rm -f .git/objects/info/commit-graph &&
+ git rev-parse B >expected &&
+ git merge-base --all A X >actual &&
+ test_cmp expected actual
+'
+
+test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 775c077c87..f5560b0c1c 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'setup mixed finite/INFINITY topology' '
+ # Create a commit outside all saved commit-graph files so it always
+ # has INFINITY generation, while its parent (ps-X) is in the graph
+ # with a finite generation. Use the ps-* orphan topology so we do
+ # not pollute the grid-based rev-list tests.
+ git checkout ps-X &&
+ test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF
+'
+
+test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
+ # One tip (pm-INF) is outside the commit-graph with INFINITY
+ # generation; the other (ps-B) is in the graph with finite
+ # generation. The walk starts in the INFINITY region and crosses
+ # into the finite region where side-exhaustion can fire.
+ cat >input <<-\EOF &&
+ A:pm-INF
+ X:ps-B
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse ps-X
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
` (4 preceding siblings ...)
2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
@ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget
2026-06-22 18:21 ` Derrick Stolee
2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
7 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add a technical document describing the paint_down_to_common()
algorithm used for merge-base computation.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 130 ++++++++++++++++++
3 files changed, 132 insertions(+)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2699f0b24a..f8dea4b395 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
TECH_DOCS += technical/pack-heuristics
+TECH_DOCS += technical/paint-down-to-common
TECH_DOCS += technical/parallel-checkout
TECH_DOCS += technical/partial-clone
TECH_DOCS += technical/platform-support
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index ec07088c57..9ce11d5e48 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -18,6 +18,7 @@ articles = [
'multi-pack-index.adoc',
'packfile-uri.adoc',
'pack-heuristics.adoc',
+ 'paint-down-to-common.adoc',
'parallel-checkout.adoc',
'partial-clone.adoc',
'platform-support.adoc',
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
new file mode 100644
index 0000000000..e677cce84d
--- /dev/null
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -0,0 +1,130 @@
+Merge-Base Computation and paint_down_to_common()
+==================================================
+
+The function `paint_down_to_common()` in `commit-reach.c` computes merge
+bases by walking the commit graph backwards from two sets of tips and
+finding where their ancestry meets.
+
+Use cases
+---------
+
+Computing merge bases is used in two different ways:
+
+ 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
+ `merge`, `rebase`). A merge base is a common ancestor that is
+ not itself an ancestor of another common ancestor.
+
+ 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
+ --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
+ an ancestor of commit B?" If a common ancestor equals one of the
+ inputs, that input is necessarily the only merge base -- no other
+ common ancestor can be both as recent and not an ancestor of it.
+
+Both use cases share the same algorithm and implementation.
+
+Algorithm
+---------
+
+Given a commit `one` and a set of commits `twos[]`, the walk paints
+commits with two colors:
+
+ - PARENT1: reachable from `one`
+ - PARENT2: reachable from any commit in `twos[]`
+
+The walk uses a priority queue ordered by generation number (falling
+back to commit date when generation numbers are unavailable). Each
+step dequeues the highest-priority commit (this is when we say a
+commit is "visited") and propagates its paint flags to its parents,
+enqueuing them if they gained new flags. When a commit receives
+both PARENT1 and PARENT2, it is a merge-base candidate. A candidate
+gains the STALE flag so its ancestors propagate staleness -- any
+deeper common ancestor is necessarily redundant.
+
+INFINITY and finite generation regions
+--------------------------------------
+
+The commit-graph stores a generation number for each commit. Commits
+not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The
+graph is closed under reachability: if a commit is in the graph, all
+its ancestors are too. This partitions the commit graph into two regions:
+
+....
+ +---------------------------------------+
+ | INFINITY region |
+ | generation = INFINITY |
+ | queue order: heuristic (commit date) |
+ +---------------------------------------+
+ |
+ v
+ +---------------------------------------+
+ | Finite region |
+ | generation = finite |
+ | queue order: topological |
+ +---------------------------------------+
+....
+
+When the commit-graph is enabled, the INFINITY region is typically
+very small -- it only contains commits added since the last
+commit-graph refresh.
+
+All reachable INFINITY-generation commits are visited before any
+finite-generation commit, because INFINITY is larger than any finite
+value. Once the walk crosses into the finite region, it stays there.
+
+In the finite region, generation ordering guarantees topological
+traversal: children are always visited before their parents. This
+means that paint on already-visited commits is final -- no future
+traversal step can add paint to them.
+
+In the INFINITY region, commit-date ordering can violate this: a
+parent with a later date can be visited before a child with an earlier
+date. Paint flags are therefore NOT final at visit time, and a
+commit visited with only one side's paint may later gain the other.
+
+Paint flags are only added, never removed. Since each flag can be set
+at most once per commit, the number of times a commit can be
+re-enqueued is bounded by the number of flag transitions.
+
+Termination
+-----------
+
+Termination happens when we can prove that no extra progress is
+possible. We are done with the main loop when one of the following
+conditions holds:
+
+ 1. The queue is empty.
+ 2. The queue only contains STALE entries.
+ 3. Side-exhaustion: the walk has reached the finite region and one
+ of the sides is fully exhausted.
+
+The loop waits for all pending merge-base candidates to be popped
+and recorded before any early exit fires, so no separate drain phase
+is needed after termination.
+
+Stale entry condition
+~~~~~~~~~~~~~~~~~~~~~
+If all entries are stale we cannot find any new merge bases since
+that requires at least one enqueued side node meeting the other side.
+However, we could still invalidate merge bases (if there are more
+than one). This is unnecessary since `remove_redundant()` will clean
+that up as a post-process step.
+
+Side-exhaustion
+~~~~~~~~~~~~~~~
+A commit is *exclusive* to one side if it carries that side's paint
+but not the other (e.g. PARENT1 without PARENT2).
+
+If we have reached the finite region of the graph, no future
+traversal step can add paint to an already-visited commit. Thus if
+there are no exclusive PARENT2 commits in the queue, no additional
+PARENT2 paint can be introduced into the walk. Even if exclusive
+PARENT1 commits remain, no new merge-base candidates can be
+discovered. The same holds symmetrically for PARENT1.
+
+This invariant is only valid in the finite region of the graph.
+
+Related documentation
+---------------------
+
+ - `Documentation/technical/commit-graph.adoc` -- generation numbers
+ and the reachability closure property.
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
@ 2026-06-22 18:00 ` Derrick Stolee
2026-06-22 18:53 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:00 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Move ahead_behind() off the shared nonstale_queue abstraction to use
> a plain prio_queue with a local max_nonstale pointer. The nonstale
> tracking is inlined into insert_no_dup().
>
> This prepares for replacing nonstale_queue with a paint_queue struct
> that tracks per-side commit counts, which ahead_behind() does not
> need. No behavior change.
This change is only needed if we are intending to delete the nonstale
queue struct, which is currently happening in your patch 2. But we
are essentially recreating its logic in a more disjointed way here,
leaving this code in a worse state.
I'd rather see patch 2 create a _new_ data structure instead of
_replacing_ one that already works for multiple callers. (It does
drop to only one caller, but that seems cleaner to me right now.)
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget
@ 2026-06-22 18:10 ` Derrick Stolee
2026-06-22 19:14 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:10 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
> + if (!(old_paint & STALE)) {
> + switch (old_paint & (PARENT1 | PARENT2)) {
> + case 0: break;
> + case PARENT1: queue->p1_count--; break;
> + case PARENT2: queue->p2_count--; break;
> + case PARENT1 | PARENT2: queue->pending_merge_bases--; break;
> + default: BUG("unexpected paint state");
> + }
> + }
> + if (!(new_paint & STALE)) {
> + switch (new_paint & (PARENT1 | PARENT2)) {
> + case 0: break;
> + case PARENT1: queue->p1_count++; break;
> + case PARENT2: queue->p2_count++; break;
> + case PARENT1 | PARENT2: queue->pending_merge_bases++; break;
> + default: BUG("unexpected paint state");
> + }
> + }
While correct and compact, I don't believe that these switch
statements follow the coding guidelines. We should split the
lines appropriately so they are more standard, such as:
if (!(new_paint & STALE)) {
switch (new_paint & (PARENT1 | PARENT2)) {
case 0:
break;
case PARENT1:
queue->p1_count++;
break;
case PARENT2:
queue->p2_count++;
break;
case PARENT1 | PARENT2:
queue->pending_merge_bases++;
break;
default:
BUG("unexpected paint state");
}
}
Also: technically "case 0" should be a BUG() state, right? We
shouldn't be walking any commit that isn't reachable from at
least one side. (case 0 does happen for old_paint, though.)
> }
>
> -static void clear_nonstale_queue(struct nonstale_queue *queue)
> +static void paint_queue_put(struct paint_queue *queue,
> + struct commit *c, unsigned add_flags)
> {
> - clear_prio_queue(&queue->pq);
> - queue->max_nonstale = NULL;
> -}
> + unsigned old_flags = c->object.flags;
> + c->object.flags |= add_flags;
Diffs like this are part of the reason I'd like to see a _new_
data structure instead of replacing the old one. Keeping the
old one for ahead_behind seems like a good idea to me, but even
if we don't land on that end state then deleting the old code
_after_ adding the new code will make the diff more readable.
> - struct nonstale_queue queue = {
> - { compare_commits_by_gen_then_commit_date }
> + struct paint_queue queue = {
> + .pq = { compare_commits_by_gen_then_commit_date }
> };
I didn't notice when reading the struct definition, but looking at
'pq' here makes me think that we shouldn't be using that abbreviation
as it could stand for "prio_queue" or "paint_queue".
> + while ((commit = paint_queue_get(&queue))) {
...> +
> + if (queue.p1_count + queue.p2_count +
> + queue.pending_merge_bases == 0)
> + break;
> }
When possible, I like to try to make loops only have one terminating
condition. Should we have paint_queue_get() return NULL when it sees
this internal state condition?
Also, I'd rather see it of the form of (!count) instead of using
addition to make it clear that we care about each value being zero.
Finally, I think we actually want this case to get the benefit:
if ((!queue.p1_count || !queue.p2_count) &&
!queue.pending_merge_bases)
I do see that you have this condition in patch 3 with the extra
detail that the max generation in the queue is finite. I think this
is more reason to include this in the data structure method and not
in the loop.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
@ 2026-06-22 18:12 ` Derrick Stolee
2026-06-22 19:19 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:12 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add an early termination check to paint_down_to_common() using the
> per-side counters introduced in the previous commit. Once the walk
> enters the finite-generation region, terminate early when one side's
> exclusive count drops to zero -- no new merge-base can form without
> both paint sides meeting.
>
> The check also waits for pending_merge_bases to reach zero, ensuring
> all merge-base candidates have been popped and recorded before
> exiting.
>
> The INFINITY gate ensures correctness: commits without a commit-graph
> entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
> which is not topologically reliable. The optimization only fires
> once the walk enters the finite-generation region where ordering
> guarantees hold.
>
> On large repositories with commit-graph, this yields 100-1000x
> speedups for merge-base queries where one side (e.g. a PR branch) is
> much smaller than the other.
>
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
> commit-reach.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index ba1e896f0f..fcd1ad0167 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -201,6 +201,19 @@ static int paint_down_to_common(struct repository *r,
> if (queue.p1_count + queue.p2_count +
> queue.pending_merge_bases == 0)
> break;
> +
> + /*
> + * Side exhaustion: a new merge-base can only form
> + * when both PARENT1-only and PARENT2-only commits
> + * remain in the queue. In the finite-generation
> + * region the queue is ordered topologically, so
> + * no future step can add paint to visited commits
> + * and an exhausted side cannot reappear.
> + */
> + if (generation < GENERATION_NUMBER_INFINITY &&
> + queue.pending_merge_bases == 0 &&
> + (queue.p1_count == 0 || queue.p2_count == 0))
> + break;
I mentioned it earlier, but I think this check should be in the
dequeueing method instead of in the tail of the loop.
But I think this is the correct ending case.
I like that you broke this out into its own patch to demonstrate
that this is the key performance boost. It may be good to have
some performance test numbers that demonstrate that patch 2 does
not add any substantial overhead (timing should match previous
code) and in patch 3 this single condition gets us a huge benefit,
though it requires the data tracking of patch 2 to work.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
@ 2026-06-22 18:15 ` Derrick Stolee
2026-06-22 19:25 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:15 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Add test cases to t6600-test-reach.sh that exercise edge cases in the
> side-exhaustion optimization for paint_down_to_common():
>
> - in_merge_bases_many:self: commit is both A and one of the X inputs
> - get_merge_bases_many:duplicate-twos: duplicate entries in X list
> - get_merge_bases_many:pending-stale: STALE transition on an
> already-painted commit (ps-* diamond topology)
> - get_merge_bases_many:infinity-both-sides: both tips outside the
> commit-graph with non-monotonic dates (pi-* topology)
It's usually my preference to see these tests show up before the
new code arrives, that way we can see that they already work with
the old logic and continue to work with the new logic.
It's minor, but putting them after your code change may be adding
enforcement of a change of behavior.
One thing that could be helpful here is to consider tracing a
count of "commits walked" in the merge-base code, then you could
have these tests demonstrate the performance benefit by checking
for that number changing.
In t6600, that tracing number would not be the same across the
three different data shapes (full graph, half graph, no graph) and
that could be valuable to demonstrate in tests.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests
2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
@ 2026-06-22 18:16 ` Derrick Stolee
0 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:16 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> Add t6099 to test the case where multiple merge-base candidates exist
> and one is an ancestor of another. This exercises the side-exhaustion
> optimization in paint_down_to_common together with the
> remove_redundant safety net in get_merge_bases_many_0.
Same as the previous patch: I'd like to see these before the code
change. And if we trace a count of commits walked, we'd be able to
see the number change in this specific case.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
@ 2026-06-22 18:21 ` Derrick Stolee
2026-06-22 19:30 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:21 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add a technical document describing the paint_down_to_common()
> algorithm used for merge-base computation.
I like the idea of documenting this so it's easier to understand.
There is risk of drift from the actual implementation. You may want
to add a comment to the method in commit-reach.c to indicate that
any change should be reflected in this document.
> +Termination
> +-----------
> +
> +Termination happens when we can prove that no extra progress is
> +possible. We are done with the main loop when one of the following
> +conditions holds:
> +
> + 1. The queue is empty.
> + 2. The queue only contains STALE entries.
> + 3. Side-exhaustion: the walk has reached the finite region and one
> + of the sides is fully exhausted.
It could be an interesting exercise, but potentially wasteful, to
add this document as a Patch 1, but reflecting the old algorithm
and then to update the document at the same time as you update the
code.
The changes in your patch 2 would impact this doc in terms of the
data being tracked by the paint_queue data structure instead of the
nonstale_queue structure (though those details are not currently
handled in the current version). The change to the termination
condition would come along with patch 3.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
` (5 preceding siblings ...)
2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
@ 2026-06-22 18:22 ` Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
7 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 18:22 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> Hi,
>
> This follows up on my RFC [1] with a concrete proposal. I expect the design
> to still be scrutinized, but that may be easier with actual code to look at.
>
> I tried to make this easier to review by splitting into atomic patches. The
> first two patches are the meatiest parts, though they are pure refactoring.
> The behavior change is in patch 3 and is in itself quite small. The last
> patch adds technical documentation to support future development.
Thanks for putting this together carefully.
I gave some feedback on the specific code and the patch organization.
Overall, I believe that this implementation is functionally correct
and everything I have to say is about presentation and data gathering.
I look forward to a non-RFC v2.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue
2026-06-22 18:00 ` Derrick Stolee
@ 2026-06-22 18:53 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-22 18:53 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 20:00, Derrick Stolee <stolee@gmail.com> wrote:
>
> This change is only needed if we are intending to delete the nonstale
> queue struct, which is currently happening in your patch 2. But we
> are essentially recreating its logic in a more disjointed way here,
> leaving this code in a worse state.
>
> I'd rather see patch 2 create a _new_ data structure instead of
> _replacing_ one that already works for multiple callers. (It does
> drop to only one caller, but that seems cleaner to me right now.)
I can definitely do that and leave ahead_behind unchanged for v2.
I was thinking that with only a single caller, and ahead_behind
being simpler than paint_down in this respect, it would be
worthwhile to simplify it, but if so I could instead do that as
a standalone follow up (though it may prove to be not enough
value for the win).
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-22 18:10 ` Derrick Stolee
@ 2026-06-22 19:14 ` Kristofer Karlsson
2026-06-22 20:23 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-22 19:14 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote:
> > From: Kristofer Karlsson <krka@spotify.com>
>
> > + if (!(old_paint & STALE)) {
> > + switch (old_paint & (PARENT1 | PARENT2)) {
> > + case 0: break;
> > + case PARENT1: queue->p1_count--; break;
> > + case PARENT2: queue->p2_count--; break;
> > + case PARENT1 | PARENT2: queue->pending_merge_bases--; break;
> > + default: BUG("unexpected paint state");
> > + }
> > + }
> > + if (!(new_paint & STALE)) {
> > + switch (new_paint & (PARENT1 | PARENT2)) {
> > + case 0: break;
> > + case PARENT1: queue->p1_count++; break;
> > + case PARENT2: queue->p2_count++; break;
> > + case PARENT1 | PARENT2: queue->pending_merge_bases++; break;
> > + default: BUG("unexpected paint state");
> > + }
> > + }
>
> While correct and compact, I don't believe that these switch
> statements follow the coding guidelines. We should split the
> lines appropriately so they are more standard, such as:
>
> if (!(new_paint & STALE)) {
> switch (new_paint & (PARENT1 | PARENT2)) {
> case 0:
> break;
>
> case PARENT1:
> queue->p1_count++;
> break;
>
> case PARENT2:
> queue->p2_count++;
> break;
>
> case PARENT1 | PARENT2:
> queue->pending_merge_bases++;
> break;
>
> default:
> BUG("unexpected paint state");
> }
> }
Agreed, I will change to that style. I did try to look for style guidelines
but I missed the .clang-format file (I was only looking through text files).
Apologies, will remember clang-format for next time (and v2)
> Also: technically "case 0" should be a BUG() state, right? We
> shouldn't be walking any commit that isn't reachable from at
> least one side. (case 0 does happen for old_paint, though.)
No, this is actually intended - initially I started with skipping
case 0 and let it fall through, but that would hide _other_ bugs.
I use 0 as a marker for "not in the queue" so we have this:
Enqueuing: 0 -> flags
Dequeueing: flags -> 0
Only the case with the modified commit being in the queue
will have non-zero flags. I tried to document this, but perhaps
it is not clear enough, I will see if I can rephrase it, or add an
inline comment around the case itself.
> > -static void clear_nonstale_queue(struct nonstale_queue *queue)
> > +static void paint_queue_put(struct paint_queue *queue,
> > + struct commit *c, unsigned add_flags)
> > {
> > - clear_prio_queue(&queue->pq);
> > - queue->max_nonstale = NULL;
> > -}
> > + unsigned old_flags = c->object.flags;
> > + c->object.flags |= add_flags;
>
> Diffs like this are part of the reason I'd like to see a _new_
> data structure instead of replacing the old one. Keeping the
> old one for ahead_behind seems like a good idea to me, but even
> if we don't land on that end state then deleting the old code
> _after_ adding the new code will make the diff more readable.
Agreed, will address that.
> > - struct nonstale_queue queue = {
> > - { compare_commits_by_gen_then_commit_date }
> > + struct paint_queue queue = {
> > + .pq = { compare_commits_by_gen_then_commit_date }
> > };
>
> I didn't notice when reading the struct definition, but looking at
> 'pq' here makes me think that we shouldn't be using that abbreviation
> as it could stand for "prio_queue" or "paint_queue".
Good point, I should pick a longer name for the field. Perhaps simply queue
(I want to avoid prio_queue since it exactly matches the name of the struct
which could be confusing.)
> > + while ((commit = paint_queue_get(&queue))) {
> ...> +
> > + if (queue.p1_count + queue.p2_count +
> > + queue.pending_merge_bases == 0)
> > + break;
> > }
> When possible, I like to try to make loops only have one terminating
> condition. Should we have paint_queue_get() return NULL when it sees
> this internal state condition?
Possibly, but that would couple the paint_queue struct very tightly with
the usage. Not a problem in practice since it only has one call site, and
it's unlikely that we want to add more of them but it may feel more natural
to let the paint_queue purely have the queue semantics and counters,
and keep the halt condition within the function itself. I don't feel
super-strongly about this and can change it if needed, I will just need to
verify that nothing else gets complex as a result, I have not fully thought
through the effects.
> Also, I'd rather see it of the form of (!count) instead of using
> addition to make it clear that we care about each value being zero.
I did consider that, and most of the code in commit-reach.c at least
prefers x and !x over x != 0 and x == 0, but my thinking was that
other code in the repo did use comparison operators specifically
for things like counters. Happy to change it to conform better though!
> Finally, I think we actually want this case to get the benefit:
>
> if ((!queue.p1_count || !queue.p2_count) &&
> !queue.pending_merge_bases)
>
> I do see that you have this condition in patch 3 with the extra
> detail that the max generation in the queue is finite. I think this
> is more reason to include this in the data structure method and not
> in the loop.
Yes, but just to be clear, you don't want to merge together patch 2 and 3
here, just grouping the halt conditions closer together
(within paint_queue_get)? Keeping patch 2 and 3 separate would be nice
to make it easier to show that introducing this extra counter bookkeeping
does not negatively impact the overall performance too much.
Thanks! I appreciate the thorough review of this patch
(which I feared was the most annoying one to look at).
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-22 18:12 ` Derrick Stolee
@ 2026-06-22 19:19 ` Kristofer Karlsson
2026-06-22 20:26 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-22 19:19 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 20:12, Derrick Stolee <stolee@gmail.com> wrote:
> > + if (generation < GENERATION_NUMBER_INFINITY &&
> > + queue.pending_merge_bases == 0 &&
> > + (queue.p1_count == 0 || queue.p2_count == 0))
> > + break;
> I mentioned it earlier, but I think this check should be in the
> dequeueing method instead of in the tail of the loop.
Yes, I will try to fold this one into the paint_queue_get as well.
> I like that you broke this out into its own patch to demonstrate
> that this is the key performance boost. It may be good to have
> some performance test numbers that demonstrate that patch 2 does
> not add any substantial overhead (timing should match previous
> code) and in patch 3 this single condition gets us a huge benefit,
> though it requires the data tracking of patch 2 to work.
Good point, I will try to run enough local tests to ensure that patch 2
does not add too much overhead to slow things down.
I think I may need to create some type of (temporary, internal)
test runner that runs the same walk multiple times to reduce
the noise from parsing commits. I am not sure if I should also
commit such a performance test or simply include a brief summary
in the commit message
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
2026-06-22 18:15 ` Derrick Stolee
@ 2026-06-22 19:25 ` Kristofer Karlsson
2026-06-22 20:28 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-22 19:25 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote:
> It's usually my preference to see these tests show up before the
> new code arrives, that way we can see that they already work with
> the old logic and continue to work with the new logic.
>
> It's minor, but putting them after your code change may be adding
> enforcement of a change of behavior.
Agreed, I actually also prefer that in practice so I am not
sure why I ordered them this way - perhaps some attempt at
making it easier to review (show the idea and change before
the verification). I will reorder to put all new tests as the first commit
(or second, if I will also introduce a status-quo technical first).
>
> One thing that could be helpful here is to consider tracing a
> count of "commits walked" in the merge-base code, then you could
> have these tests demonstrate the performance benefit by checking
> for that number changing.
Good idea, I actually had some of that locally when developing it,
but I removed the ugly traces before submitting this. I will try to
re-introduce that in a nice way. It would be neat to let tests
inspect that side effect, though in the worst case that could make
it fragile. At the very least it's good for human debugging though.
> In t6600, that tracing number would not be the same across the
> three different data shapes (full graph, half graph, no graph) and
> that could be valuable to demonstrate in tests.
Agreed, the number of commits visited would be more interesting
than the relative performance numbers since it's an algorithmic
change rather than a micro-optimization.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc
2026-06-22 18:21 ` Derrick Stolee
@ 2026-06-22 19:30 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-22 19:30 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 20:21, Derrick Stolee <stolee@gmail.com> wrote:
>
> I like the idea of documenting this so it's easier to understand.
Yes I was myself thinking that I can prove it to myself now that it works,
and anyone else could also prove it to themselves, but having it
explicit here is even better. I found the other documents
(i.e. commit-graph) to be a good source of inspiration here.
> There is risk of drift from the actual implementation. You may want
> to add a comment to the method in commit-reach.c to indicate that
> any change should be reflected in this document.
Good idea, will add that.
> > +Termination
> > +-----------
> > +
> > +Termination happens when we can prove that no extra progress is
> > +possible. We are done with the main loop when one of the following
> > +conditions holds:
> > +
> > + 1. The queue is empty.
> > + 2. The queue only contains STALE entries.
> > + 3. Side-exhaustion: the walk has reached the finite region and one
> > + of the sides is fully exhausted.
> It could be an interesting exercise, but potentially wasteful, to
> add this document as a Patch 1, but reflecting the old algorithm
> and then to update the document at the same time as you update the
> code.
I did consider that initially but I was worried it would be considered
noisy. I am quite happy to rework it in a way that first
explains the status quo. That would make the document diff
more interesting. Agreed that should become the first patch,
and the patch that changes the algorithm should include
the documentation change.
> The changes in your patch 2 would impact this doc in terms of the
> data being tracked by the paint_queue data structure instead of the
> nonstale_queue structure (though those details are not currently
> handled in the current version). The change to the termination
> condition would come along with patch 3.
Agreed, I would need to rephrase from tracking non-stale
to tracking counts of p1 and p2 (and pending merge bases) commits,
but I think that would be a small tweak and well worth doing.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-22 19:14 ` Kristofer Karlsson
@ 2026-06-22 20:23 ` Derrick Stolee
2026-06-23 10:13 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 20:23 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On 6/22/2026 3:14 PM, Kristofer Karlsson wrote:
> On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
>>
>> Also: technically "case 0" should be a BUG() state, right? We
>> shouldn't be walking any commit that isn't reachable from at
>> least one side. (case 0 does happen for old_paint, though.)
>
> No, this is actually intended - initially I started with skipping
> case 0 and let it fall through, but that would hide _other_ bugs.
> I use 0 as a marker for "not in the queue" so we have this:
> Enqueuing: 0 -> flags
> Dequeueing: flags -> 0
> Only the case with the modified commit being in the queue
> will have non-zero flags. I tried to document this, but perhaps
> it is not clear enough, I will see if I can rephrase it, or add an
> inline comment around the case itself.
I bet this would be obvious if I tried to change the code and
run the tests. thanks for the explanation.
>>> + while ((commit = paint_queue_get(&queue))) {
>> ...> +
>>> + if (queue.p1_count + queue.p2_count +
>>> + queue.pending_merge_bases == 0)
>>> + break;
>>> }
>> When possible, I like to try to make loops only have one terminating
>> condition. Should we have paint_queue_get() return NULL when it sees
>> this internal state condition?
>
> Possibly, but that would couple the paint_queue struct very tightly with
> the usage. Not a problem in practice since it only has one call site, and
> it's unlikely that we want to add more of them but it may feel more natural
> to let the paint_queue purely have the queue semantics and counters,
> and keep the halt condition within the function itself. I don't feel
> super-strongly about this and can change it if needed, I will just need to
> verify that nothing else gets complex as a result, I have not fully thought
> through the effects.
Hm. Interesting. The coupling is perhaps expected, because the data
structure tracks counts that don't otherwise need to be tracked.
Maybe the terminating condition method could be descriptively named
to say why it would be completing.
>> Also, I'd rather see it of the form of (!count) instead of using
>> addition to make it clear that we care about each value being zero.
>
> I did consider that, and most of the code in commit-reach.c at least
> prefers x and !x over x != 0 and x == 0, but my thinking was that
> other code in the repo did use comparison operators specifically
> for things like counters. Happy to change it to conform better though!
I just worry about the idea that a negative number (or an addition
overflow) would create conditions for termination that we did not
intend. That's why using the nonzero status as true/false combined
with ands and ors is better.
>> Finally, I think we actually want this case to get the benefit:
>>
>> if ((!queue.p1_count || !queue.p2_count) &&
>> !queue.pending_merge_bases)
>>
>> I do see that you have this condition in patch 3 with the extra
>> detail that the max generation in the queue is finite. I think this
>> is more reason to include this in the data structure method and not
>> in the loop.
>
> Yes, but just to be clear, you don't want to merge together patch 2 and 3
> here, just grouping the halt conditions closer together
> (within paint_queue_get)? Keeping patch 2 and 3 separate would be nice
> to make it easier to show that introducing this extra counter bookkeeping
> does not negatively impact the overall performance too much.
No, I don't want you to squash them. I was perhaps unclear as I was
discovering the structure as we went. The thing I was missing above
was the "finite generation number" condition, which you make very
clear in patch 3.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-22 19:19 ` Kristofer Karlsson
@ 2026-06-22 20:26 ` Derrick Stolee
2026-06-22 21:03 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 20:26 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On 6/22/2026 3:19 PM, Kristofer Karlsson wrote:
> I think I may need to create some type of (temporary, internal)
> test runner that runs the same walk multiple times to reduce
> the noise from parsing commits.
I've used hyperfine [1] when doing specific performance tests
in the past. You can build Git before and after and have hyperfine
run the two modes and compare them:
hyperfine --warmup=3 \
-n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \
-n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B"
[1] https://github.com/sharkdp/hyperfine
Good luck!
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases
2026-06-22 19:25 ` Kristofer Karlsson
@ 2026-06-22 20:28 ` Derrick Stolee
0 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-22 20:28 UTC (permalink / raw)
To: Kristofer Karlsson; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
On 6/22/2026 3:25 PM, Kristofer Karlsson wrote:
> On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote:
>> It's usually my preference to see these tests show up before the
>> new code arrives, that way we can see that they already work with
>> the old logic and continue to work with the new logic.
>>
>> It's minor, but putting them after your code change may be adding
>> enforcement of a change of behavior.
>
> Agreed, I actually also prefer that in practice so I am not
> sure why I ordered them this way - perhaps some attempt at
> making it easier to review (show the idea and change before
> the verification). I will reorder to put all new tests as the first commit
> (or second, if I will also introduce a status-quo technical first).
>
>>
>> One thing that could be helpful here is to consider tracing a
>> count of "commits walked" in the merge-base code, then you could
>> have these tests demonstrate the performance benefit by checking
>> for that number changing.
>
> Good idea, I actually had some of that locally when developing it,
> but I removed the ugly traces before submitting this. I will try to
> re-introduce that in a nice way. It would be neat to let tests
> inspect that side effect, though in the worst case that could make
> it fragile. At the very least it's good for human debugging though.
And to be clear, I'm suggesting using trace2_data_intmax() calls
to get structured data that can be parsed in the GIT_TRACE2_EVENT
logs during tests. It could also be picked up by teletry tools that
listen to trace2 output, if desired.
It will show up differently in GIT_TRACE2_PERF, but that's a nice
human-readable way to debug things.
>> In t6600, that tracing number would not be the same across the
>> three different data shapes (full graph, half graph, no graph) and
>> that could be valuable to demonstrate in tests.
>
> Agreed, the number of commits visited would be more interesting
> than the relative performance numbers since it's an algorithmic
> change rather than a micro-optimization.
They are both interesting, but only the commit count can be
guaranteed rigorously in the test suite. It's possible that a
great improvement to such a trace doesn't result in great end-to-
end time improvement, but I believe that it is true in this case.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-22 20:26 ` Derrick Stolee
@ 2026-06-22 21:03 ` Kristofer Karlsson
2026-06-23 13:40 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-22 21:03 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 22:26, Derrick Stolee <stolee@gmail.com> wrote:
>
> I've used hyperfine [1] when doing specific performance tests
> in the past. You can build Git before and after and have hyperfine
> run the two modes and compare them:
>
> hyperfine --warmup=3 \
> -n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \
> -n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B"
>
> [1] https://github.com/sharkdp/hyperfine
I can definitely use that, but I was thinking that the overhead
of operations such as repo_parse_commit would be high relative
to the overhead of the new paint_queue struct such that it would
be hard to properly measure and that it would be easier if I could
spread out that cost across multiple internal runs (which requires
a custom binary of some sort), but perhaps it's enough to just
show that there's no measurable regression here and then
hyperfine is indeed the right fit. I'll start with that and see if I need
to do anything more complex.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-22 20:23 ` Derrick Stolee
@ 2026-06-23 10:13 ` Kristofer Karlsson
2026-06-23 13:50 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-23 10:13 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Mon, 22 Jun 2026 at 22:23, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/22/2026 3:14 PM, Kristofer Karlsson wrote:
> >
> > On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> When possible, I like to try to make loops only have one terminating
> >> condition. Should we have paint_queue_get() return NULL when it sees
> >> this internal state condition?
> >
> > Possibly, but that would couple the paint_queue struct very tightly with
> > the usage. Not a problem in practice since it only has one call site, and
> > it's unlikely that we want to add more of them but it may feel more natural
> > to let the paint_queue purely have the queue semantics and counters,
> > and keep the halt condition within the function itself. I don't feel
> > super-strongly about this and can change it if needed, I will just need to
> > verify that nothing else gets complex as a result, I have not fully thought
> > through the effects.
>
> Hm. Interesting. The coupling is perhaps expected, because the data
> structure tracks counts that don't otherwise need to be tracked.
> Maybe the terminating condition method could be descriptively named
> to say why it would be completing.
>
I have been working on v2 locally and most of the changes landed
nicely and were clear improvements but there's one point I would
want to discuss a bit more.
For the termination conditions, I moved them into paint_queue_get()
as you suggested. The all-zero check was straightforward since it
only depends on the counters but the side-exhaustion check also
needs to know whether we have entered the finite-generation region,
so I pass last_gen (already a local in paint_down_to_common) as a
parameter:
static struct commit *paint_queue_get(struct paint_state *state,
timestamp_t last_gen)
Inside, the two conditions merge nicely under a shared guard:
if (!state->pending_merge_bases) {
if (!state->p1_count && !state->p2_count)
return NULL;
if (last_gen < GENERATION_NUMBER_INFINITY &&
(!state->p1_count || !state->p2_count))
return NULL;
}
Both conditions require pending_merge_bases == 0, so the nesting
felt natural. The first is "nothing non-stale left" (works in any
region). The second is "one side exhausted" (only in the finite
region where topological ordering holds).
I think passing in last_gen into paint_queue_get() feels _slightly_
awkward but not too bad in practice. However, we also have my
older (first) patch with the fast-exit if the caller only needs one
merge base -- that has a separate break that also could be folded
into paint_queue_get(). The messy part here is that we would need
to also pass the mb_flags parameter to paint_queue_get().
Perhaps we should just let this remain as-is for now and follow up
with _removing_ that optimization. I think the value of having it
is much diminished (but not fully gone) by the side-exhaust approach.
Additionally there's a correctness argument to be made -- perhaps
all callers _should_ care about multiple merge bases existing, and
instead bail out if it finds more than one. The only use case
where this matters today is "git merge-base A B" without --all.
Right now I am leaning towards simply passing in last_gen and
containing all of the halt conditions there
(except the old !FIND_ALL).
The nicest alternative I can think of is to let this part only
break when the queue is empty:
while ((commit = paint_queue_get(&state)))
and then adding a logical halt-section at the end of the while-loop
(where all the useful variables we need are already available), and
we could logically think of that as an optimization section, never
strictly needed for correctness.
> I just worry about the idea that a negative number (or an addition
> overflow) would create conditions for termination that we did not
> intend. That's why using the nonzero status as true/false combined
> with ands and ors is better.
Good point, I have addressed that locally too.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-22 21:03 ` Kristofer Karlsson
@ 2026-06-23 13:40 ` Derrick Stolee
0 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-23 13:40 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On 6/22/2026 5:03 PM, Kristofer Karlsson wrote:
> On Mon, 22 Jun 2026 at 22:26, Derrick Stolee <stolee@gmail.com> wrote:
>>
>> I've used hyperfine [1] when doing specific performance tests
>> in the past. You can build Git before and after and have hyperfine
>> run the two modes and compare them:
>>
>> hyperfine --warmup=3 \
>> -n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \
>> -n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B"
>>
>> [1] https://github.com/sharkdp/hyperfine
>
> I can definitely use that, but I was thinking that the overhead
> of operations such as repo_parse_commit would be high relative
> to the overhead of the new paint_queue struct such that it would
> be hard to properly measure and that it would be easier if I could
> spread out that cost across multiple internal runs (which requires
> a custom binary of some sort), but perhaps it's enough to just
> show that there's no measurable regression here and then
> hyperfine is indeed the right fit. I'll start with that and see if I need
> to do anything more complex.
Unit-level performance is nice, but doesn't tell the whole story.
We typically focus on end-to-end performance numbers when possible.
Another way to do it would be to use trace2_region_enter() and
trace2_region_leave() markers and then pull the timing data out of
the trace2 event logs. It's more complicated and usually only
needed if we are struggling to reproduce the performance impact due
to external factors.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-23 10:13 ` Kristofer Karlsson
@ 2026-06-23 13:50 ` Derrick Stolee
2026-06-23 14:09 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-23 13:50 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On 6/23/2026 6:13 AM, Kristofer Karlsson wrote:
> On Mon, 22 Jun 2026 at 22:23, Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/22/2026 3:14 PM, Kristofer Karlsson wrote:
>>>
>>> On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote:
>>>>
>>>> When possible, I like to try to make loops only have one terminating
>>>> condition. Should we have paint_queue_get() return NULL when it sees
>>>> this internal state condition?
>>>
>>> Possibly, but that would couple the paint_queue struct very tightly with
>>> the usage. Not a problem in practice since it only has one call site, and
>>> it's unlikely that we want to add more of them but it may feel more natural
>>> to let the paint_queue purely have the queue semantics and counters,
>>> and keep the halt condition within the function itself. I don't feel
>>> super-strongly about this and can change it if needed, I will just need to
>>> verify that nothing else gets complex as a result, I have not fully thought
>>> through the effects.
>>
>> Hm. Interesting. The coupling is perhaps expected, because the data
>> structure tracks counts that don't otherwise need to be tracked.
>> Maybe the terminating condition method could be descriptively named
>> to say why it would be completing.
>>
>
> I have been working on v2 locally and most of the changes landed
> nicely and were clear improvements but there's one point I would
> want to discuss a bit more.
>
> For the termination conditions, I moved them into paint_queue_get()
> as you suggested. The all-zero check was straightforward since it
> only depends on the counters but the side-exhaustion check also
> needs to know whether we have entered the finite-generation region,
> so I pass last_gen (already a local in paint_down_to_common) as a
> parameter:
>
> static struct commit *paint_queue_get(struct paint_state *state,
> timestamp_t last_gen)
>
> Inside, the two conditions merge nicely under a shared guard:
>
> if (!state->pending_merge_bases) {
> if (!state->p1_count && !state->p2_count)
> return NULL;
> if (last_gen < GENERATION_NUMBER_INFINITY &&
> (!state->p1_count || !state->p2_count))
> return NULL;
> }
This looks good to me. I'm not even bothered by the last_gen
parameter. You do make a good point about it being a potentially
leaky abstraction.
> Both conditions require pending_merge_bases == 0, so the nesting
> felt natural. The first is "nothing non-stale left" (works in any
> region). The second is "one side exhausted" (only in the finite
> region where topological ordering holds).
>
> I think passing in last_gen into paint_queue_get() feels _slightly_
> awkward but not too bad in practice. However, we also have my
> older (first) patch with the fast-exit if the caller only needs one
> merge base -- that has a separate break that also could be folded
> into paint_queue_get(). The messy part here is that we would need
> to also pass the mb_flags parameter to paint_queue_get().
How much of this data that you are passing into the method could be
state in the paint_queue struct? Could we have the paint_queue manage
all of the state necessary to make decisions around the walk
termination?
Or, could we do a peek into the queue to see the "top" commit, and
check if it is a finite commit or not? I know that 'last_gen' is
supposed to be the commit walked in the previous cycle, but it seems
that we only care about "the remaining commits are finite" as our
condition.
> Perhaps we should just let this remain as-is for now and follow up
> with _removing_ that optimization. I think the value of having it
> is much diminished (but not fully gone) by the side-exhaust approach.
>
> Additionally there's a correctness argument to be made -- perhaps
> all callers _should_ care about multiple merge bases existing, and
> instead bail out if it finds more than one. The only use case
> where this matters today is "git merge-base A B" without --all.
> Right now I am leaning towards simply passing in last_gen and
> containing all of the halt conditions there
> (except the old !FIND_ALL).
This is a good start, but hopefully storing the data in the
struct would be a good way to handle that.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-23 13:50 ` Derrick Stolee
@ 2026-06-23 14:09 ` Kristofer Karlsson
2026-06-23 14:17 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-23 14:09 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Tue, 23 Jun 2026 at 15:50, Derrick Stolee <stolee@gmail.com> wrote:
>
> > For the termination conditions, I moved them into paint_queue_get()
> > as you suggested. The all-zero check was straightforward since it
> > only depends on the counters but the side-exhaustion check also
> > needs to know whether we have entered the finite-generation region,
> > so I pass last_gen (already a local in paint_down_to_common) as a
> > parameter:
> >
> > static struct commit *paint_queue_get(struct paint_state *state,
> > timestamp_t last_gen)
> >
> > Inside, the two conditions merge nicely under a shared guard:
> >
> > if (!state->pending_merge_bases) {
> > if (!state->p1_count && !state->p2_count)
> > return NULL;
> > if (last_gen < GENERATION_NUMBER_INFINITY &&
> > (!state->p1_count || !state->p2_count))
> > return NULL;
> > }
>
> This looks good to me. I'm not even bothered by the last_gen
> parameter. You do make a good point about it being a potentially
> leaky abstraction.
Agreed, I am not also bothered by it.
> > Both conditions require pending_merge_bases == 0, so the nesting
> > felt natural. The first is "nothing non-stale left" (works in any
> > region). The second is "one side exhausted" (only in the finite
> > region where topological ordering holds).
> >
> > I think passing in last_gen into paint_queue_get() feels _slightly_
> > awkward but not too bad in practice. However, we also have my
> > older (first) patch with the fast-exit if the caller only needs one
> > merge base -- that has a separate break that also could be folded
> > into paint_queue_get(). The messy part here is that we would need
> > to also pass the mb_flags parameter to paint_queue_get().
>
> How much of this data that you are passing into the method could be
> state in the paint_queue struct? Could we have the paint_queue manage
> all of the state necessary to make decisions around the walk
> termination?
Good idea, I think adding last_gen to the struct is doable and makes it cleaner.
If needed we could also add the mb_flags there (but would be a followup patch)
Minor note: I renamed the struct to paint_state so that I could rename
the prio_queue to queue and not have "queue.queue" which felt
confusing in the code.
> Or, could we do a peek into the queue to see the "top" commit, and
> check if it is a finite commit or not? I know that 'last_gen' is
> supposed to be the commit walked in the previous cycle, but it seems
> that we only care about "the remaining commits are finite" as our
> condition.
Yes, peeking into the queue would work too, but it would feel awkward,
commit = prio_queue_peek();
if (halt conditions) return NULL;
prio_queue_get();
And if we get first, the condition is not valid - that said, it would be doable
to instead put the halt conditions _between_ popping the commit and
updating the counters. I am not sure how ugly or confusing it would be,
but I could add a comment to explain why that sequencing is important.
(Popping the commit and updating the counters may lead to temporary
0 counts, but then when we enqueue parents of the commits they
move away from the 0 anyway). It would become something like:
// dry-/pseudo-coded
commit *paint_queue_pop() {
commit = prio_queue_pop();
if (!commit) return NULL;
if (halt_condition(state, commit.generation)) return NULL;
// important: don't decrement counters before checking the halt condition
paint_count_update(state, commit->object.flags, -1);
return commit;
}
> > Right now I am leaning towards simply passing in last_gen and
> > containing all of the halt conditions there
> > (except the old !FIND_ALL).
>
> This is a good start, but hopefully storing the data in the
> struct would be a good way to handle that.
Sounds good, I will massage the code a bit, store the relevant pieces
in the struct
and see how clean I can make it.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-23 14:09 ` Kristofer Karlsson
@ 2026-06-23 14:17 ` Derrick Stolee
2026-06-24 11:25 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-23 14:17 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On 6/23/2026 10:09 AM, Kristofer Karlsson wrote:
> On Tue, 23 Jun 2026 at 15:50, Derrick Stolee <stolee@gmail.com> wrote:
>> How much of this data that you are passing into the method could be
>> state in the paint_queue struct? Could we have the paint_queue manage
>> all of the state necessary to make decisions around the walk
>> termination?
>
> Good idea, I think adding last_gen to the struct is doable and makes it cleaner.
> If needed we could also add the mb_flags there (but would be a followup patch)
> Minor note: I renamed the struct to paint_state so that I could rename
> the prio_queue to queue and not have "queue.queue" which felt
> confusing in the code.
>
>> Or, could we do a peek into the queue to see the "top" commit, and
>> check if it is a finite commit or not? I know that 'last_gen' is
>> supposed to be the commit walked in the previous cycle, but it seems
>> that we only care about "the remaining commits are finite" as our
>> condition.
>
> Yes, peeking into the queue would work too, but it would feel awkward,
>
> commit = prio_queue_peek();
> if (halt conditions) return NULL;
> prio_queue_get();
Good instinct to notice that peeking and getting from the same
method is awkward.
> And if we get first, the condition is not valid - that said, it would be doable
> to instead put the halt conditions _between_ popping the commit and
> updating the counters. I am not sure how ugly or confusing it would be,
> but I could add a comment to explain why that sequencing is important.
> (Popping the commit and updating the counters may lead to temporary
> 0 counts, but then when we enqueue parents of the commits they
> move away from the 0 anyway). It would become something like:
>
> // dry-/pseudo-coded
> commit *paint_queue_pop() {
> commit = prio_queue_pop();
> if (!commit) return NULL;
> if (halt_condition(state, commit.generation)) return NULL;
> // important: don't decrement counters before checking the halt condition
> paint_count_update(state, commit->object.flags, -1);
> return commit;
> }
I think this would be an appropriate way to handle this. If we
pop and return NULL then it's ok that we removed data from the
queue because it shouldn't be reused.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters
2026-06-23 14:17 ` Derrick Stolee
@ 2026-06-24 11:25 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-24 11:25 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Tue, 23 Jun 2026 at 16:17, Derrick Stolee <stolee@gmail.com> wrote:
>
> I think this would be an appropriate way to handle this. If we
> pop and return NULL then it's ok that we removed data from the
> queue because it shouldn't be reused.
I have prepared v2 on GGG which I believe addresses all of the
feedback. The halt conditions now live inside paint_queue_get()
as you suggested.
I am not 100% happy with the halt-condition placement yet --
the existing loop in master already has several exit paths
(while condition, min_generation break, FIND_ALL break) and I
think there is an opportunity to consolidate them. But that is
a separate discussion and I do not want to derail this series.
I can propose some alternatives in a follow-up after this
lands.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
` (6 preceding siblings ...)
2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
` (8 more replies)
7 siblings, 9 replies; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson
commit-reach: terminate merge-base walk when one side is exhausted
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.
When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.
origin/HEAD o o PR HEAD
| |
(import) o :
/ \ /
| o merge-base
| |
: : (~2.5M commits)
| |
import root main root
In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patches 2-3 incorporate test cases from Elijah's branch.
This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.
Patch layout:
1/7 Documentation/technical: add paint-down-to-common doc 2/7 t6600: add
test cases for side-exhaustion edge cases 3/7 t6099, t6600: add
side-exhaustion regression tests 4/7 commit-reach: add trace2
instrumentation to paint_down_to_common() 5/7 commit-reach: introduce struct
paint_state with per-side counters 6/7 commit-reach: remove unused
nonstale_queue dedup wrappers 7/7 commit-reach: terminate merge-base walk
when one paint side is exhausted
Benchmarks
Step counts are deterministic (measured via trace2_data_intmax added in
patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
to performance.
2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):
steps wall-clock
merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms
merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms
merge-tree (across import) - 8.11s -> 100ms
git.git (88k commits, commit-graph):
steps wall-clock
merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms
merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms
merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms
merge-base HEAD HEAD~1000 - 9ms -> 9ms
merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms
Changes since v1:
* Reordered patches: documentation first (describing the existing
algorithm), tests before code changes, so they demonstrate passing with
old logic first.
* Dropped the ahead_behind decoupling patch. paint_state is now a NEW
struct alongside nonstale_queue instead of replacing it. ahead_behind()
is completely untouched.
* Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead
code after the conversion) in a separate commit.
* Renamed: struct paint_queue -> paint_state, field pq -> queue,
paint_count_add/remove -> paint_count_update (single function with signed
delta parameter).
* Split the old paint_count_transition (which handled both old and new
flags in one call) into separate remove/add calls with a signed delta.
This eliminates the need for the case 0 handler (which tracked "not in
the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 |
STALE) that documents all valid flag combinations, with BUG() in default.
* Moved all termination conditions into paint_queue_get(). The all-zero
check and the side-exhaustion check are merged under a shared
!pending_merge_bases guard. paint_queue_get() derives the generation from
the dequeued commit itself, so no extra parameter is needed.
* Added trace2_data_intmax() instrumentation to report the number of
commits visited per paint walk (separate commit), with deterministic
step-count assertions in t6600.
* Expanded switch statements to multi-line format per .clang-format.
* Used !count style throughout instead of count == 0.
* Updated technical documentation alongside code changes.
* Added benchmark data (both git-bench wall-clock and trace2 step counts)
to commit messages.
[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150
Elijah Newren (1):
t6600: add test cases for side-exhaustion edge cases
Kristofer Karlsson (6):
Documentation/technical: add paint-down-to-common doc
t6099, t6600: add side-exhaustion regression tests
commit-reach: add trace2 instrumentation to paint_down_to_common()
commit-reach: introduce struct paint_state with per-side counters
commit-reach: remove unused nonstale_queue dedup wrappers
commit-reach: terminate merge-base walk when one paint side is
exhausted
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 128 ++++++++++++++
commit-reach.c | 119 ++++++++++---
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++
t/t6600-test-reach.sh | 157 ++++++++++++++++++
7 files changed, 464 insertions(+), 25 deletions(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
base-commit: ab776a62a78576513ee121424adb19597fbb7613
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2149
Range-diff vs v1:
1: 5492acda0a < -: ---------- commit-reach: decouple ahead_behind from nonstale_queue
6: 9cbfc67d72 ! 1: 19ed743bd1 Documentation/technical: add paint-down-to-common doc
@@ Commit message
Documentation/technical: add paint-down-to-common doc
Add a technical document describing the paint_down_to_common()
- algorithm used for merge-base computation.
+ algorithm used for merge-base computation, covering the paint
+ walk, generation number regions, and termination conditions.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
@@ Documentation/technical/paint-down-to-common.adoc (new)
+Termination
+-----------
+
-+Termination happens when we can prove that no extra progress is
-+possible. We are done with the main loop when one of the following
-+conditions holds:
++The walk uses a `nonstale_queue` wrapper around `prio_queue` that
++tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
++so far. Once that commit is dequeued, every remaining entry is known
++to be STALE and the loop terminates. Specifically, the main loop
++ends when one of the following conditions holds:
+
+ 1. The queue is empty.
-+ 2. The queue only contains STALE entries.
-+ 3. Side-exhaustion: the walk has reached the finite region and one
-+ of the sides is fully exhausted.
-+
-+The loop waits for all pending merge-base candidates to be popped
-+and recorded before any early exit fires, so no separate drain phase
-+is needed after termination.
++ 2. `max_nonstale` has been dequeued, meaning the queue only contains
++ STALE entries.
+
+Stale entry condition
+~~~~~~~~~~~~~~~~~~~~~
-+If all entries are stale we cannot find any new merge bases since
-+that requires at least one enqueued side node meeting the other side.
-+However, we could still invalidate merge bases (if there are more
-+than one). This is unnecessary since `remove_redundant()` will clean
-+that up as a post-process step.
-+
-+Side-exhaustion
-+~~~~~~~~~~~~~~~
-+A commit is *exclusive* to one side if it carries that side's paint
-+but not the other (e.g. PARENT1 without PARENT2).
-+
-+If we have reached the finite region of the graph, no future
-+traversal step can add paint to an already-visited commit. Thus if
-+there are no exclusive PARENT2 commits in the queue, no additional
-+PARENT2 paint can be introduced into the walk. Even if exclusive
-+PARENT1 commits remain, no new merge-base candidates can be
-+discovered. The same holds symmetrically for PARENT1.
-+
-+This invariant is only valid in the finite region of the graph.
++Once all queued entries are stale, no new merge-base candidates can
++be discovered -- that requires at least one non-stale commit from
++each side meeting. Continuing the walk could still invalidate
++existing candidates by proving one is an ancestor of another, but
++`remove_redundant()` handles that as a post-processing step, so it
++is safe to exit early.
+
+Related documentation
+---------------------
+
+ - `Documentation/technical/commit-graph.adoc` -- generation numbers
+ and the reachability closure property.
+
+ ## commit-reach.c ##
+@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
+ return commit;
+ }
+
+-/* all input commits in one and twos[] must have been parsed! */
++/*
++ * See Documentation/technical/paint-down-to-common.adoc
++ *
++ * All input commits in one and twos[] must have been parsed!
++ */
+ static int paint_down_to_common(struct repository *r,
+ struct commit *one, int n,
+ struct commit **twos,
4: 91372b975f ! 2: 6151b8e0a3 t6600: add test cases for side-exhaustion edge cases
@@ t/t6600-test-reach.sh: test_expect_success 'setup' '
+ # ps-T1 ps-T2
+ #
+ # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so
-+ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
++ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
+ # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued;
+ # then the STALE-walk from ps-B transitions ps-X to
+ # (PARENT1|PARENT2|STALE).
@@ t/t6600-test-reach.sh: test_expect_success 'setup' '
+
+ # Build a side topology that lives entirely outside the half
+ # commit-graph and has non-monotonic commit dates, to exercise the
-+ # INFINITY-gate in paint_down_to_common. With both tips outside
++ # INFINITY-gate in paint_down_to_common. With both tips outside
+ # the graph, generation is INFINITY and the queue falls back to
+ # commit-date order, which here is non-monotonic.
+ #
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' '
+
+test_expect_success 'get_merge_bases_many:pending-stale' '
+ # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in
-+ # paint_down_to_common(). See the topology comment in the setup test.
++ # paint_down_to_common(). See the topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:ps-T1
+ X:ps-T2
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' '
+'
+
+test_expect_success 'get_merge_bases_many:infinity-both-sides' '
-+ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
++ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
+ # the pi-* topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:pi-X
5: faf5bc98ed ! 3: 90f09ecb5c t6099, t6600: add side-exhaustion regression tests
@@ t/t6099-merge-base-side-exhaustion.sh (new)
+
+Test that merge-base --all correctly handles cases where
+multiple merge-base candidates exist and one is an ancestor
-+of another. The side-exhaustion optimization in
++of another. The side-exhaustion optimization in
+paint_down_to_common may exit before STALE propagation
+removes the ancestor, but remove_redundant catches it.
+
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s
+test_expect_success 'setup mixed finite/INFINITY topology' '
+ # Create a commit outside all saved commit-graph files so it always
+ # has INFINITY generation, while its parent (ps-X) is in the graph
-+ # with a finite generation. Use the ps-* orphan topology so we do
++ # with a finite generation. Use the ps-* orphan topology so we do
+ # not pollute the grid-based rev-list tests.
+ git checkout ps-X &&
+ test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s
+test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
+ # One tip (pm-INF) is outside the commit-graph with INFINITY
+ # generation; the other (ps-B) is in the graph with finite
-+ # generation. The walk starts in the INFINITY region and crosses
++ # generation. The walk starts in the INFINITY region and crosses
+ # into the finite region where side-exhaustion can fire.
+ cat >input <<-\EOF &&
+ A:pm-INF
-: ---------- > 4: 6ade4df2ed commit-reach: add trace2 instrumentation to paint_down_to_common()
2: 316e4dfe26 ! 5: f24edd45f0 commit-reach: introduce struct paint_queue with per-side counters
@@ Metadata
Author: Kristofer Karlsson <krka@spotify.com>
## Commit message ##
- commit-reach: introduce struct paint_queue with per-side counters
+ commit-reach: introduce struct paint_state with per-side counters
- Replace the nonstale_queue abstraction in paint_down_to_common() with
- a new paint_queue struct that tracks per-side commit counts. Each
- non-stale queued commit occupies exactly one counter bucket based on
- its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
+ Add a paint_state struct for use by paint_down_to_common() that
+ wraps a prio_queue with per-side commit counters. Each non-stale
+ queued commit occupies exactly one counter bucket based on its
+ paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).
- The counters are maintained by paint_count_transition() which handles
- all flag changes as bucket transfers: remove from the old bucket, add
- to the new one. Either step is a no-op when the respective state has
- no bucket (stale or zero).
+ The counters are maintained by paint_count_update() which adjusts
+ the appropriate bucket by a signed delta. An exhaustive switch on
+ the paint+stale bits documents all valid flag combinations in one
+ place.
- The loop now drains the queue via paint_queue_get() and breaks when
- all counters reach zero, replacing the old pointer-based termination
- (max_nonstale). This is equivalent behavior.
+ Convert paint_down_to_common() to use paint_state. The loop now
+ drains the queue via paint_queue_get() which returns NULL when all
+ counters reach zero, replacing the old pointer-based termination
+ (max_nonstale). This is equivalent behavior -- both conditions
+ detect that no non-stale entries remain.
+
+ The existing nonstale_queue is left in place for ahead_behind().
+
+ Step counts (via trace2 from the previous commit) are identical
+ before and after this refactoring, confirming no behavioral change.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
+ ## Documentation/technical/paint-down-to-common.adoc ##
+@@ Documentation/technical/paint-down-to-common.adoc: re-enqueued is bounded by the number of flag transitions.
+ Termination
+ -----------
+
+-The walk uses a `nonstale_queue` wrapper around `prio_queue` that
+-tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
+-so far. Once that commit is dequeued, every remaining entry is known
+-to be STALE and the loop terminates. Specifically, the main loop
++The walk tracks the number of commits of each type in the queue
++(PARENT1-only, PARENT2-only, pending merge-base). The main loop
+ ends when one of the following conditions holds:
+
+ 1. The queue is empty.
+- 2. `max_nonstale` has been dequeued, meaning the queue only contains
+- STALE entries.
++ 2. The queue contains only stale entries.
+
+ Stale entry condition
+ ~~~~~~~~~~~~~~~~~~~~~
+
## commit-reach.c ##
-@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b)
+@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
+ return commit;
}
- /*
-- * A prio_queue with O(1) termination check. 'max_nonstale' tracks
-- * the lowest-priority non-stale commit enqueued so far; once it is
-- * popped, every remaining entry is known to be STALE.
++/*
+ * Priority queue with per-side commit counters for paint_down_to_common().
+ * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
+ * PARENT2-only, or both (a pending merge-base candidate).
- */
--struct nonstale_queue {
-+struct paint_queue {
- struct prio_queue pq;
-- struct commit *max_nonstale;
++ */
++struct paint_state {
++ struct prio_queue queue;
+ int p1_count;
+ int p2_count;
+ int pending_merge_bases;
- };
-
--static void nonstale_queue_put(struct nonstale_queue *queue,
-- struct commit *c)
-+/*
-+ * Adjust per-side counters for a paint-state transition. Non-stale
-+ * commits are counted in one of three counters: PARENT1-only,
-+ * PARENT2-only, or both. Zero means "not in the queue" (used on
-+ * enqueue/dequeue); stale commits are not counted at all.
-+ */
-+static void paint_count_transition(struct paint_queue *queue,
-+ unsigned old_flags, unsigned new_flags)
- {
-- struct commit *old = queue->max_nonstale;
-+ unsigned old_paint = old_flags & (PARENT1 | PARENT2 | STALE);
-+ unsigned new_paint = new_flags & (PARENT1 | PARENT2 | STALE);
-
-- prio_queue_put(&queue->pq, c);
-- if (c->object.flags & STALE)
-+ if (old_paint == new_paint)
- return;
-- if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0)
-- queue->max_nonstale = c;
--}
--
--static struct commit *nonstale_queue_get(struct nonstale_queue *queue)
--{
-- struct commit *commit = prio_queue_get(&queue->pq);
-
-- if (commit == queue->max_nonstale)
-- queue->max_nonstale = NULL;
--
-- return commit;
-+ if (!(old_paint & STALE)) {
-+ switch (old_paint & (PARENT1 | PARENT2)) {
-+ case 0: break;
-+ case PARENT1: queue->p1_count--; break;
-+ case PARENT2: queue->p2_count--; break;
-+ case PARENT1 | PARENT2: queue->pending_merge_bases--; break;
-+ default: BUG("unexpected paint state");
-+ }
-+ }
-+ if (!(new_paint & STALE)) {
-+ switch (new_paint & (PARENT1 | PARENT2)) {
-+ case 0: break;
-+ case PARENT1: queue->p1_count++; break;
-+ case PARENT2: queue->p2_count++; break;
-+ case PARENT1 | PARENT2: queue->pending_merge_bases++; break;
-+ default: BUG("unexpected paint state");
-+ }
++};
++
++static void paint_count_update(struct paint_state *state,
++ unsigned flags, int delta)
++{
++ switch (flags & (PARENT1 | PARENT2 | STALE)) {
++ case PARENT1:
++ state->p1_count += delta;
++ break;
++
++ case PARENT2:
++ state->p2_count += delta;
++ break;
++
++ case PARENT1 | PARENT2:
++ state->pending_merge_bases += delta;
++ break;
++
++ case PARENT1 | PARENT2 | STALE:
++ break;
++
++ default:
++ BUG("unexpected paint state");
+ }
- }
-
--static void clear_nonstale_queue(struct nonstale_queue *queue)
-+static void paint_queue_put(struct paint_queue *queue,
++}
++
++static void paint_queue_put(struct paint_state *state,
+ struct commit *c, unsigned add_flags)
- {
-- clear_prio_queue(&queue->pq);
-- queue->max_nonstale = NULL;
--}
++{
+ unsigned old_flags = c->object.flags;
+ c->object.flags |= add_flags;
-
--static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
-- struct commit *c)
--{
-- if (c->object.flags & ENQUEUED)
-- return;
-- c->object.flags |= ENQUEUED;
-- nonstale_queue_put(queue, c);
++
+ if (old_flags & ENQUEUED) {
-+ paint_count_transition(queue, old_flags, c->object.flags);
++ paint_count_update(state, old_flags, -1);
++ paint_count_update(state, c->object.flags, 1);
+ } else {
+ c->object.flags |= ENQUEUED;
-+ prio_queue_put(&queue->pq, c);
-+ paint_count_transition(queue, 0, c->object.flags);
++ prio_queue_put(&state->queue, c);
++ paint_count_update(state, c->object.flags, 1);
+ }
- }
-
--static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
-+static struct commit *paint_queue_get(struct paint_queue *queue)
- {
-- struct commit *commit = nonstale_queue_get(queue);
-+ struct commit *commit = prio_queue_get(&queue->pq);
-
-- if (commit)
++}
++
++static struct commit *paint_queue_get(struct paint_state *state)
++{
++ struct commit *commit;
++
++ if (!state->p1_count && !state->p2_count &&
++ !state->pending_merge_bases)
++ return NULL;
++
++ commit = prio_queue_get(&state->queue);
+ if (commit) {
- commit->object.flags &= ~ENQUEUED;
-+ paint_count_transition(queue, commit->object.flags, 0);
++ commit->object.flags &= ~ENQUEUED;
++ paint_count_update(state, commit->object.flags, -1);
+ }
- return commit;
- }
-
++ return commit;
++}
++
+ /*
+ * See Documentation/technical/paint-down-to-common.adoc
+ *
@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
enum merge_base_flags mb_flags,
struct commit_list **result)
{
- struct nonstale_queue queue = {
- { compare_commits_by_gen_then_commit_date }
-+ struct paint_queue queue = {
-+ .pq = { compare_commits_by_gen_then_commit_date }
++ struct paint_state state = {
++ .queue = { compare_commits_by_gen_then_commit_date }
};
+ struct commit *commit;
int i;
+ int steps = 0;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
-@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
+
+ if (!min_generation && !corrected_commit_dates_enabled(r))
+- queue.pq.compare = compare_commits_by_commit_date;
++ state.queue.compare = compare_commits_by_commit_date;
+
+ one->object.flags |= PARENT1;
+ if (!n) {
commit_list_append(one, result);
return 0;
}
- nonstale_queue_put_dedup(&queue, one);
-+ paint_queue_put(&queue, one, 0);
++ paint_queue_put(&state, one, 0);
- for (i = 0; i < n; i++) {
- twos[i]->object.flags |= PARENT2;
- nonstale_queue_put_dedup(&queue, twos[i]);
- }
+ for (i = 0; i < n; i++)
-+ paint_queue_put(&queue, twos[i], PARENT2);
++ paint_queue_put(&state, twos[i], PARENT2);
- while (queue.max_nonstale) {
- struct commit *commit = nonstale_queue_get_dedup(&queue);
-+ while ((commit = paint_queue_get(&queue))) {
++ while ((commit = paint_queue_get(&state))) {
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
continue;
if (repo_parse_commit(r, p)) {
- clear_nonstale_queue(&queue);
-+ clear_prio_queue(&queue.pq);
++ clear_prio_queue(&state.queue);
commit_list_free(*result);
*result = NULL;
/*
@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
}
- p->object.flags |= flags;
- nonstale_queue_put_dedup(&queue, p);
-+ paint_queue_put(&queue, p, flags);
++ paint_queue_put(&state, p, flags);
}
-+
-+ if (queue.p1_count + queue.p2_count +
-+ queue.pending_merge_bases == 0)
-+ break;
}
- clear_nonstale_queue(&queue);
-+ clear_prio_queue(&queue.pq);
++ clear_prio_queue(&state.queue);
+ trace2_data_intmax("paint_down_to_common", r,
+ "steps", steps);
commit_list_sort_by_date(result);
- return 0;
- }
-: ---------- > 6: 8c72f01083 commit-reach: remove unused nonstale_queue dedup wrappers
3: ed12a5cb5b ! 7: d84b932e5b commit-reach: terminate merge-base walk when one paint side is exhausted
@@ Commit message
commit-reach: terminate merge-base walk when one paint side is exhausted
Add an early termination check to paint_down_to_common() using the
- per-side counters introduced in the previous commit. Once the walk
- enters the finite-generation region, terminate early when one side's
- exclusive count drops to zero -- no new merge-base can form without
- both paint sides meeting.
+ per-side counters introduced earlier. Once the walk enters the
+ finite-generation region, terminate early when one side's exclusive
+ count drops to zero -- no new merge-base can form without both paint
+ sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
- all merge-base candidates have been popped and recorded before
+ all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
@@ Commit message
once the walk enters the finite-generation region where ordering
guarantees hold.
- On large repositories with commit-graph, this yields 100-1000x
- speedups for merge-base queries where one side (e.g. a PR branch) is
- much smaller than the other.
+ Step counts measured with trace2 on git.git with commit-graph:
+
+ merge-base --all v2.0.0 v2.55.0-rc1:
+ before: 72264 steps after: 44589 steps
+
+ merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
+ before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
+ ## Documentation/technical/paint-down-to-common.adoc ##
+@@ Documentation/technical/paint-down-to-common.adoc: ends when one of the following conditions holds:
+
+ 1. The queue is empty.
+ 2. The queue contains only stale entries.
++ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
++ remain in the queue, no pending merge-base candidates exist,
++ and the walk has entered the finite-generation region.
+
+ Stale entry condition
+ ~~~~~~~~~~~~~~~~~~~~~
+@@ Documentation/technical/paint-down-to-common.adoc: existing candidates by proving one is an ancestor of another, but
+ `remove_redundant()` handles that as a post-processing step, so it
+ is safe to exit early.
+
++Side-exhaustion condition
++~~~~~~~~~~~~~~~~~~~~~~~~~
++A new merge-base requires commits from both sides to meet. When one
++side's exclusive counter reaches zero and there are no pending
++merge-base candidates, no future traversal step can produce a new
++candidate.
++
++This optimization only activates in the finite-generation region
++where topological ordering holds. In that region, children are
++always visited before parents, so paint flags are final at visit
++time and an exhausted side cannot reappear. In the INFINITY region,
++commit-date ordering can violate this guarantee, so the check is
++skipped.
++
+ Related documentation
+ ---------------------
+
+
## commit-reach.c ##
-@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
- if (queue.p1_count + queue.p2_count +
- queue.pending_merge_bases == 0)
- break;
+@@ commit-reach.c: static void paint_queue_put(struct paint_state *state,
+
+ static struct commit *paint_queue_get(struct paint_state *state)
+ {
+- struct commit *commit;
++ struct commit *commit = prio_queue_get(&state->queue);
+
+- if (!state->p1_count && !state->p2_count &&
+- !state->pending_merge_bases)
++ if (!commit)
+ return NULL;
+
+- commit = prio_queue_get(&state->queue);
+- if (commit) {
+- commit->object.flags &= ~ENQUEUED;
+- paint_count_update(state, commit->object.flags, -1);
++ commit->object.flags &= ~ENQUEUED;
+
++ if (!state->pending_merge_bases) {
++ if (!state->p1_count && !state->p2_count)
++ return NULL;
+ /*
+ * Side exhaustion: a new merge-base can only form
+ * when both PARENT1-only and PARENT2-only commits
-+ * remain in the queue. In the finite-generation
++ * remain in the queue. In the finite-generation
+ * region the queue is ordered topologically, so
+ * no future step can add paint to visited commits
+ * and an exhausted side cannot reappear.
+ */
-+ if (generation < GENERATION_NUMBER_INFINITY &&
-+ queue.pending_merge_bases == 0 &&
-+ (queue.p1_count == 0 || queue.p2_count == 0))
-+ break;
++ if ((!state->p1_count || !state->p2_count) &&
++ commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
++ return NULL;
}
++
++ paint_count_update(state, commit->object.flags, -1);
+ return commit;
+ }
+
+
+ ## t/t6600-test-reach.sh ##
+@@ t/t6600-test-reach.sh: test_expect_success 'merge-base --all commit-walk steps' '
+ cp commit-graph-full .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+- test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
++ test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
+
+ cp commit-graph-half .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+- test_trace2_data paint_down_to_common steps 81 <trace-half.txt
++ test_trace2_data paint_down_to_common steps 57 <trace-half.txt
+ '
- clear_prio_queue(&queue.pq);
+ test_expect_success 'reduce_heads' '
--
gitgitgadget
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 17:09 ` Junio C Hamano
2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
` (7 subsequent siblings)
8 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add a technical document describing the paint_down_to_common()
algorithm used for merge-base computation, covering the paint
walk, generation number regions, and termination conditions.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 114 ++++++++++++++++++
commit-reach.c | 6 +-
4 files changed, 121 insertions(+), 1 deletion(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2699f0b24a..f8dea4b395 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
TECH_DOCS += technical/pack-heuristics
+TECH_DOCS += technical/paint-down-to-common
TECH_DOCS += technical/parallel-checkout
TECH_DOCS += technical/partial-clone
TECH_DOCS += technical/platform-support
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index ec07088c57..9ce11d5e48 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -18,6 +18,7 @@ articles = [
'multi-pack-index.adoc',
'packfile-uri.adoc',
'pack-heuristics.adoc',
+ 'paint-down-to-common.adoc',
'parallel-checkout.adoc',
'partial-clone.adoc',
'platform-support.adoc',
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
new file mode 100644
index 0000000000..c10d5d2887
--- /dev/null
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -0,0 +1,114 @@
+Merge-Base Computation and paint_down_to_common()
+==================================================
+
+The function `paint_down_to_common()` in `commit-reach.c` computes merge
+bases by walking the commit graph backwards from two sets of tips and
+finding where their ancestry meets.
+
+Use cases
+---------
+
+Computing merge bases is used in two different ways:
+
+ 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
+ `merge`, `rebase`). A merge base is a common ancestor that is
+ not itself an ancestor of another common ancestor.
+
+ 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
+ --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
+ an ancestor of commit B?" If a common ancestor equals one of the
+ inputs, that input is necessarily the only merge base -- no other
+ common ancestor can be both as recent and not an ancestor of it.
+
+Both use cases share the same algorithm and implementation.
+
+Algorithm
+---------
+
+Given a commit `one` and a set of commits `twos[]`, the walk paints
+commits with two colors:
+
+ - PARENT1: reachable from `one`
+ - PARENT2: reachable from any commit in `twos[]`
+
+The walk uses a priority queue ordered by generation number (falling
+back to commit date when generation numbers are unavailable). Each
+step dequeues the highest-priority commit (this is when we say a
+commit is "visited") and propagates its paint flags to its parents,
+enqueuing them if they gained new flags. When a commit receives
+both PARENT1 and PARENT2, it is a merge-base candidate. A candidate
+gains the STALE flag so its ancestors propagate staleness -- any
+deeper common ancestor is necessarily redundant.
+
+INFINITY and finite generation regions
+--------------------------------------
+
+The commit-graph stores a generation number for each commit. Commits
+not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The
+graph is closed under reachability: if a commit is in the graph, all
+its ancestors are too. This partitions the commit graph into two regions:
+
+....
+ +---------------------------------------+
+ | INFINITY region |
+ | generation = INFINITY |
+ | queue order: heuristic (commit date) |
+ +---------------------------------------+
+ |
+ v
+ +---------------------------------------+
+ | Finite region |
+ | generation = finite |
+ | queue order: topological |
+ +---------------------------------------+
+....
+
+When the commit-graph is enabled, the INFINITY region is typically
+very small -- it only contains commits added since the last
+commit-graph refresh.
+
+All reachable INFINITY-generation commits are visited before any
+finite-generation commit, because INFINITY is larger than any finite
+value. Once the walk crosses into the finite region, it stays there.
+
+In the finite region, generation ordering guarantees topological
+traversal: children are always visited before their parents. This
+means that paint on already-visited commits is final -- no future
+traversal step can add paint to them.
+
+In the INFINITY region, commit-date ordering can violate this: a
+parent with a later date can be visited before a child with an earlier
+date. Paint flags are therefore NOT final at visit time, and a
+commit visited with only one side's paint may later gain the other.
+
+Paint flags are only added, never removed. Since each flag can be set
+at most once per commit, the number of times a commit can be
+re-enqueued is bounded by the number of flag transitions.
+
+Termination
+-----------
+
+The walk uses a `nonstale_queue` wrapper around `prio_queue` that
+tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
+so far. Once that commit is dequeued, every remaining entry is known
+to be STALE and the loop terminates. Specifically, the main loop
+ends when one of the following conditions holds:
+
+ 1. The queue is empty.
+ 2. `max_nonstale` has been dequeued, meaning the queue only contains
+ STALE entries.
+
+Stale entry condition
+~~~~~~~~~~~~~~~~~~~~~
+Once all queued entries are stale, no new merge-base candidates can
+be discovered -- that requires at least one non-stale commit from
+each side meeting. Continuing the walk could still invalidate
+existing candidates by proving one is an ancestor of another, but
+`remove_redundant()` handles that as a post-processing step, so it
+is safe to exit early.
+
+Related documentation
+---------------------
+
+ - `Documentation/technical/commit-graph.adoc` -- generation numbers
+ and the reachability closure property.
diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..a9483759e0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -96,7 +96,11 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
return commit;
}
-/* all input commits in one and twos[] must have been parsed! */
+/*
+ * See Documentation/technical/paint-down-to-common.adoc
+ *
+ * All input commits in one and twos[] must have been parsed!
+ */
static int paint_down_to_common(struct repository *r,
struct commit *one, int n,
struct commit **twos,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
@ 2026-06-24 12:14 ` Elijah Newren via GitGitGadget
2026-06-24 13:43 ` Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
` (6 subsequent siblings)
8 siblings, 1 reply; 51+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren
From: Elijah Newren <newren@gmail.com>
Add test cases to t6600-test-reach.sh that exercise edge cases in the
side-exhaustion optimization for paint_down_to_common():
- in_merge_bases_many:self: commit is both A and one of the X inputs
- get_merge_bases_many:duplicate-twos: duplicate entries in X list
- get_merge_bases_many:pending-stale: STALE transition on an
already-painted commit (ps-* diamond topology)
- get_merge_bases_many:infinity-both-sides: both tips outside the
commit-graph with non-monotonic dates (pi-* topology)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..c2e091aad1 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -49,6 +49,62 @@ test_expect_success 'setup' '
git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1
done
done &&
+
+ # Build a small side topology to exercise the (PARENT1|PARENT2) ->
+ # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the
+ # 10x10 grid above does not exercise it because no merge-base candidate
+ # there is a descendant of another, so STALE never reaches a
+ # still-pending candidate.
+ #
+ # ps-X
+ # /|\
+ # / | \
+ # ps-Z ps-B ps-W
+ # | / \ |
+ # | / \ |
+ # |/ \|
+ # ps-T1 ps-T2
+ #
+ # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so
+ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
+ # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued;
+ # then the STALE-walk from ps-B transitions ps-X to
+ # (PARENT1|PARENT2|STALE).
+ git checkout --orphan ps-orphan &&
+ test_commit ps-X &&
+ git checkout -b ps-B-br ps-X && test_commit ps-B &&
+ git checkout -b ps-Z-br ps-X && test_commit ps-Z &&
+ git checkout -b ps-W-br ps-X && test_commit ps-W &&
+ git checkout -b ps-T1 ps-Z &&
+ git merge --no-ff -m ps-T1 ps-B &&
+ git checkout -b ps-T2 ps-W &&
+ git merge --no-ff -m ps-T2 ps-B &&
+
+ # Build a side topology that lives entirely outside the half
+ # commit-graph and has non-monotonic commit dates, to exercise the
+ # INFINITY-gate in paint_down_to_common. With both tips outside
+ # the graph, generation is INFINITY and the queue falls back to
+ # commit-date order, which here is non-monotonic.
+ #
+ # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D
+ # pi-D (date 480) --> pi-C
+ # pi-C (date 200) --> pi-B
+ # pi-B (date 100, PARENT2 tip) --> pi-P
+ # pi-P (date 450, root)
+ #
+ # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is
+ # itself one of the queried tips).
+ git checkout --orphan pi-orphan &&
+ test_commit --date "@450 +0000" pi-P &&
+ test_commit --date "@100 +0000" pi-B &&
+ test_commit --date "@200 +0000" pi-C &&
+ test_commit --date "@480 +0000" pi-D &&
+ GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \
+ git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid &&
+ pi_x="$(cat pi-X-oid)" &&
+ git branch -f pi-X-br "$pi_x" &&
+ git tag pi-X "$pi_x" &&
+
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph commit-graph-full &&
chmod u+w commit-graph-full &&
@@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' '
test_all_modes in_merge_bases_many
'
+test_expect_success 'in_merge_bases_many:self' '
+ cat >input <<-\EOF &&
+ A:commit-6-8
+ X:commit-5-9
+ X:commit-6-8
+ EOF
+ echo "in_merge_bases_many(A,X):1" >expect &&
+ test_all_modes in_merge_bases_many
+'
+
test_expect_success 'is_descendant_of:hit' '
cat >input <<-\EOF &&
A:commit-5-7
@@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'get_merge_bases_many:duplicate-twos' '
+ cat >input <<-\EOF &&
+ A:commit-5-7
+ X:commit-4-8
+ X:commit-4-8
+ X:commit-6-6
+ X:commit-6-6
+ X:commit-8-3
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse commit-5-6 \
+ commit-4-7 | sort
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
+test_expect_success 'get_merge_bases_many:pending-stale' '
+ # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in
+ # paint_down_to_common(). See the topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:ps-T1
+ X:ps-T2
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse ps-B
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
+test_expect_success 'get_merge_bases_many:infinity-both-sides' '
+ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
+ # the pi-* topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:pi-X
+ X:pi-B
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse pi-B
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget
` (5 subsequent siblings)
8 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add t6099 to test the case where multiple merge-base candidates exist
and one is an ancestor of another. This exercises the side-exhaustion
optimization in paint_down_to_common together with the
remove_redundant safety net in get_merge_bases_many_0.
Add a mixed finite/INFINITY test to t6600 where one tip is outside
the commit-graph (INFINITY generation) and the other is inside.
This exercises the region transition: the walk starts in the
INFINITY region where side-exhaustion is disabled, then crosses
into the finite region where it can fire.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++
t/t6600-test-reach.sh | 25 ++++++++
3 files changed, 108 insertions(+)
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
diff --git a/t/meson.build b/t/meson.build
index 3219264fe7..ee6ebdffb9 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -786,6 +786,7 @@ integration_tests = [
't6041-bisect-submodule.sh',
't6050-replace.sh',
't6060-merge-index.sh',
+ 't6099-merge-base-side-exhaustion.sh',
't6100-rev-list-in-order.sh',
't6101-rev-parse-parents.sh',
't6102-rev-list-unexpected-objects.sh',
diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh
new file mode 100755
index 0000000000..4f1e0d50ef
--- /dev/null
+++ b/t/t6099-merge-base-side-exhaustion.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description='merge-base with ancestor among merge-base candidates
+
+Test that merge-base --all correctly handles cases where
+multiple merge-base candidates exist and one is an ancestor
+of another. The side-exhaustion optimization in
+paint_down_to_common may exit before STALE propagation
+removes the ancestor, but remove_redundant catches it.
+
+Graph shape (parents are below children):
+
+ A ----------- X
+ |\ /|
+ | B---------/ |
+ | | |
+ e2 \ f2
+ | | |
+ e1 d1 f1
+ \ | /
+ \ | /
+ \| /
+ C
+
+A and X are the two tips.
+B and C are both reachable from A and X.
+B reaches C through d1.
+Only B should appear in merge-base --all output.
+'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup ancestor merge-base candidate' '
+ test_commit C &&
+
+ git checkout -b d-chain HEAD &&
+ test_commit d1 &&
+ test_commit B &&
+
+ git checkout -b e-path C &&
+ test_commit e1 &&
+ test_commit e2 &&
+
+ git checkout -b f-path C &&
+ test_commit f1 &&
+ test_commit f2 &&
+
+ git checkout -b branch-A e-path &&
+ test_merge A B &&
+
+ git checkout -b branch-X f-path &&
+ test_merge X B &&
+
+ git commit-graph write --reachable
+'
+
+test_expect_success 'merge-base --all excludes ancestor candidate' '
+ git rev-parse B >expected &&
+ git merge-base --all A X >actual &&
+ test_cmp expected actual
+'
+
+test_expect_success 'merge-base (single) finds shallowest' '
+ git rev-parse B >expected &&
+ git merge-base A X >actual &&
+ test_cmp expected actual
+'
+
+# Without commit-graph: generation numbers are INFINITY,
+# side-exhaustion optimization does not fire.
+test_expect_success 'merge-base --all without commit-graph' '
+ rm -f .git/objects/info/commit-graph &&
+ git rev-parse B >expected &&
+ git merge-base --all A X >actual &&
+ test_cmp expected actual
+'
+
+test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c2e091aad1..4b771b4c58 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'setup mixed finite/INFINITY topology' '
+ # Create a commit outside all saved commit-graph files so it always
+ # has INFINITY generation, while its parent (ps-X) is in the graph
+ # with a finite generation. Use the ps-* orphan topology so we do
+ # not pollute the grid-based rev-list tests.
+ git checkout ps-X &&
+ test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF
+'
+
+test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
+ # One tip (pm-INF) is outside the commit-graph with INFINITY
+ # generation; the other (ps-B) is in the graph with finite
+ # generation. The walk starts in the INFINITY region and crosses
+ # into the finite region where side-exhaustion can fire.
+ cat >input <<-\EOF &&
+ A:pm-INF
+ X:ps-B
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse ps-X
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
` (2 preceding siblings ...)
2026-06-24 12:14 ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 13:41 ` Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget
` (4 subsequent siblings)
8 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add a step counter and trace2_data_intmax() call so that the number
of commits visited during the paint walk is observable via
GIT_TRACE2_PERF. This provides a way to measure the impact of
future optimizations without relying on wall-clock benchmarks alone.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 5 +++++
t/t6600-test-reach.sh | 21 +++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/commit-reach.c b/commit-reach.c
index a9483759e0..f6a438550b 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -11,6 +11,7 @@
#include "tag.h"
#include "commit-reach.h"
#include "ewah/ewok.h"
+#include "trace2.h"
/* Remember to update object flag allocation in object.h */
#define PARENT1 (1u<<16)
@@ -112,6 +113,7 @@ static int paint_down_to_common(struct repository *r,
{ compare_commits_by_gen_then_commit_date }
};
int i;
+ int steps = 0;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
@@ -135,6 +137,7 @@ static int paint_down_to_common(struct repository *r,
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
+ steps++;
if (min_generation && generation > last_gen)
BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
@@ -190,6 +193,8 @@ static int paint_down_to_common(struct repository *r,
}
clear_nonstale_queue(&queue);
+ trace2_data_intmax("paint_down_to_common", r,
+ "steps", steps);
commit_list_sort_by_date(result);
return 0;
}
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 4b771b4c58..c1109fb42f 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -319,6 +319,27 @@ test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'merge-base --all commit-walk steps' '
+ test_when_finished rm -rf .git/objects/info/commit-graph \
+ .git/objects/info/commit-graphs &&
+ rm -rf .git/objects/info/commit-graph \
+ .git/objects/info/commit-graphs &&
+
+ GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+ test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
+
+ cp commit-graph-full .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+ test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
+
+ cp commit-graph-half .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+ test_trace2_data paint_down_to_common steps 81 <trace-half.txt
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
` (3 preceding siblings ...)
2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 13:54 ` Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget
` (3 subsequent siblings)
8 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add a paint_state struct for use by paint_down_to_common() that
wraps a prio_queue with per-side commit counters. Each non-stale
queued commit occupies exactly one counter bucket based on its
paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).
The counters are maintained by paint_count_update() which adjusts
the appropriate bucket by a signed delta. An exhaustive switch on
the paint+stale bits documents all valid flag combinations in one
place.
Convert paint_down_to_common() to use paint_state. The loop now
drains the queue via paint_queue_get() which returns NULL when all
counters reach zero, replacing the old pointer-based termination
(max_nonstale). This is equivalent behavior -- both conditions
detect that no non-stale entries remain.
The existing nonstale_queue is left in place for ahead_behind().
Step counts (via trace2 from the previous commit) are identical
before and after this refactoring, confirming no behavioral change.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
.../technical/paint-down-to-common.adoc | 9 +-
commit-reach.c | 93 ++++++++++++++++---
2 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
index c10d5d2887..0f4e1892a5 100644
--- a/Documentation/technical/paint-down-to-common.adoc
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -88,15 +88,12 @@ re-enqueued is bounded by the number of flag transitions.
Termination
-----------
-The walk uses a `nonstale_queue` wrapper around `prio_queue` that
-tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
-so far. Once that commit is dequeued, every remaining entry is known
-to be STALE and the loop terminates. Specifically, the main loop
+The walk tracks the number of commits of each type in the queue
+(PARENT1-only, PARENT2-only, pending merge-base). The main loop
ends when one of the following conditions holds:
1. The queue is empty.
- 2. `max_nonstale` has been dequeued, meaning the queue only contains
- STALE entries.
+ 2. The queue contains only stale entries.
Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
diff --git a/commit-reach.c b/commit-reach.c
index f6a438550b..bf102f5e28 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -97,6 +97,74 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
return commit;
}
+/*
+ * Priority queue with per-side commit counters for paint_down_to_common().
+ * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
+ * PARENT2-only, or both (a pending merge-base candidate).
+ */
+struct paint_state {
+ struct prio_queue queue;
+ int p1_count;
+ int p2_count;
+ int pending_merge_bases;
+};
+
+static void paint_count_update(struct paint_state *state,
+ unsigned flags, int delta)
+{
+ switch (flags & (PARENT1 | PARENT2 | STALE)) {
+ case PARENT1:
+ state->p1_count += delta;
+ break;
+
+ case PARENT2:
+ state->p2_count += delta;
+ break;
+
+ case PARENT1 | PARENT2:
+ state->pending_merge_bases += delta;
+ break;
+
+ case PARENT1 | PARENT2 | STALE:
+ break;
+
+ default:
+ BUG("unexpected paint state");
+ }
+}
+
+static void paint_queue_put(struct paint_state *state,
+ struct commit *c, unsigned add_flags)
+{
+ unsigned old_flags = c->object.flags;
+ c->object.flags |= add_flags;
+
+ if (old_flags & ENQUEUED) {
+ paint_count_update(state, old_flags, -1);
+ paint_count_update(state, c->object.flags, 1);
+ } else {
+ c->object.flags |= ENQUEUED;
+ prio_queue_put(&state->queue, c);
+ paint_count_update(state, c->object.flags, 1);
+ }
+}
+
+static struct commit *paint_queue_get(struct paint_state *state)
+{
+ struct commit *commit;
+
+ if (!state->p1_count && !state->p2_count &&
+ !state->pending_merge_bases)
+ return NULL;
+
+ commit = prio_queue_get(&state->queue);
+ if (commit) {
+ commit->object.flags &= ~ENQUEUED;
+ paint_count_update(state, commit->object.flags, -1);
+ }
+ return commit;
+}
+
/*
* See Documentation/technical/paint-down-to-common.adoc
*
@@ -109,31 +177,29 @@ static int paint_down_to_common(struct repository *r,
enum merge_base_flags mb_flags,
struct commit_list **result)
{
- struct nonstale_queue queue = {
- { compare_commits_by_gen_then_commit_date }
+ struct paint_state state = {
+ .queue = { compare_commits_by_gen_then_commit_date }
};
+ struct commit *commit;
int i;
int steps = 0;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
if (!min_generation && !corrected_commit_dates_enabled(r))
- queue.pq.compare = compare_commits_by_commit_date;
+ state.queue.compare = compare_commits_by_commit_date;
one->object.flags |= PARENT1;
if (!n) {
commit_list_append(one, result);
return 0;
}
- nonstale_queue_put_dedup(&queue, one);
+ paint_queue_put(&state, one, 0);
- for (i = 0; i < n; i++) {
- twos[i]->object.flags |= PARENT2;
- nonstale_queue_put_dedup(&queue, twos[i]);
- }
+ for (i = 0; i < n; i++)
+ paint_queue_put(&state, twos[i], PARENT2);
- while (queue.max_nonstale) {
- struct commit *commit = nonstale_queue_get_dedup(&queue);
+ while ((commit = paint_queue_get(&state))) {
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
@@ -172,7 +238,7 @@ static int paint_down_to_common(struct repository *r,
if ((p->object.flags & flags) == flags)
continue;
if (repo_parse_commit(r, p)) {
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&state.queue);
commit_list_free(*result);
*result = NULL;
/*
@@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r,
return error(_("could not parse commit %s"),
oid_to_hex(&p->object.oid));
}
- p->object.flags |= flags;
- nonstale_queue_put_dedup(&queue, p);
+ paint_queue_put(&state, p, flags);
}
}
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&state.queue);
trace2_data_intmax("paint_down_to_common", r,
"steps", steps);
commit_list_sort_by_date(result);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
` (4 preceding siblings ...)
2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 13:55 ` Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
` (2 subsequent siblings)
8 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
unused after the previous commit. The core nonstale_queue functions
remain in use by ahead_behind().
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 18 ------------------
1 file changed, 18 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index bf102f5e28..e0d9874f99 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue)
queue->max_nonstale = NULL;
}
-static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
- struct commit *c)
-{
- if (c->object.flags & ENQUEUED)
- return;
- c->object.flags |= ENQUEUED;
- nonstale_queue_put(queue, c);
-}
-
-static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
-{
- struct commit *commit = nonstale_queue_get(queue);
-
- if (commit)
- commit->object.flags &= ~ENQUEUED;
- return commit;
-}
-
/*
* Priority queue with per-side commit counters for paint_down_to_common().
* Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
` (5 preceding siblings ...)
2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget
@ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget
2026-06-24 14:02 ` Derrick Stolee
2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee
2026-06-24 14:09 ` Derrick Stolee
8 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
From: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier. Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.
Step counts measured with trace2 on git.git with commit-graph:
merge-base --all v2.0.0 v2.55.0-rc1:
before: 72264 steps after: 44589 steps
merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
.../technical/paint-down-to-common.adoc | 17 ++++++++++++
commit-reach.c | 27 ++++++++++++++-----
t/t6600-test-reach.sh | 4 +--
3 files changed, 39 insertions(+), 9 deletions(-)
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
index 0f4e1892a5..983dfcf233 100644
--- a/Documentation/technical/paint-down-to-common.adoc
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -94,6 +94,9 @@ ends when one of the following conditions holds:
1. The queue is empty.
2. The queue contains only stale entries.
+ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
+ remain in the queue, no pending merge-base candidates exist,
+ and the walk has entered the finite-generation region.
Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
@@ -104,6 +107,20 @@ existing candidates by proving one is an ancestor of another, but
`remove_redundant()` handles that as a post-processing step, so it
is safe to exit early.
+Side-exhaustion condition
+~~~~~~~~~~~~~~~~~~~~~~~~~
+A new merge-base requires commits from both sides to meet. When one
+side's exclusive counter reaches zero and there are no pending
+merge-base candidates, no future traversal step can produce a new
+candidate.
+
+This optimization only activates in the finite-generation region
+where topological ordering holds. In that region, children are
+always visited before parents, so paint flags are final at visit
+time and an exhausted side cannot reappear. In the INFINITY region,
+commit-date ordering can violate this guarantee, so the check is
+skipped.
+
Related documentation
---------------------
diff --git a/commit-reach.c b/commit-reach.c
index e0d9874f99..f79d0b64d6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state,
static struct commit *paint_queue_get(struct paint_state *state)
{
- struct commit *commit;
+ struct commit *commit = prio_queue_get(&state->queue);
- if (!state->p1_count && !state->p2_count &&
- !state->pending_merge_bases)
+ if (!commit)
return NULL;
- commit = prio_queue_get(&state->queue);
- if (commit) {
- commit->object.flags &= ~ENQUEUED;
- paint_count_update(state, commit->object.flags, -1);
+ commit->object.flags &= ~ENQUEUED;
+
+ if (!state->pending_merge_bases) {
+ if (!state->p1_count && !state->p2_count)
+ return NULL;
+ /*
+ * Side exhaustion: a new merge-base can only form
+ * when both PARENT1-only and PARENT2-only commits
+ * remain in the queue. In the finite-generation
+ * region the queue is ordered topologically, so
+ * no future step can add paint to visited commits
+ * and an exhausted side cannot reappear.
+ */
+ if ((!state->p1_count || !state->p2_count) &&
+ commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
+ return NULL;
}
+
+ paint_count_update(state, commit->object.flags, -1);
return commit;
}
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c1109fb42f..03175befb3 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' '
cp commit-graph-full .git/objects/info/commit-graph &&
GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
git merge-base --all commit-9-9 commit-9-1 >actual &&
- test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
+ test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
cp commit-graph-half .git/objects/info/commit-graph &&
GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
git merge-base --all commit-9-9 commit-9-1 >actual &&
- test_trace2_data paint_down_to_common steps 81 <trace-half.txt
+ test_trace2_data paint_down_to_common steps 57 <trace-half.txt
'
test_expect_success 'reduce_heads' '
--
gitgitgadget
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
` (6 preceding siblings ...)
2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
@ 2026-06-24 13:34 ` Derrick Stolee
2026-06-24 14:25 ` Kristofer Karlsson
2026-06-24 14:09 ` Derrick Stolee
8 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 13:34 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> Benchmarks
>
> Step counts are deterministic (measured via trace2_data_intmax added in
> patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
> to performance.
>
> 2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):
>
> steps wall-clock
> merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms
> merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms
> merge-tree (across import) - 8.11s -> 100ms
>
>
> git.git (88k commits, commit-graph):
>
> steps wall-clock
> merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms
> merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms
> merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms
> merge-base HEAD HEAD~1000 - 9ms -> 9ms
> merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms
I like seeing these updates including the deterministic steps. Is there
a reason you don't include the step data for 'merge-tree (across import)'
in your monorepo case? The wall-clock is substantial, so it's not like the
last two examples in git.git where there may not be any difference.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget
@ 2026-06-24 13:41 ` Derrick Stolee
2026-06-24 14:31 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 13:41 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add a step counter and trace2_data_intmax() call so that the number
> of commits visited during the paint walk is observable via
> GIT_TRACE2_PERF. This provides a way to measure the impact of
> future optimizations without relying on wall-clock benchmarks alone.
> + trace2_data_intmax("paint_down_to_common", r,
> + "steps", steps);
This is great data. Very clearly marked for what we should be
doing here.
> +test_expect_success 'merge-base --all commit-walk steps' '
> + test_when_finished rm -rf .git/objects/info/commit-graph \
> + .git/objects/info/commit-graphs &&
(highlighting this chunk)
> + rm -rf .git/objects/info/commit-graph \
> + .git/objects/info/commit-graphs &&
> +
> + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
> + git merge-base --all commit-9-9 commit-9-1 >actual &&
> + test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
I'd rather see the whitespace line before the `rm` to make it
more obvious that it's setting up the "none" case.
> +
> + cp commit-graph-full .git/objects/info/commit-graph &&
> + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> + git merge-base --all commit-9-9 commit-9-1 >actual &&
> + test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> +
> + cp commit-graph-half .git/objects/info/commit-graph &&
> + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> + git merge-base --all commit-9-9 commit-9-1 >actual &&
> + test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> +'
> +
This test is a great example. I look forward to seeing that it
updates in the future.
One thing I was hoping to see was that your side-exhaustion tests
(from patch v2 2/7) would also include these checks so they are
more obviously updating when the implementation updates later.
One way to accomplish that is to reorder this patch before adding
those tests so their first version includes these checks and then
the values update when changing the implementation.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
@ 2026-06-24 13:43 ` Derrick Stolee
2026-06-24 14:33 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 13:43 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Add test cases to t6600-test-reach.sh that exercise edge cases in the
> side-exhaustion optimization for paint_down_to_common():
>
> - in_merge_bases_many:self: commit is both A and one of the X inputs
> - get_merge_bases_many:duplicate-twos: duplicate entries in X list
> - get_merge_bases_many:pending-stale: STALE transition on an
> already-painted commit (ps-* diamond topology)
> - get_merge_bases_many:infinity-both-sides: both tips outside the
> commit-graph with non-monotonic dates (pi-* topology)
I'm happy that these cases now exist.
> +test_expect_success 'in_merge_bases_many:self' '
> + cat >input <<-\EOF &&
> + A:commit-6-8
> + X:commit-5-9
> + X:commit-6-8
> + EOF
> + echo "in_merge_bases_many(A,X):1" >expect &&
> + test_all_modes in_merge_bases_many
> +'
and using 'test_all_modes' is great to get coverage of all the
different commit-graph states. In reply to patch v2 4/7 I ask
to see the results of the traces in these kinds of test cases,
but each of these modes will have different values.
One way to make these tests have potential to check exact stats
without too much extra work would be to update 'test_all_modes'
to run each command with GIT_TRACE2_EVENT set to a known trace
file (reset each time) that can then be checked after verifying
that the results of each command is the same.
Then, these tests could have lines such as
test_trace2_data paint_down_to_common steps 20 <trace-full.txt &&
test_trace2_data paint_down_to_common steps 30 <trace-half.txt &&
test_trace2_data paint_down_to_common steps 40 <trace-none.txt
after the test_all_modes line.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget
@ 2026-06-24 13:54 ` Derrick Stolee
2026-06-24 14:38 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 13:54 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> Termination
> -----------
>
> -The walk uses a `nonstale_queue` wrapper around `prio_queue` that
> -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
> -so far. Once that commit is dequeued, every remaining entry is known
> -to be STALE and the loop terminates. Specifically, the main loop
> +The walk tracks the number of commits of each type in the queue
> +(PARENT1-only, PARENT2-only, pending merge-base). The main loop
> ends when one of the following conditions holds:
>
> 1. The queue is empty.
> - 2. `max_nonstale` has been dequeued, meaning the queue only contains
> - STALE entries.
> + 2. The queue contains only stale entries.
I'm grateful to see these changes happening to the doc in real-
time. I know it was extra work, but I'm grateful right now.
Hopefully future historians will also benefit from this effort.
> +static void paint_count_update(struct paint_state *state,
> + unsigned flags, int delta)
> +{
> + switch (flags & (PARENT1 | PARENT2 | STALE)) {
> + case PARENT1:
> + state->p1_count += delta;
> + break;
> +
> + case PARENT2:
> + state->p2_count += delta;
> + break;
> +
> + case PARENT1 | PARENT2:
> + state->pending_merge_bases += delta;
> + break;
> +
> + case PARENT1 | PARENT2 | STALE:
> + break;
> +
> + default:
> + BUG("unexpected paint state");
> + }
> +}
I like the use of 'delta' to allow reuse of this switch.
> +
> +static void paint_queue_put(struct paint_state *state,
> + struct commit *c, unsigned add_flags)
> +{
> + unsigned old_flags = c->object.flags;
> + c->object.flags |= add_flags;
> +
> + if (old_flags & ENQUEUED) {
> + paint_count_update(state, old_flags, -1);
> + paint_count_update(state, c->object.flags, 1);
> + } else {
> + c->object.flags |= ENQUEUED;
> + prio_queue_put(&state->queue, c);
> + paint_count_update(state, c->object.flags, 1);
> + }
> +}
ok: if we are already in the queue then we have old flags and
may need to subtract their values because they were counted
already. Otherwise, we need to queue it for the first time and
only add the values. Makes sense.
> +
> +static struct commit *paint_queue_get(struct paint_state *state)
> +{
Since we are going to make this a more complete termination
condition, we may want to make that very explicit with a doc-
comment. Something along the lines of "dequeue a commit when
possible, but also signal termination of the walk when we
conclude that no more merge bases will be discovered due to
internal state."
> @@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r,
> return error(_("could not parse commit %s"),
> oid_to_hex(&p->object.oid));
> }
> - p->object.flags |= flags;
> - nonstale_queue_put_dedup(&queue, p);
> + paint_queue_put(&state, p, flags);
I like how this simplifies the flag-assignment logic somewhat.
You mentioned in your cover letter how the min_generation value
can add extra termination conditions. It may be a good idea to
insert min_generation into the paint_queue struct and make it a
termination condition for paint_queue_get(). If you consider this
direction, then I'd make it a separate patch on top of this one
_before_ adding the one-sided change. The extra tests that cover
the exact number of walked commits can help to guarantee the same
behavior, assuming that some of those tests check a non-zero
min_generation input. (It may be good to add such trace tests in
an earlier patch to help confidence in this case.)
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers
2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget
@ 2026-06-24 13:55 ` Derrick Stolee
0 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 13:55 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
> unused after the previous commit. The core nonstale_queue functions
> remain in use by ahead_behind().
This is a nice cleanup that makes the previous diff easier to
read. Thanks!
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
@ 2026-06-24 14:02 ` Derrick Stolee
2026-06-24 14:47 ` Kristofer Karlsson
0 siblings, 1 reply; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 14:02 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add an early termination check to paint_down_to_common() using the
> per-side counters introduced earlier. Once the walk enters the
> finite-generation region, terminate early when one side's exclusive
> count drops to zero -- no new merge-base can form without both paint
> sides meeting.
Having this as the last patch is truly a nice climax moment for the
patch series!
> @@ -94,6 +94,9 @@ ends when one of the following conditions holds:
>
> 1. The queue is empty.
> 2. The queue contains only stale entries.
> + 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
> + remain in the queue, no pending merge-base candidates exist,
> + and the walk has entered the finite-generation region.
...> +Side-exhaustion condition
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +A new merge-base requires commits from both sides to meet. When one
> +side's exclusive counter reaches zero and there are no pending
> +merge-base candidates, no future traversal step can produce a new
> +candidate.
> +
> +This optimization only activates in the finite-generation region
> +where topological ordering holds. In that region, children are
> +always visited before parents, so paint flags are final at visit
> +time and an exhausted side cannot reappear. In the INFINITY region,
> +commit-date ordering can violate this guarantee, so the check is
> +skipped.
> +
And these doc updates inline make me happy.
> Related documentation
> ---------------------
>
> diff --git a/commit-reach.c b/commit-reach.c
> index e0d9874f99..f79d0b64d6 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state,
>
> static struct commit *paint_queue_get(struct paint_state *state)
> {
> - struct commit *commit;
> + struct commit *commit = prio_queue_get(&state->queue);
>
> - if (!state->p1_count && !state->p2_count &&
> - !state->pending_merge_bases)
> + if (!commit)
> return NULL;
I see how the previous implementation has a termination condition
before calling prio_queue_get(), which is technically more
efficient. It does make this initial diff a bit more complicated
because we are moving the prio_queue_get() line.
If the introduction of the method in patch 5/7 looked like this:
+static struct commit *paint_queue_get(struct paint_state *state)
+{
+ struct commit *commit = prio_queue_get(&state->queue);
+
+ if (!commit)
+ return NULL;
+
+ if (!state->p1_count && !state->p2_count &&
+ !state->pending_merge_bases)
+ return NULL;
+
+ commit->object.flags &= ~ENQUEUED;
+ paint_count_update(state, commit->object.flags, -1);
+ return commit;
+}
Then this diff would look cleaner.
(This is the nittiest of nitpicks so feel free to ignore if this
doesn't bother you at all.)
> - commit = prio_queue_get(&state->queue);
> - if (commit) {
> - commit->object.flags &= ~ENQUEUED;
> - paint_count_update(state, commit->object.flags, -1);
> + commit->object.flags &= ~ENQUEUED;
> +
> + if (!state->pending_merge_bases) {
> + if (!state->p1_count && !state->p2_count)
> + return NULL;
> + /*
> + * Side exhaustion: a new merge-base can only form
> + * when both PARENT1-only and PARENT2-only commits
> + * remain in the queue. In the finite-generation
> + * region the queue is ordered topologically, so
> + * no future step can add paint to visited commits
> + * and an exhausted side cannot reappear.
> + */
> + if ((!state->p1_count || !state->p2_count) &&
> + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
> + return NULL;
> }
> +
> + paint_count_update(state, commit->object.flags, -1);
> return commit;
> }
I like how the crux of this implementation is entirely within
paint_queue_get() now.
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index c1109fb42f..03175befb3 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' '
> cp commit-graph-full .git/objects/info/commit-graph &&
> GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> git merge-base --all commit-9-9 commit-9-1 >actual &&
> - test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> + test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
>
> cp commit-graph-half .git/objects/info/commit-graph &&
> GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> git merge-base --all commit-9-9 commit-9-1 >actual &&
> - test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> + test_trace2_data paint_down_to_common steps 57 <trace-half.txt
> '
I love to see these steps change. If you take my suggestion to
update more tests with these checks, then this diff will get bigger
(but in a deserved way).
Also, when I suggested that 'test_all_modes' creates the trace
files on our behalf, I forgot to mention that this specific test
that you added in patch 4/7 simplifies by running the merge-base
check under 'test_all_modes' and then checking the trace2 data
on the three well-known files afterwards.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
` (7 preceding siblings ...)
2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee
@ 2026-06-24 14:09 ` Derrick Stolee
8 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 14:09 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget, git
Cc: Elijah Newren, Kristofer Karlsson
On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote:
> commit-reach: terminate merge-base walk when one side is exhausted
>
> Optimize paint_down_to_common() for merge-base queries that hit large
> one-sided histories.
I completed my review of this version. All of my comments are around
either making the commit history a little cleaner or expanding the
tests that use the trace2 data.
I believe that this code is _correct_ and could be shipped as-is. My
comments are focused on making it the best that it could be, with an
eye towards a cleaner final result or a more robust test setup.
The most actionable things are:
1. You can add tracing before the new tests, allowing the new tests
to also check the step counts in their first versions and then
get updated in the final patch to demonstrate how they change
with that behavior change.
2. The t6600 helper 'test_all_modes' could set GIT_TRACE2_EVENT for
each mode into a different trace file that can be scanned later.
This will simplify your current tracing tests but also unlock
easier tracing like this in the future.
3. The termination condition depending on min_generation could be
refactored into paint_queue_get() to help make things even more
obvious as to when we terminate. This should help with your
concerns that you mentioned in response to patch 2/6 of the
previous version:
> I am not 100% happy with the halt-condition placement yet --
> the existing loop in master already has several exit paths
> (while condition, min_generation break, FIND_ALL break) and I
> think there is an opportunity to consolidate them. But that is
> a separate discussion and I do not want to derail this series.
> I can propose some alternatives in a follow-up after this
> lands.
I then have some super minor comments around making the diffs
even easier to read, but they could be ignored as they are very
nit-picky. It's the kind of detail that I would try to resolve
if I was the author, but I'm _not_. You are. Your time is
valuable so make your own conclusions as to whether you want to
go down that road. You've already entertained my ideas around
updating the docs as the implementation changes.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee
@ 2026-06-24 14:25 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-24 14:25 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Wed, 24 Jun 2026 at 15:34, Derrick Stolee <stolee@gmail.com> wrote:
>
> I like seeing these updates including the deterministic steps. Is there
> a reason you don't include the step data for 'merge-tree (across import)'
> in your monorepo case? The wall-clock is substantial, so it's not like the
> last two examples in git.git where there may not be any difference.
I will have to attribute to laziness I suppose :)
I ran the initial benchmarks before adding the trace, and I didn't
update all of them,
just enough to show the improvement and value of the trace data.
I will ensure that I include all of it in the next version though
(maybe 1-2 days from now?) or maybe drop some of the benchmarks to
not overload with partly redundant information.
(merge-tree benchmarks doesn't perhaps add much significance on top
of merge-base in practice).
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
2026-06-24 13:41 ` Derrick Stolee
@ 2026-06-24 14:31 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-24 14:31 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Wed, 24 Jun 2026 at 15:41, Derrick Stolee <stolee@gmail.com> wrote:
>
> (highlighting this chunk)
>
> > + rm -rf .git/objects/info/commit-graph \
> > + .git/objects/info/commit-graphs &&
> > +
> > + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
> > + git merge-base --all commit-9-9 commit-9-1 >actual &&
> > + test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
>
> I'd rather see the whitespace line before the `rm` to make it
> more obvious that it's setting up the "none" case.
Ah yes, good point, will fix.
> > + cp commit-graph-full .git/objects/info/commit-graph &&
> > + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
> > + git merge-base --all commit-9-9 commit-9-1 >actual &&
> > + test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
> > +
> > + cp commit-graph-half .git/objects/info/commit-graph &&
> > + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
> > + git merge-base --all commit-9-9 commit-9-1 >actual &&
> > + test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> > +'
> > +
>
> This test is a great example. I look forward to seeing that it
> updates in the future.
>
> One thing I was hoping to see was that your side-exhaustion tests
> (from patch v2 2/7) would also include these checks so they are
> more obviously updating when the implementation updates later.
I was internally contemplating how much I should introduce the steps
validation to existing tests. My worry was that it might make tests
fragile - for example I repeatedly got some off-by-one changes
after refactoring the halt condition slightly (differs depending on
adding the halts solely within paint_queue_get or having it at the end
of the loop) and I think potentially other future work could affect it.
But I'm happy to attach the steps checks for more relevant tests,
it's not much work to change.
> One way to accomplish that is to reorder this patch before adding
> those tests so their first version includes these checks and then
> the values update when changing the implementation.
I was thinking I could keep the same order, but the patch to introduce
the trace could also modify the tests at the same time - that would
perhaps make it even more clear. Also this means I could avoid
making changes to Elijah's commit that I already _partly_ butchered
(extracted the test change as-is, but dropped the other file changes)
and I don't want to make that one more unclean.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
2026-06-24 13:43 ` Derrick Stolee
@ 2026-06-24 14:33 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-24 14:33 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
On Wed, 24 Jun 2026 at 15:43, Derrick Stolee <stolee@gmail.com> wrote:
>
> One way to make these tests have potential to check exact stats
> without too much extra work would be to update 'test_all_modes'
> to run each command with GIT_TRACE2_EVENT set to a known trace
> file (reset each time) that can then be checked after verifying
> that the results of each command is the same.
>
> Then, these tests could have lines such as
>
> test_trace2_data paint_down_to_common steps 20 <trace-full.txt &&
> test_trace2_data paint_down_to_common steps 30 <trace-half.txt &&
> test_trace2_data paint_down_to_common steps 40 <trace-none.txt
>
> after the test_all_modes line.
That does indeed look quite clean, I will try to massage the tests to
utilize that, though I haven't fully worked it through in my head yet so
I am not sure if/where I would get stuck. :)
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
2026-06-24 13:54 ` Derrick Stolee
@ 2026-06-24 14:38 ` Kristofer Karlsson
0 siblings, 0 replies; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-24 14:38 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Wed, 24 Jun 2026 at 15:54, Derrick Stolee <stolee@gmail.com> wrote:
>
> I'm grateful to see these changes happening to the doc in real-
> time. I know it was extra work, but I'm grateful right now.
>
> Hopefully future historians will also benefit from this effort.
It was honestly not bad at all, and I agree it felt quite nice to
see how the doc naturally changed along with the implementation.
> > +static struct commit *paint_queue_get(struct paint_state *state)
> > +{
>
> Since we are going to make this a more complete termination
> condition, we may want to make that very explicit with a doc-
> comment. Something along the lines of "dequeue a commit when
> possible, but also signal termination of the walk when we
> conclude that no more merge bases will be discovered due to
> internal state."
Yes, I'll make sure to clean that part up more, maybe also
rename the function to be more descriptive.
> You mentioned in your cover letter how the min_generation value
> can add extra termination conditions. It may be a good idea to
> insert min_generation into the paint_queue struct and make it a
> termination condition for paint_queue_get(). If you consider this
> direction, then I'd make it a separate patch on top of this one
> _before_ adding the one-sided change. The extra tests that cover
> the exact number of walked commits can help to guarantee the same
> behavior, assuming that some of those tests check a non-zero
> min_generation input. (It may be good to add such trace tests in
> an earlier patch to help confidence in this case.)
I think I might wait with this - the patch series already feels
quite big, and I think it has a natural progression and finish now.
But I can definitely commit to following up later -- it would be a
smaller series that is easier to reason about, likely a single commit.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-24 14:02 ` Derrick Stolee
@ 2026-06-24 14:47 ` Kristofer Karlsson
2026-06-24 15:07 ` Derrick Stolee
0 siblings, 1 reply; 51+ messages in thread
From: Kristofer Karlsson @ 2026-06-24 14:47 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On Wed, 24 Jun 2026 at 16:02, Derrick Stolee <stolee@gmail.com> wrote:
>
> I see how the previous implementation has a termination condition
> before calling prio_queue_get(), which is technically more
> efficient. It does make this initial diff a bit more complicated
> because we are moving the prio_queue_get() line.
I was thinking the efficiency here does not matter in practice -
prio_queue_get() only returns NULL once, and all other times
where we keep looping we do need the value.
I agree it does get a bit complex though.
> If the introduction of the method in patch 5/7 looked like this:
>
> +static struct commit *paint_queue_get(struct paint_state *state)
> +{
> + struct commit *commit = prio_queue_get(&state->queue);
> +
> + if (!commit)
> + return NULL;
> +
> + if (!state->p1_count && !state->p2_count &&
> + !state->pending_merge_bases)
> + return NULL;
> +
> + commit->object.flags &= ~ENQUEUED;
> + paint_count_update(state, commit->object.flags, -1);
> + return commit;
> +}
>
> Then this diff would look cleaner.
>
> (This is the nittiest of nitpicks so feel free to ignore if this
> doesn't bother you at all.)
That's a good point. It doesn't technically bother me,
but it would be cleaner. The refactor commit would effectively
be looking into the future and prepare for it. I can change it for
the next version - my only thinking was that the current refactor
patch matched my original idea for how to best handle
the halt condition, but that did indeed change after this discussion.
> > - test_trace2_data paint_down_to_common steps 81 <trace-half.txt
> > + test_trace2_data paint_down_to_common steps 57 <trace-half.txt
> > '
> I love to see these steps change. If you take my suggestion to
> update more tests with these checks, then this diff will get bigger
> (but in a deserved way).
I will try to add them to some (but not all) tests since it's more
closely related to performance than correctness and I want to
avoid making too many tests overly fragile.
> Also, when I suggested that 'test_all_modes' creates the trace
> files on our behalf, I forgot to mention that this specific test
> that you added in patch 4/7 simplifies by running the merge-base
> check under 'test_all_modes' and then checking the trace2 data
> on the three well-known files afterwards.
That's a nice bonus, I will try to see if I can manage to utilize it.
Thanks,
Kristofer
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
2026-06-24 14:47 ` Kristofer Karlsson
@ 2026-06-24 15:07 ` Derrick Stolee
0 siblings, 0 replies; 51+ messages in thread
From: Derrick Stolee @ 2026-06-24 15:07 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
On 6/24/2026 10:47 AM, Kristofer Karlsson wrote:
> On Wed, 24 Jun 2026 at 16:02, Derrick Stolee <stolee@gmail.com> wrote:
>>> - test_trace2_data paint_down_to_common steps 81 <trace-half.txt
>>> + test_trace2_data paint_down_to_common steps 57 <trace-half.txt
>>> '
>> I love to see these steps change. If you take my suggestion to
>> update more tests with these checks, then this diff will get bigger
>> (but in a deserved way).
>
> I will try to add them to some (but not all) tests since it's more
> closely related to performance than correctness and I want to
> avoid making too many tests overly fragile.
In this case, I think it's more about protecting all of our special-
cased termination conditions. The rigidity means that it is hard to
accidentally change the behavior. It does have the downside that
more tests need to change if there is an intentional change, but it
also gives the same _evidence_ that the change has the intended
impact.
We are definitely leaning into personal preferences, though. There
is no hard rule one way or another.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc
2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
@ 2026-06-24 17:09 ` Junio C Hamano
0 siblings, 0 replies; 51+ messages in thread
From: Junio C Hamano @ 2026-06-24 17:09 UTC (permalink / raw)
To: Kristofer Karlsson via GitGitGadget
Cc: git, Derrick Stolee, Elijah Newren, Kristofer Karlsson
"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Kristofer Karlsson <krka@spotify.com>
>
> Add a technical document describing the paint_down_to_common()
> algorithm used for merge-base computation, covering the paint
> walk, generation number regions, and termination conditions.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
> Documentation/Makefile | 1 +
> Documentation/technical/meson.build | 1 +
> .../technical/paint-down-to-common.adoc | 114 ++++++++++++++++++
> commit-reach.c | 6 +-
> 4 files changed, 121 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/technical/paint-down-to-common.adoc
Great write-up that very clearly and concisely explains what goes on
inside the merge-base computation. Thanks for a pleasant read.
^ permalink raw reply [flat|nested] 51+ messages in thread
end of thread, other threads:[~2026-06-24 17:09 UTC | newest]
Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
2026-06-22 18:00 ` Derrick Stolee
2026-06-22 18:53 ` Kristofer Karlsson
2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget
2026-06-22 18:10 ` Derrick Stolee
2026-06-22 19:14 ` Kristofer Karlsson
2026-06-22 20:23 ` Derrick Stolee
2026-06-23 10:13 ` Kristofer Karlsson
2026-06-23 13:50 ` Derrick Stolee
2026-06-23 14:09 ` Kristofer Karlsson
2026-06-23 14:17 ` Derrick Stolee
2026-06-24 11:25 ` Kristofer Karlsson
2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-22 18:12 ` Derrick Stolee
2026-06-22 19:19 ` Kristofer Karlsson
2026-06-22 20:26 ` Derrick Stolee
2026-06-22 21:03 ` Kristofer Karlsson
2026-06-23 13:40 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
2026-06-22 18:15 ` Derrick Stolee
2026-06-22 19:25 ` Kristofer Karlsson
2026-06-22 20:28 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
2026-06-22 18:16 ` Derrick Stolee
2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
2026-06-22 18:21 ` Derrick Stolee
2026-06-22 19:30 ` Kristofer Karlsson
2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget
2026-06-24 17:09 ` Junio C Hamano
2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget
2026-06-24 13:43 ` Derrick Stolee
2026-06-24 14:33 ` Kristofer Karlsson
2026-06-24 12:14 ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget
2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget
2026-06-24 13:41 ` Derrick Stolee
2026-06-24 14:31 ` Kristofer Karlsson
2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget
2026-06-24 13:54 ` Derrick Stolee
2026-06-24 14:38 ` Kristofer Karlsson
2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget
2026-06-24 13:55 ` Derrick Stolee
2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget
2026-06-24 14:02 ` Derrick Stolee
2026-06-24 14:47 ` Kristofer Karlsson
2026-06-24 15:07 ` Derrick Stolee
2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee
2026-06-24 14:25 ` Kristofer Karlsson
2026-06-24 14:09 ` Derrick Stolee
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox