* [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted
@ 2026-06-20 10:36 Kristofer Karlsson via GitGitGadget
2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget
` (7 more replies)
0 siblings, 8 replies; 84+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson
Hi,
This follows up on my RFC [1] with a concrete proposal. I expect the design
to still be scrutinized, but that may be easier with actual code to look at.
I tried to make this easier to review by splitting into atomic patches. The
first two patches are the meatiest parts, though they are pure refactoring.
The behavior change is in patch 3 and is in itself quite small. The last
patch adds technical documentation to support future development.
----------------------------------------------------------------------------
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.
When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.
origin/HEAD o o PR HEAD
| |
(import) o :
/ \ /
| o merge-base
| |
: : (~2.5M commits)
| |
import root main root
In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patch 4 incorporates test cases from Elijah's branch.
This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.
Patch layout:
1/6 commit-reach: decouple ahead_behind from nonstale_queue 2/6
commit-reach: introduce paint_queue and per-side counters 3/6 commit-reach:
stop the walk when one side is exhausted 4/6 t6600: add side-exhaustion
edge-case tests 5/6 t6099, t6600: add side-exhaustion regression tests 6/6
Documentation/technical: document paint_down_to_common()
Benchmarks
Measured on a 2.6M-commit monorepo with commit-graph (baseline v2.55-rc1):
merge-base --all (across import) 4.293s -> 8ms (537x)
merge-tree (across import) 5.345s -> 13ms (411x)
merge-base --all (1000 commits apart) 5.404s -> 7ms (772x)
No regression on linux.git (1.4M commits, commit-graph):
merge-base HEAD HEAD~1000 38ms -> 40ms
merge-base --all HEAD HEAD~1000 87ms -> 36ms
merge-base --is-ancestor HEAD~1000 HEAD 11ms -> 11ms
merge-base --all HEAD HEAD~10000 626ms -> 428ms
[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150
Elijah Newren (1):
t6600: add test cases for side-exhaustion edge cases
Kristofer Karlsson (5):
commit-reach: decouple ahead_behind from nonstale_queue
commit-reach: introduce struct paint_queue with per-side counters
commit-reach: terminate merge-base walk when one paint side is
exhausted
t6099, t6600: add side-exhaustion regression tests
Documentation/technical: add paint-down-to-common doc
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 130 ++++++++++++++
commit-reach.c | 159 +++++++++++-------
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++
t/t6600-test-reach.sh | 136 +++++++++++++++
7 files changed, 451 insertions(+), 59 deletions(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
base-commit: 8d96f09e9245ddf80c1981476fcbac8c4bb4125f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2149
--
gitgitgadget
^ permalink raw reply [flat|nested] 84+ messages in thread* [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget 2026-06-22 18:00 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget ` (6 subsequent siblings) 7 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Move ahead_behind() off the shared nonstale_queue abstraction to use a plain prio_queue with a local max_nonstale pointer. The nonstale tracking is inlined into insert_no_dup(). This prepares for replacing nonstale_queue with a paint_queue struct that tracks per-side commit counts, which ahead_behind() does not need. No behavior change. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 5df471a313..377a5cc42a 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -1089,12 +1089,18 @@ struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from, define_commit_slab(bit_arrays, struct bitmap *); static struct bit_arrays bit_arrays; -static void insert_no_dup(struct nonstale_queue *queue, struct commit *c) +static void insert_no_dup(struct prio_queue *queue, + struct commit **max_nonstale, + struct commit *c) { if (c->object.flags & PARENT2) return; - nonstale_queue_put(queue, c); c->object.flags |= PARENT2; + prio_queue_put(queue, c); + if (!(c->object.flags & STALE) && + (!*max_nonstale || + queue->compare(*max_nonstale, c, queue->cb_data) <= 0)) + *max_nonstale = c; } static struct bitmap *get_bit_array(struct commit *c, int width) @@ -1118,9 +1124,10 @@ void ahead_behind(struct repository *r, struct commit **commits, size_t commits_nr, struct ahead_behind_count *counts, size_t counts_nr) { - struct nonstale_queue queue = { - { .compare = compare_commits_by_gen_then_commit_date } + struct prio_queue queue = { + .compare = compare_commits_by_gen_then_commit_date }; + struct commit *max_nonstale = NULL; size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD); if (!commits_nr || !counts_nr) @@ -1140,14 +1147,17 @@ void ahead_behind(struct repository *r, struct bitmap *bitmap = get_bit_array(c, width); bitmap_set(bitmap, i); - insert_no_dup(&queue, c); + insert_no_dup(&queue, &max_nonstale, c); } - while (queue.max_nonstale) { - struct commit *c = nonstale_queue_get(&queue); + while (max_nonstale) { + struct commit *c = prio_queue_get(&queue); struct commit_list *p; struct bitmap *bitmap_c = get_bit_array(c, width); + if (c == max_nonstale) + max_nonstale = NULL; + for (size_t i = 0; i < counts_nr; i++) { int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index); int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index); @@ -1178,7 +1188,7 @@ void ahead_behind(struct repository *r, if (bitmap_popcount(bitmap_p) == commits_nr) p->item->object.flags |= STALE; - insert_no_dup(&queue, p->item); + insert_no_dup(&queue, &max_nonstale, p->item); } free_bit_array(c); @@ -1186,10 +1196,10 @@ void ahead_behind(struct repository *r, /* STALE is used here, PARENT2 is used by insert_no_dup(). */ repo_clear_commit_marks(r, PARENT2 | STALE); - for (size_t i = 0; i < queue.pq.nr; i++) - free_bit_array(queue.pq.array[i].data); + for (size_t i = 0; i < queue.nr; i++) + free_bit_array(queue.array[i].data); clear_bit_arrays(&bit_arrays); - clear_nonstale_queue(&queue); + clear_prio_queue(&queue); } struct commit_and_index { -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue 2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget @ 2026-06-22 18:00 ` Derrick Stolee 2026-06-22 18:53 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:00 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > Move ahead_behind() off the shared nonstale_queue abstraction to use > a plain prio_queue with a local max_nonstale pointer. The nonstale > tracking is inlined into insert_no_dup(). > > This prepares for replacing nonstale_queue with a paint_queue struct > that tracks per-side commit counts, which ahead_behind() does not > need. No behavior change. This change is only needed if we are intending to delete the nonstale queue struct, which is currently happening in your patch 2. But we are essentially recreating its logic in a more disjointed way here, leaving this code in a worse state. I'd rather see patch 2 create a _new_ data structure instead of _replacing_ one that already works for multiple callers. (It does drop to only one caller, but that seems cleaner to me right now.) Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue 2026-06-22 18:00 ` Derrick Stolee @ 2026-06-22 18:53 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-22 18:53 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 20:00, Derrick Stolee <stolee@gmail.com> wrote: > > This change is only needed if we are intending to delete the nonstale > queue struct, which is currently happening in your patch 2. But we > are essentially recreating its logic in a more disjointed way here, > leaving this code in a worse state. > > I'd rather see patch 2 create a _new_ data structure instead of > _replacing_ one that already works for multiple callers. (It does > drop to only one caller, but that seems cleaner to me right now.) I can definitely do that and leave ahead_behind unchanged for v2. I was thinking that with only a single caller, and ahead_behind being simpler than paint_down in this respect, it would be worthwhile to simplify it, but if so I could instead do that as a standalone follow up (though it may prove to be not enough value for the win). Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget 2026-06-22 18:10 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget ` (5 subsequent siblings) 7 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Replace the nonstale_queue abstraction in paint_down_to_common() with a new paint_queue struct that tracks per-side commit counts. Each non-stale queued commit occupies exactly one counter bucket based on its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). The counters are maintained by paint_count_transition() which handles all flag changes as bucket transfers: remove from the old bucket, add to the new one. Either step is a no-op when the respective state has no bucket (stale or zero). The loop now drains the queue via paint_queue_get() and breaks when all counters reach zero, replacing the old pointer-based termination (max_nonstale). This is equivalent behavior. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 114 ++++++++++++++++++++++++++++--------------------- 1 file changed, 66 insertions(+), 48 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 377a5cc42a..ba1e896f0f 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -41,58 +41,75 @@ static int compare_commits_by_gen(const void *_a, const void *_b) } /* - * A prio_queue with O(1) termination check. 'max_nonstale' tracks - * the lowest-priority non-stale commit enqueued so far; once it is - * popped, every remaining entry is known to be STALE. + * Priority queue with per-side commit counters for paint_down_to_common(). + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, + * PARENT2-only, or both (a pending merge-base candidate). */ -struct nonstale_queue { +struct paint_queue { struct prio_queue pq; - struct commit *max_nonstale; + int p1_count; + int p2_count; + int pending_merge_bases; }; -static void nonstale_queue_put(struct nonstale_queue *queue, - struct commit *c) +/* + * Adjust per-side counters for a paint-state transition. Non-stale + * commits are counted in one of three counters: PARENT1-only, + * PARENT2-only, or both. Zero means "not in the queue" (used on + * enqueue/dequeue); stale commits are not counted at all. + */ +static void paint_count_transition(struct paint_queue *queue, + unsigned old_flags, unsigned new_flags) { - struct commit *old = queue->max_nonstale; + unsigned old_paint = old_flags & (PARENT1 | PARENT2 | STALE); + unsigned new_paint = new_flags & (PARENT1 | PARENT2 | STALE); - prio_queue_put(&queue->pq, c); - if (c->object.flags & STALE) + if (old_paint == new_paint) return; - if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0) - queue->max_nonstale = c; -} - -static struct commit *nonstale_queue_get(struct nonstale_queue *queue) -{ - struct commit *commit = prio_queue_get(&queue->pq); - if (commit == queue->max_nonstale) - queue->max_nonstale = NULL; - - return commit; + if (!(old_paint & STALE)) { + switch (old_paint & (PARENT1 | PARENT2)) { + case 0: break; + case PARENT1: queue->p1_count--; break; + case PARENT2: queue->p2_count--; break; + case PARENT1 | PARENT2: queue->pending_merge_bases--; break; + default: BUG("unexpected paint state"); + } + } + if (!(new_paint & STALE)) { + switch (new_paint & (PARENT1 | PARENT2)) { + case 0: break; + case PARENT1: queue->p1_count++; break; + case PARENT2: queue->p2_count++; break; + case PARENT1 | PARENT2: queue->pending_merge_bases++; break; + default: BUG("unexpected paint state"); + } + } } -static void clear_nonstale_queue(struct nonstale_queue *queue) +static void paint_queue_put(struct paint_queue *queue, + struct commit *c, unsigned add_flags) { - clear_prio_queue(&queue->pq); - queue->max_nonstale = NULL; -} + unsigned old_flags = c->object.flags; + c->object.flags |= add_flags; -static void nonstale_queue_put_dedup(struct nonstale_queue *queue, - struct commit *c) -{ - if (c->object.flags & ENQUEUED) - return; - c->object.flags |= ENQUEUED; - nonstale_queue_put(queue, c); + if (old_flags & ENQUEUED) { + paint_count_transition(queue, old_flags, c->object.flags); + } else { + c->object.flags |= ENQUEUED; + prio_queue_put(&queue->pq, c); + paint_count_transition(queue, 0, c->object.flags); + } } -static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) +static struct commit *paint_queue_get(struct paint_queue *queue) { - struct commit *commit = nonstale_queue_get(queue); + struct commit *commit = prio_queue_get(&queue->pq); - if (commit) + if (commit) { commit->object.flags &= ~ENQUEUED; + paint_count_transition(queue, commit->object.flags, 0); + } return commit; } @@ -104,9 +121,10 @@ static int paint_down_to_common(struct repository *r, enum merge_base_flags mb_flags, struct commit_list **result) { - struct nonstale_queue queue = { - { compare_commits_by_gen_then_commit_date } + struct paint_queue queue = { + .pq = { compare_commits_by_gen_then_commit_date } }; + struct commit *commit; int i; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; @@ -119,15 +137,12 @@ static int paint_down_to_common(struct repository *r, commit_list_append(one, result); return 0; } - nonstale_queue_put_dedup(&queue, one); + paint_queue_put(&queue, one, 0); - for (i = 0; i < n; i++) { - twos[i]->object.flags |= PARENT2; - nonstale_queue_put_dedup(&queue, twos[i]); - } + for (i = 0; i < n; i++) + paint_queue_put(&queue, twos[i], PARENT2); - while (queue.max_nonstale) { - struct commit *commit = nonstale_queue_get_dedup(&queue); + while ((commit = paint_queue_get(&queue))) { struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); @@ -165,7 +180,7 @@ static int paint_down_to_common(struct repository *r, if ((p->object.flags & flags) == flags) continue; if (repo_parse_commit(r, p)) { - clear_nonstale_queue(&queue); + clear_prio_queue(&queue.pq); commit_list_free(*result); *result = NULL; /* @@ -180,12 +195,15 @@ static int paint_down_to_common(struct repository *r, return error(_("could not parse commit %s"), oid_to_hex(&p->object.oid)); } - p->object.flags |= flags; - nonstale_queue_put_dedup(&queue, p); + paint_queue_put(&queue, p, flags); } + + if (queue.p1_count + queue.p2_count + + queue.pending_merge_bases == 0) + break; } - clear_nonstale_queue(&queue); + clear_prio_queue(&queue.pq); commit_list_sort_by_date(result); return 0; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-22 18:10 ` Derrick Stolee 2026-06-22 19:14 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:10 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > + if (!(old_paint & STALE)) { > + switch (old_paint & (PARENT1 | PARENT2)) { > + case 0: break; > + case PARENT1: queue->p1_count--; break; > + case PARENT2: queue->p2_count--; break; > + case PARENT1 | PARENT2: queue->pending_merge_bases--; break; > + default: BUG("unexpected paint state"); > + } > + } > + if (!(new_paint & STALE)) { > + switch (new_paint & (PARENT1 | PARENT2)) { > + case 0: break; > + case PARENT1: queue->p1_count++; break; > + case PARENT2: queue->p2_count++; break; > + case PARENT1 | PARENT2: queue->pending_merge_bases++; break; > + default: BUG("unexpected paint state"); > + } > + } While correct and compact, I don't believe that these switch statements follow the coding guidelines. We should split the lines appropriately so they are more standard, such as: if (!(new_paint & STALE)) { switch (new_paint & (PARENT1 | PARENT2)) { case 0: break; case PARENT1: queue->p1_count++; break; case PARENT2: queue->p2_count++; break; case PARENT1 | PARENT2: queue->pending_merge_bases++; break; default: BUG("unexpected paint state"); } } Also: technically "case 0" should be a BUG() state, right? We shouldn't be walking any commit that isn't reachable from at least one side. (case 0 does happen for old_paint, though.) > } > > -static void clear_nonstale_queue(struct nonstale_queue *queue) > +static void paint_queue_put(struct paint_queue *queue, > + struct commit *c, unsigned add_flags) > { > - clear_prio_queue(&queue->pq); > - queue->max_nonstale = NULL; > -} > + unsigned old_flags = c->object.flags; > + c->object.flags |= add_flags; Diffs like this are part of the reason I'd like to see a _new_ data structure instead of replacing the old one. Keeping the old one for ahead_behind seems like a good idea to me, but even if we don't land on that end state then deleting the old code _after_ adding the new code will make the diff more readable. > - struct nonstale_queue queue = { > - { compare_commits_by_gen_then_commit_date } > + struct paint_queue queue = { > + .pq = { compare_commits_by_gen_then_commit_date } > }; I didn't notice when reading the struct definition, but looking at 'pq' here makes me think that we shouldn't be using that abbreviation as it could stand for "prio_queue" or "paint_queue". > + while ((commit = paint_queue_get(&queue))) { ...> + > + if (queue.p1_count + queue.p2_count + > + queue.pending_merge_bases == 0) > + break; > } When possible, I like to try to make loops only have one terminating condition. Should we have paint_queue_get() return NULL when it sees this internal state condition? Also, I'd rather see it of the form of (!count) instead of using addition to make it clear that we care about each value being zero. Finally, I think we actually want this case to get the benefit: if ((!queue.p1_count || !queue.p2_count) && !queue.pending_merge_bases) I do see that you have this condition in patch 3 with the extra detail that the max generation in the queue is finite. I think this is more reason to include this in the data structure method and not in the loop. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-22 18:10 ` Derrick Stolee @ 2026-06-22 19:14 ` Kristofer Karlsson 2026-06-22 20:23 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-22 19:14 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote: > > On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > > From: Kristofer Karlsson <krka@spotify.com> > > > + if (!(old_paint & STALE)) { > > + switch (old_paint & (PARENT1 | PARENT2)) { > > + case 0: break; > > + case PARENT1: queue->p1_count--; break; > > + case PARENT2: queue->p2_count--; break; > > + case PARENT1 | PARENT2: queue->pending_merge_bases--; break; > > + default: BUG("unexpected paint state"); > > + } > > + } > > + if (!(new_paint & STALE)) { > > + switch (new_paint & (PARENT1 | PARENT2)) { > > + case 0: break; > > + case PARENT1: queue->p1_count++; break; > > + case PARENT2: queue->p2_count++; break; > > + case PARENT1 | PARENT2: queue->pending_merge_bases++; break; > > + default: BUG("unexpected paint state"); > > + } > > + } > > While correct and compact, I don't believe that these switch > statements follow the coding guidelines. We should split the > lines appropriately so they are more standard, such as: > > if (!(new_paint & STALE)) { > switch (new_paint & (PARENT1 | PARENT2)) { > case 0: > break; > > case PARENT1: > queue->p1_count++; > break; > > case PARENT2: > queue->p2_count++; > break; > > case PARENT1 | PARENT2: > queue->pending_merge_bases++; > break; > > default: > BUG("unexpected paint state"); > } > } Agreed, I will change to that style. I did try to look for style guidelines but I missed the .clang-format file (I was only looking through text files). Apologies, will remember clang-format for next time (and v2) > Also: technically "case 0" should be a BUG() state, right? We > shouldn't be walking any commit that isn't reachable from at > least one side. (case 0 does happen for old_paint, though.) No, this is actually intended - initially I started with skipping case 0 and let it fall through, but that would hide _other_ bugs. I use 0 as a marker for "not in the queue" so we have this: Enqueuing: 0 -> flags Dequeueing: flags -> 0 Only the case with the modified commit being in the queue will have non-zero flags. I tried to document this, but perhaps it is not clear enough, I will see if I can rephrase it, or add an inline comment around the case itself. > > -static void clear_nonstale_queue(struct nonstale_queue *queue) > > +static void paint_queue_put(struct paint_queue *queue, > > + struct commit *c, unsigned add_flags) > > { > > - clear_prio_queue(&queue->pq); > > - queue->max_nonstale = NULL; > > -} > > + unsigned old_flags = c->object.flags; > > + c->object.flags |= add_flags; > > Diffs like this are part of the reason I'd like to see a _new_ > data structure instead of replacing the old one. Keeping the > old one for ahead_behind seems like a good idea to me, but even > if we don't land on that end state then deleting the old code > _after_ adding the new code will make the diff more readable. Agreed, will address that. > > - struct nonstale_queue queue = { > > - { compare_commits_by_gen_then_commit_date } > > + struct paint_queue queue = { > > + .pq = { compare_commits_by_gen_then_commit_date } > > }; > > I didn't notice when reading the struct definition, but looking at > 'pq' here makes me think that we shouldn't be using that abbreviation > as it could stand for "prio_queue" or "paint_queue". Good point, I should pick a longer name for the field. Perhaps simply queue (I want to avoid prio_queue since it exactly matches the name of the struct which could be confusing.) > > + while ((commit = paint_queue_get(&queue))) { > ...> + > > + if (queue.p1_count + queue.p2_count + > > + queue.pending_merge_bases == 0) > > + break; > > } > When possible, I like to try to make loops only have one terminating > condition. Should we have paint_queue_get() return NULL when it sees > this internal state condition? Possibly, but that would couple the paint_queue struct very tightly with the usage. Not a problem in practice since it only has one call site, and it's unlikely that we want to add more of them but it may feel more natural to let the paint_queue purely have the queue semantics and counters, and keep the halt condition within the function itself. I don't feel super-strongly about this and can change it if needed, I will just need to verify that nothing else gets complex as a result, I have not fully thought through the effects. > Also, I'd rather see it of the form of (!count) instead of using > addition to make it clear that we care about each value being zero. I did consider that, and most of the code in commit-reach.c at least prefers x and !x over x != 0 and x == 0, but my thinking was that other code in the repo did use comparison operators specifically for things like counters. Happy to change it to conform better though! > Finally, I think we actually want this case to get the benefit: > > if ((!queue.p1_count || !queue.p2_count) && > !queue.pending_merge_bases) > > I do see that you have this condition in patch 3 with the extra > detail that the max generation in the queue is finite. I think this > is more reason to include this in the data structure method and not > in the loop. Yes, but just to be clear, you don't want to merge together patch 2 and 3 here, just grouping the halt conditions closer together (within paint_queue_get)? Keeping patch 2 and 3 separate would be nice to make it easier to show that introducing this extra counter bookkeeping does not negatively impact the overall performance too much. Thanks! I appreciate the thorough review of this patch (which I feared was the most annoying one to look at). Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-22 19:14 ` Kristofer Karlsson @ 2026-06-22 20:23 ` Derrick Stolee 2026-06-23 10:13 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 20:23 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/22/2026 3:14 PM, Kristofer Karlsson wrote: > On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote: >> >> Also: technically "case 0" should be a BUG() state, right? We >> shouldn't be walking any commit that isn't reachable from at >> least one side. (case 0 does happen for old_paint, though.) > > No, this is actually intended - initially I started with skipping > case 0 and let it fall through, but that would hide _other_ bugs. > I use 0 as a marker for "not in the queue" so we have this: > Enqueuing: 0 -> flags > Dequeueing: flags -> 0 > Only the case with the modified commit being in the queue > will have non-zero flags. I tried to document this, but perhaps > it is not clear enough, I will see if I can rephrase it, or add an > inline comment around the case itself. I bet this would be obvious if I tried to change the code and run the tests. thanks for the explanation. >>> + while ((commit = paint_queue_get(&queue))) { >> ...> + >>> + if (queue.p1_count + queue.p2_count + >>> + queue.pending_merge_bases == 0) >>> + break; >>> } >> When possible, I like to try to make loops only have one terminating >> condition. Should we have paint_queue_get() return NULL when it sees >> this internal state condition? > > Possibly, but that would couple the paint_queue struct very tightly with > the usage. Not a problem in practice since it only has one call site, and > it's unlikely that we want to add more of them but it may feel more natural > to let the paint_queue purely have the queue semantics and counters, > and keep the halt condition within the function itself. I don't feel > super-strongly about this and can change it if needed, I will just need to > verify that nothing else gets complex as a result, I have not fully thought > through the effects. Hm. Interesting. The coupling is perhaps expected, because the data structure tracks counts that don't otherwise need to be tracked. Maybe the terminating condition method could be descriptively named to say why it would be completing. >> Also, I'd rather see it of the form of (!count) instead of using >> addition to make it clear that we care about each value being zero. > > I did consider that, and most of the code in commit-reach.c at least > prefers x and !x over x != 0 and x == 0, but my thinking was that > other code in the repo did use comparison operators specifically > for things like counters. Happy to change it to conform better though! I just worry about the idea that a negative number (or an addition overflow) would create conditions for termination that we did not intend. That's why using the nonzero status as true/false combined with ands and ors is better. >> Finally, I think we actually want this case to get the benefit: >> >> if ((!queue.p1_count || !queue.p2_count) && >> !queue.pending_merge_bases) >> >> I do see that you have this condition in patch 3 with the extra >> detail that the max generation in the queue is finite. I think this >> is more reason to include this in the data structure method and not >> in the loop. > > Yes, but just to be clear, you don't want to merge together patch 2 and 3 > here, just grouping the halt conditions closer together > (within paint_queue_get)? Keeping patch 2 and 3 separate would be nice > to make it easier to show that introducing this extra counter bookkeeping > does not negatively impact the overall performance too much. No, I don't want you to squash them. I was perhaps unclear as I was discovering the structure as we went. The thing I was missing above was the "finite generation number" condition, which you make very clear in patch 3. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-22 20:23 ` Derrick Stolee @ 2026-06-23 10:13 ` Kristofer Karlsson 2026-06-23 13:50 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-23 10:13 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 22:23, Derrick Stolee <stolee@gmail.com> wrote: > > On 6/22/2026 3:14 PM, Kristofer Karlsson wrote: > > > > On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote: > >> > >> When possible, I like to try to make loops only have one terminating > >> condition. Should we have paint_queue_get() return NULL when it sees > >> this internal state condition? > > > > Possibly, but that would couple the paint_queue struct very tightly with > > the usage. Not a problem in practice since it only has one call site, and > > it's unlikely that we want to add more of them but it may feel more natural > > to let the paint_queue purely have the queue semantics and counters, > > and keep the halt condition within the function itself. I don't feel > > super-strongly about this and can change it if needed, I will just need to > > verify that nothing else gets complex as a result, I have not fully thought > > through the effects. > > Hm. Interesting. The coupling is perhaps expected, because the data > structure tracks counts that don't otherwise need to be tracked. > Maybe the terminating condition method could be descriptively named > to say why it would be completing. > I have been working on v2 locally and most of the changes landed nicely and were clear improvements but there's one point I would want to discuss a bit more. For the termination conditions, I moved them into paint_queue_get() as you suggested. The all-zero check was straightforward since it only depends on the counters but the side-exhaustion check also needs to know whether we have entered the finite-generation region, so I pass last_gen (already a local in paint_down_to_common) as a parameter: static struct commit *paint_queue_get(struct paint_state *state, timestamp_t last_gen) Inside, the two conditions merge nicely under a shared guard: if (!state->pending_merge_bases) { if (!state->p1_count && !state->p2_count) return NULL; if (last_gen < GENERATION_NUMBER_INFINITY && (!state->p1_count || !state->p2_count)) return NULL; } Both conditions require pending_merge_bases == 0, so the nesting felt natural. The first is "nothing non-stale left" (works in any region). The second is "one side exhausted" (only in the finite region where topological ordering holds). I think passing in last_gen into paint_queue_get() feels _slightly_ awkward but not too bad in practice. However, we also have my older (first) patch with the fast-exit if the caller only needs one merge base -- that has a separate break that also could be folded into paint_queue_get(). The messy part here is that we would need to also pass the mb_flags parameter to paint_queue_get(). Perhaps we should just let this remain as-is for now and follow up with _removing_ that optimization. I think the value of having it is much diminished (but not fully gone) by the side-exhaust approach. Additionally there's a correctness argument to be made -- perhaps all callers _should_ care about multiple merge bases existing, and instead bail out if it finds more than one. The only use case where this matters today is "git merge-base A B" without --all. Right now I am leaning towards simply passing in last_gen and containing all of the halt conditions there (except the old !FIND_ALL). The nicest alternative I can think of is to let this part only break when the queue is empty: while ((commit = paint_queue_get(&state))) and then adding a logical halt-section at the end of the while-loop (where all the useful variables we need are already available), and we could logically think of that as an optimization section, never strictly needed for correctness. > I just worry about the idea that a negative number (or an addition > overflow) would create conditions for termination that we did not > intend. That's why using the nonzero status as true/false combined > with ands and ors is better. Good point, I have addressed that locally too. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-23 10:13 ` Kristofer Karlsson @ 2026-06-23 13:50 ` Derrick Stolee 2026-06-23 14:09 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-23 13:50 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/23/2026 6:13 AM, Kristofer Karlsson wrote: > On Mon, 22 Jun 2026 at 22:23, Derrick Stolee <stolee@gmail.com> wrote: >> >> On 6/22/2026 3:14 PM, Kristofer Karlsson wrote: >>> >>> On Mon, 22 Jun 2026 at 20:10, Derrick Stolee <stolee@gmail.com> wrote: >>>> >>>> When possible, I like to try to make loops only have one terminating >>>> condition. Should we have paint_queue_get() return NULL when it sees >>>> this internal state condition? >>> >>> Possibly, but that would couple the paint_queue struct very tightly with >>> the usage. Not a problem in practice since it only has one call site, and >>> it's unlikely that we want to add more of them but it may feel more natural >>> to let the paint_queue purely have the queue semantics and counters, >>> and keep the halt condition within the function itself. I don't feel >>> super-strongly about this and can change it if needed, I will just need to >>> verify that nothing else gets complex as a result, I have not fully thought >>> through the effects. >> >> Hm. Interesting. The coupling is perhaps expected, because the data >> structure tracks counts that don't otherwise need to be tracked. >> Maybe the terminating condition method could be descriptively named >> to say why it would be completing. >> > > I have been working on v2 locally and most of the changes landed > nicely and were clear improvements but there's one point I would > want to discuss a bit more. > > For the termination conditions, I moved them into paint_queue_get() > as you suggested. The all-zero check was straightforward since it > only depends on the counters but the side-exhaustion check also > needs to know whether we have entered the finite-generation region, > so I pass last_gen (already a local in paint_down_to_common) as a > parameter: > > static struct commit *paint_queue_get(struct paint_state *state, > timestamp_t last_gen) > > Inside, the two conditions merge nicely under a shared guard: > > if (!state->pending_merge_bases) { > if (!state->p1_count && !state->p2_count) > return NULL; > if (last_gen < GENERATION_NUMBER_INFINITY && > (!state->p1_count || !state->p2_count)) > return NULL; > } This looks good to me. I'm not even bothered by the last_gen parameter. You do make a good point about it being a potentially leaky abstraction. > Both conditions require pending_merge_bases == 0, so the nesting > felt natural. The first is "nothing non-stale left" (works in any > region). The second is "one side exhausted" (only in the finite > region where topological ordering holds). > > I think passing in last_gen into paint_queue_get() feels _slightly_ > awkward but not too bad in practice. However, we also have my > older (first) patch with the fast-exit if the caller only needs one > merge base -- that has a separate break that also could be folded > into paint_queue_get(). The messy part here is that we would need > to also pass the mb_flags parameter to paint_queue_get(). How much of this data that you are passing into the method could be state in the paint_queue struct? Could we have the paint_queue manage all of the state necessary to make decisions around the walk termination? Or, could we do a peek into the queue to see the "top" commit, and check if it is a finite commit or not? I know that 'last_gen' is supposed to be the commit walked in the previous cycle, but it seems that we only care about "the remaining commits are finite" as our condition. > Perhaps we should just let this remain as-is for now and follow up > with _removing_ that optimization. I think the value of having it > is much diminished (but not fully gone) by the side-exhaust approach. > > Additionally there's a correctness argument to be made -- perhaps > all callers _should_ care about multiple merge bases existing, and > instead bail out if it finds more than one. The only use case > where this matters today is "git merge-base A B" without --all. > Right now I am leaning towards simply passing in last_gen and > containing all of the halt conditions there > (except the old !FIND_ALL). This is a good start, but hopefully storing the data in the struct would be a good way to handle that. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-23 13:50 ` Derrick Stolee @ 2026-06-23 14:09 ` Kristofer Karlsson 2026-06-23 14:17 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-23 14:09 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Tue, 23 Jun 2026 at 15:50, Derrick Stolee <stolee@gmail.com> wrote: > > > For the termination conditions, I moved them into paint_queue_get() > > as you suggested. The all-zero check was straightforward since it > > only depends on the counters but the side-exhaustion check also > > needs to know whether we have entered the finite-generation region, > > so I pass last_gen (already a local in paint_down_to_common) as a > > parameter: > > > > static struct commit *paint_queue_get(struct paint_state *state, > > timestamp_t last_gen) > > > > Inside, the two conditions merge nicely under a shared guard: > > > > if (!state->pending_merge_bases) { > > if (!state->p1_count && !state->p2_count) > > return NULL; > > if (last_gen < GENERATION_NUMBER_INFINITY && > > (!state->p1_count || !state->p2_count)) > > return NULL; > > } > > This looks good to me. I'm not even bothered by the last_gen > parameter. You do make a good point about it being a potentially > leaky abstraction. Agreed, I am not also bothered by it. > > Both conditions require pending_merge_bases == 0, so the nesting > > felt natural. The first is "nothing non-stale left" (works in any > > region). The second is "one side exhausted" (only in the finite > > region where topological ordering holds). > > > > I think passing in last_gen into paint_queue_get() feels _slightly_ > > awkward but not too bad in practice. However, we also have my > > older (first) patch with the fast-exit if the caller only needs one > > merge base -- that has a separate break that also could be folded > > into paint_queue_get(). The messy part here is that we would need > > to also pass the mb_flags parameter to paint_queue_get(). > > How much of this data that you are passing into the method could be > state in the paint_queue struct? Could we have the paint_queue manage > all of the state necessary to make decisions around the walk > termination? Good idea, I think adding last_gen to the struct is doable and makes it cleaner. If needed we could also add the mb_flags there (but would be a followup patch) Minor note: I renamed the struct to paint_state so that I could rename the prio_queue to queue and not have "queue.queue" which felt confusing in the code. > Or, could we do a peek into the queue to see the "top" commit, and > check if it is a finite commit or not? I know that 'last_gen' is > supposed to be the commit walked in the previous cycle, but it seems > that we only care about "the remaining commits are finite" as our > condition. Yes, peeking into the queue would work too, but it would feel awkward, commit = prio_queue_peek(); if (halt conditions) return NULL; prio_queue_get(); And if we get first, the condition is not valid - that said, it would be doable to instead put the halt conditions _between_ popping the commit and updating the counters. I am not sure how ugly or confusing it would be, but I could add a comment to explain why that sequencing is important. (Popping the commit and updating the counters may lead to temporary 0 counts, but then when we enqueue parents of the commits they move away from the 0 anyway). It would become something like: // dry-/pseudo-coded commit *paint_queue_pop() { commit = prio_queue_pop(); if (!commit) return NULL; if (halt_condition(state, commit.generation)) return NULL; // important: don't decrement counters before checking the halt condition paint_count_update(state, commit->object.flags, -1); return commit; } > > Right now I am leaning towards simply passing in last_gen and > > containing all of the halt conditions there > > (except the old !FIND_ALL). > > This is a good start, but hopefully storing the data in the > struct would be a good way to handle that. Sounds good, I will massage the code a bit, store the relevant pieces in the struct and see how clean I can make it. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-23 14:09 ` Kristofer Karlsson @ 2026-06-23 14:17 ` Derrick Stolee 2026-06-24 11:25 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-23 14:17 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/23/2026 10:09 AM, Kristofer Karlsson wrote: > On Tue, 23 Jun 2026 at 15:50, Derrick Stolee <stolee@gmail.com> wrote: >> How much of this data that you are passing into the method could be >> state in the paint_queue struct? Could we have the paint_queue manage >> all of the state necessary to make decisions around the walk >> termination? > > Good idea, I think adding last_gen to the struct is doable and makes it cleaner. > If needed we could also add the mb_flags there (but would be a followup patch) > Minor note: I renamed the struct to paint_state so that I could rename > the prio_queue to queue and not have "queue.queue" which felt > confusing in the code. > >> Or, could we do a peek into the queue to see the "top" commit, and >> check if it is a finite commit or not? I know that 'last_gen' is >> supposed to be the commit walked in the previous cycle, but it seems >> that we only care about "the remaining commits are finite" as our >> condition. > > Yes, peeking into the queue would work too, but it would feel awkward, > > commit = prio_queue_peek(); > if (halt conditions) return NULL; > prio_queue_get(); Good instinct to notice that peeking and getting from the same method is awkward. > And if we get first, the condition is not valid - that said, it would be doable > to instead put the halt conditions _between_ popping the commit and > updating the counters. I am not sure how ugly or confusing it would be, > but I could add a comment to explain why that sequencing is important. > (Popping the commit and updating the counters may lead to temporary > 0 counts, but then when we enqueue parents of the commits they > move away from the 0 anyway). It would become something like: > > // dry-/pseudo-coded > commit *paint_queue_pop() { > commit = prio_queue_pop(); > if (!commit) return NULL; > if (halt_condition(state, commit.generation)) return NULL; > // important: don't decrement counters before checking the halt condition > paint_count_update(state, commit->object.flags, -1); > return commit; > } I think this would be an appropriate way to handle this. If we pop and return NULL then it's ok that we removed data from the queue because it shouldn't be reused. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters 2026-06-23 14:17 ` Derrick Stolee @ 2026-06-24 11:25 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-24 11:25 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Tue, 23 Jun 2026 at 16:17, Derrick Stolee <stolee@gmail.com> wrote: > > I think this would be an appropriate way to handle this. If we > pop and return NULL then it's ok that we removed data from the > queue because it shouldn't be reused. I have prepared v2 on GGG which I believe addresses all of the feedback. The halt conditions now live inside paint_queue_get() as you suggested. I am not 100% happy with the halt-condition placement yet -- the existing loop in master already has several exit paths (while condition, min_generation break, FIND_ALL break) and I think there is an opportunity to consolidate them. But that is a separate discussion and I do not want to derail this series. I can propose some alternatives in a follow-up after this lands. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget 2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget 2026-06-22 18:12 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget ` (4 subsequent siblings) 7 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add an early termination check to paint_down_to_common() using the per-side counters introduced in the previous commit. Once the walk enters the finite-generation region, terminate early when one side's exclusive count drops to zero -- no new merge-base can form without both paint sides meeting. The check also waits for pending_merge_bases to reach zero, ensuring all merge-base candidates have been popped and recorded before exiting. The INFINITY gate ensures correctness: commits without a commit-graph entry have GENERATION_NUMBER_INFINITY and are ordered by commit date, which is not topologically reliable. The optimization only fires once the walk enters the finite-generation region where ordering guarantees hold. On large repositories with commit-graph, this yields 100-1000x speedups for merge-base queries where one side (e.g. a PR branch) is much smaller than the other. Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/commit-reach.c b/commit-reach.c index ba1e896f0f..fcd1ad0167 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -201,6 +201,19 @@ static int paint_down_to_common(struct repository *r, if (queue.p1_count + queue.p2_count + queue.pending_merge_bases == 0) break; + + /* + * Side exhaustion: a new merge-base can only form + * when both PARENT1-only and PARENT2-only commits + * remain in the queue. In the finite-generation + * region the queue is ordered topologically, so + * no future step can add paint to visited commits + * and an exhausted side cannot reappear. + */ + if (generation < GENERATION_NUMBER_INFINITY && + queue.pending_merge_bases == 0 && + (queue.p1_count == 0 || queue.p2_count == 0)) + break; } clear_prio_queue(&queue.pq); -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-22 18:12 ` Derrick Stolee 2026-06-22 19:19 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:12 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > Add an early termination check to paint_down_to_common() using the > per-side counters introduced in the previous commit. Once the walk > enters the finite-generation region, terminate early when one side's > exclusive count drops to zero -- no new merge-base can form without > both paint sides meeting. > > The check also waits for pending_merge_bases to reach zero, ensuring > all merge-base candidates have been popped and recorded before > exiting. > > The INFINITY gate ensures correctness: commits without a commit-graph > entry have GENERATION_NUMBER_INFINITY and are ordered by commit date, > which is not topologically reliable. The optimization only fires > once the walk enters the finite-generation region where ordering > guarantees hold. > > On large repositories with commit-graph, this yields 100-1000x > speedups for merge-base queries where one side (e.g. a PR branch) is > much smaller than the other. > > Helped-by: Derrick Stolee <stolee@gmail.com> > Helped-by: Elijah Newren <newren@gmail.com> > Signed-off-by: Kristofer Karlsson <krka@spotify.com> > --- > commit-reach.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/commit-reach.c b/commit-reach.c > index ba1e896f0f..fcd1ad0167 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -201,6 +201,19 @@ static int paint_down_to_common(struct repository *r, > if (queue.p1_count + queue.p2_count + > queue.pending_merge_bases == 0) > break; > + > + /* > + * Side exhaustion: a new merge-base can only form > + * when both PARENT1-only and PARENT2-only commits > + * remain in the queue. In the finite-generation > + * region the queue is ordered topologically, so > + * no future step can add paint to visited commits > + * and an exhausted side cannot reappear. > + */ > + if (generation < GENERATION_NUMBER_INFINITY && > + queue.pending_merge_bases == 0 && > + (queue.p1_count == 0 || queue.p2_count == 0)) > + break; I mentioned it earlier, but I think this check should be in the dequeueing method instead of in the tail of the loop. But I think this is the correct ending case. I like that you broke this out into its own patch to demonstrate that this is the key performance boost. It may be good to have some performance test numbers that demonstrate that patch 2 does not add any substantial overhead (timing should match previous code) and in patch 3 this single condition gets us a huge benefit, though it requires the data tracking of patch 2 to work. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-22 18:12 ` Derrick Stolee @ 2026-06-22 19:19 ` Kristofer Karlsson 2026-06-22 20:26 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-22 19:19 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 20:12, Derrick Stolee <stolee@gmail.com> wrote: > > + if (generation < GENERATION_NUMBER_INFINITY && > > + queue.pending_merge_bases == 0 && > > + (queue.p1_count == 0 || queue.p2_count == 0)) > > + break; > I mentioned it earlier, but I think this check should be in the > dequeueing method instead of in the tail of the loop. Yes, I will try to fold this one into the paint_queue_get as well. > I like that you broke this out into its own patch to demonstrate > that this is the key performance boost. It may be good to have > some performance test numbers that demonstrate that patch 2 does > not add any substantial overhead (timing should match previous > code) and in patch 3 this single condition gets us a huge benefit, > though it requires the data tracking of patch 2 to work. Good point, I will try to run enough local tests to ensure that patch 2 does not add too much overhead to slow things down. I think I may need to create some type of (temporary, internal) test runner that runs the same walk multiple times to reduce the noise from parsing commits. I am not sure if I should also commit such a performance test or simply include a brief summary in the commit message Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-22 19:19 ` Kristofer Karlsson @ 2026-06-22 20:26 ` Derrick Stolee 2026-06-22 21:03 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 20:26 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/22/2026 3:19 PM, Kristofer Karlsson wrote: > I think I may need to create some type of (temporary, internal) > test runner that runs the same walk multiple times to reduce > the noise from parsing commits. I've used hyperfine [1] when doing specific performance tests in the past. You can build Git before and after and have hyperfine run the two modes and compare them: hyperfine --warmup=3 \ -n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \ -n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B" [1] https://github.com/sharkdp/hyperfine Good luck! -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-22 20:26 ` Derrick Stolee @ 2026-06-22 21:03 ` Kristofer Karlsson 2026-06-23 13:40 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-22 21:03 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 22:26, Derrick Stolee <stolee@gmail.com> wrote: > > I've used hyperfine [1] when doing specific performance tests > in the past. You can build Git before and after and have hyperfine > run the two modes and compare them: > > hyperfine --warmup=3 \ > -n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \ > -n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B" > > [1] https://github.com/sharkdp/hyperfine I can definitely use that, but I was thinking that the overhead of operations such as repo_parse_commit would be high relative to the overhead of the new paint_queue struct such that it would be hard to properly measure and that it would be easier if I could spread out that cost across multiple internal runs (which requires a custom binary of some sort), but perhaps it's enough to just show that there's no measurable regression here and then hyperfine is indeed the right fit. I'll start with that and see if I need to do anything more complex. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-22 21:03 ` Kristofer Karlsson @ 2026-06-23 13:40 ` Derrick Stolee 0 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-23 13:40 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/22/2026 5:03 PM, Kristofer Karlsson wrote: > On Mon, 22 Jun 2026 at 22:26, Derrick Stolee <stolee@gmail.com> wrote: >> >> I've used hyperfine [1] when doing specific performance tests >> in the past. You can build Git before and after and have hyperfine >> run the two modes and compare them: >> >> hyperfine --warmup=3 \ >> -n 'old' "~/git-old/bin-wrappers/git -C $repo merge-base $A $B" \ >> -n 'new' "~/git-new/bin-wrappers/git -C $repo merge-base $A $B" >> >> [1] https://github.com/sharkdp/hyperfine > > I can definitely use that, but I was thinking that the overhead > of operations such as repo_parse_commit would be high relative > to the overhead of the new paint_queue struct such that it would > be hard to properly measure and that it would be easier if I could > spread out that cost across multiple internal runs (which requires > a custom binary of some sort), but perhaps it's enough to just > show that there's no measurable regression here and then > hyperfine is indeed the right fit. I'll start with that and see if I need > to do anything more complex. Unit-level performance is nice, but doesn't tell the whole story. We typically focus on end-to-end performance numbers when possible. Another way to do it would be to use trace2_region_enter() and trace2_region_leave() markers and then pull the timing data out of the trace2 event logs. It's more complicated and usually only needed if we are struggling to reproduce the performance impact due to external factors. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget ` (2 preceding siblings ...) 2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 ` Elijah Newren via GitGitGadget 2026-06-22 18:15 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget ` (3 subsequent siblings) 7 siblings, 1 reply; 84+ messages in thread From: Elijah Newren via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren From: Elijah Newren <newren@gmail.com> Add test cases to t6600-test-reach.sh that exercise edge cases in the side-exhaustion optimization for paint_down_to_common(): - in_merge_bases_many:self: commit is both A and one of the X inputs - get_merge_bases_many:duplicate-twos: duplicate entries in X list - get_merge_bases_many:pending-stale: STALE transition on an already-painted commit (ps-* diamond topology) - get_merge_bases_many:infinity-both-sides: both tips outside the commit-graph with non-monotonic dates (pi-* topology) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b5b314e570..775c077c87 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -49,6 +49,62 @@ test_expect_success 'setup' ' git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1 done done && + + # Build a small side topology to exercise the (PARENT1|PARENT2) -> + # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the + # 10x10 grid above does not exercise it because no merge-base candidate + # there is a descendant of another, so STALE never reaches a + # still-pending candidate. + # + # ps-X + # /|\ + # / | \ + # ps-Z ps-B ps-W + # | / \ | + # | / \ | + # |/ \| + # ps-T1 ps-T2 + # + # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so + # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions + # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued; + # then the STALE-walk from ps-B transitions ps-X to + # (PARENT1|PARENT2|STALE). + git checkout --orphan ps-orphan && + test_commit ps-X && + git checkout -b ps-B-br ps-X && test_commit ps-B && + git checkout -b ps-Z-br ps-X && test_commit ps-Z && + git checkout -b ps-W-br ps-X && test_commit ps-W && + git checkout -b ps-T1 ps-Z && + git merge --no-ff -m ps-T1 ps-B && + git checkout -b ps-T2 ps-W && + git merge --no-ff -m ps-T2 ps-B && + + # Build a side topology that lives entirely outside the half + # commit-graph and has non-monotonic commit dates, to exercise the + # INFINITY-gate in paint_down_to_common. With both tips outside + # the graph, generation is INFINITY and the queue falls back to + # commit-date order, which here is non-monotonic. + # + # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D + # pi-D (date 480) --> pi-C + # pi-C (date 200) --> pi-B + # pi-B (date 100, PARENT2 tip) --> pi-P + # pi-P (date 450, root) + # + # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is + # itself one of the queried tips). + git checkout --orphan pi-orphan && + test_commit --date "@450 +0000" pi-P && + test_commit --date "@100 +0000" pi-B && + test_commit --date "@200 +0000" pi-C && + test_commit --date "@480 +0000" pi-D && + GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \ + git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid && + pi_x="$(cat pi-X-oid)" && + git branch -f pi-X-br "$pi_x" && + git tag pi-X "$pi_x" && + git commit-graph write --reachable && mv .git/objects/info/commit-graph commit-graph-full && chmod u+w commit-graph-full && @@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' test_all_modes in_merge_bases_many ' +test_expect_success 'in_merge_bases_many:self' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-5-9 + X:commit-6-8 + EOF + echo "in_merge_bases_many(A,X):1" >expect && + test_all_modes in_merge_bases_many +' + test_expect_success 'is_descendant_of:hit' ' cat >input <<-\EOF && A:commit-5-7 @@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' ' test_all_modes get_merge_bases_many ' +test_expect_success 'get_merge_bases_many:duplicate-twos' ' + cat >input <<-\EOF && + A:commit-5-7 + X:commit-4-8 + X:commit-4-8 + X:commit-6-6 + X:commit-6-6 + X:commit-8-3 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse commit-5-6 \ + commit-4-7 | sort + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:pending-stale' ' + # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in + # paint_down_to_common(). See the topology comment in the setup test. + cat >input <<-\EOF && + A:ps-T1 + X:ps-T2 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-B + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:infinity-both-sides' ' + # Exercises the push-time INFINITY-gate in paint_down_to_common(). See + # the pi-* topology comment in the setup test. + cat >input <<-\EOF && + A:pi-X + X:pi-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse pi-B + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases 2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget @ 2026-06-22 18:15 ` Derrick Stolee 2026-06-22 19:25 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:15 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > Add test cases to t6600-test-reach.sh that exercise edge cases in the > side-exhaustion optimization for paint_down_to_common(): > > - in_merge_bases_many:self: commit is both A and one of the X inputs > - get_merge_bases_many:duplicate-twos: duplicate entries in X list > - get_merge_bases_many:pending-stale: STALE transition on an > already-painted commit (ps-* diamond topology) > - get_merge_bases_many:infinity-both-sides: both tips outside the > commit-graph with non-monotonic dates (pi-* topology) It's usually my preference to see these tests show up before the new code arrives, that way we can see that they already work with the old logic and continue to work with the new logic. It's minor, but putting them after your code change may be adding enforcement of a change of behavior. One thing that could be helpful here is to consider tracing a count of "commits walked" in the merge-base code, then you could have these tests demonstrate the performance benefit by checking for that number changing. In t6600, that tracing number would not be the same across the three different data shapes (full graph, half graph, no graph) and that could be valuable to demonstrate in tests. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases 2026-06-22 18:15 ` Derrick Stolee @ 2026-06-22 19:25 ` Kristofer Karlsson 2026-06-22 20:28 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-22 19:25 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote: > It's usually my preference to see these tests show up before the > new code arrives, that way we can see that they already work with > the old logic and continue to work with the new logic. > > It's minor, but putting them after your code change may be adding > enforcement of a change of behavior. Agreed, I actually also prefer that in practice so I am not sure why I ordered them this way - perhaps some attempt at making it easier to review (show the idea and change before the verification). I will reorder to put all new tests as the first commit (or second, if I will also introduce a status-quo technical first). > > One thing that could be helpful here is to consider tracing a > count of "commits walked" in the merge-base code, then you could > have these tests demonstrate the performance benefit by checking > for that number changing. Good idea, I actually had some of that locally when developing it, but I removed the ugly traces before submitting this. I will try to re-introduce that in a nice way. It would be neat to let tests inspect that side effect, though in the worst case that could make it fragile. At the very least it's good for human debugging though. > In t6600, that tracing number would not be the same across the > three different data shapes (full graph, half graph, no graph) and > that could be valuable to demonstrate in tests. Agreed, the number of commits visited would be more interesting than the relative performance numbers since it's an algorithmic change rather than a micro-optimization. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases 2026-06-22 19:25 ` Kristofer Karlsson @ 2026-06-22 20:28 ` Derrick Stolee 0 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 20:28 UTC (permalink / raw) To: Kristofer Karlsson; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren On 6/22/2026 3:25 PM, Kristofer Karlsson wrote: > On Mon, 22 Jun 2026 at 20:15, Derrick Stolee <stolee@gmail.com> wrote: >> It's usually my preference to see these tests show up before the >> new code arrives, that way we can see that they already work with >> the old logic and continue to work with the new logic. >> >> It's minor, but putting them after your code change may be adding >> enforcement of a change of behavior. > > Agreed, I actually also prefer that in practice so I am not > sure why I ordered them this way - perhaps some attempt at > making it easier to review (show the idea and change before > the verification). I will reorder to put all new tests as the first commit > (or second, if I will also introduce a status-quo technical first). > >> >> One thing that could be helpful here is to consider tracing a >> count of "commits walked" in the merge-base code, then you could >> have these tests demonstrate the performance benefit by checking >> for that number changing. > > Good idea, I actually had some of that locally when developing it, > but I removed the ugly traces before submitting this. I will try to > re-introduce that in a nice way. It would be neat to let tests > inspect that side effect, though in the worst case that could make > it fragile. At the very least it's good for human debugging though. And to be clear, I'm suggesting using trace2_data_intmax() calls to get structured data that can be parsed in the GIT_TRACE2_EVENT logs during tests. It could also be picked up by teletry tools that listen to trace2 output, if desired. It will show up differently in GIT_TRACE2_PERF, but that's a nice human-readable way to debug things. >> In t6600, that tracing number would not be the same across the >> three different data shapes (full graph, half graph, no graph) and >> that could be valuable to demonstrate in tests. > > Agreed, the number of commits visited would be more interesting > than the relative performance numbers since it's an algorithmic > change rather than a micro-optimization. They are both interesting, but only the commit count can be guaranteed rigorously in the test suite. It's possible that a great improvement to such a trace doesn't result in great end-to- end time improvement, but I believe that it is true in this case. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget ` (3 preceding siblings ...) 2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget @ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget 2026-06-22 18:16 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget ` (2 subsequent siblings) 7 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add t6099 to test the case where multiple merge-base candidates exist and one is an ancestor of another. This exercises the side-exhaustion optimization in paint_down_to_common together with the remove_redundant safety net in get_merge_bases_many_0. Add a mixed finite/INFINITY test to t6600 where one tip is outside the commit-graph (INFINITY generation) and the other is inside. This exercises the region transition: the walk starts in the INFINITY region where side-exhaustion is disabled, then crosses into the finite region where it can fire. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++ t/t6600-test-reach.sh | 25 ++++++++ 3 files changed, 108 insertions(+) create mode 100755 t/t6099-merge-base-side-exhaustion.sh diff --git a/t/meson.build b/t/meson.build index 3219264fe7..ee6ebdffb9 100644 --- a/t/meson.build +++ b/t/meson.build @@ -786,6 +786,7 @@ integration_tests = [ 't6041-bisect-submodule.sh', 't6050-replace.sh', 't6060-merge-index.sh', + 't6099-merge-base-side-exhaustion.sh', 't6100-rev-list-in-order.sh', 't6101-rev-parse-parents.sh', 't6102-rev-list-unexpected-objects.sh', diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh new file mode 100755 index 0000000000..bae3ea7f83 --- /dev/null +++ b/t/t6099-merge-base-side-exhaustion.sh @@ -0,0 +1,82 @@ +#!/bin/sh + +test_description='merge-base with ancestor among merge-base candidates + +Test that merge-base --all correctly handles cases where +multiple merge-base candidates exist and one is an ancestor +of another. The side-exhaustion optimization in +paint_down_to_common may exit before STALE propagation +removes the ancestor, but remove_redundant catches it. + +Graph shape (parents are below children): + + A ----------- X + |\ /| + | B---------/ | + | | | + e2 \ f2 + | | | + e1 d1 f1 + \ | / + \ | / + \| / + C + +A and X are the two tips. +B and C are both reachable from A and X. +B reaches C through d1. +Only B should appear in merge-base --all output. +' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'setup ancestor merge-base candidate' ' + test_commit C && + + git checkout -b d-chain HEAD && + test_commit d1 && + test_commit B && + + git checkout -b e-path C && + test_commit e1 && + test_commit e2 && + + git checkout -b f-path C && + test_commit f1 && + test_commit f2 && + + git checkout -b branch-A e-path && + test_merge A B && + + git checkout -b branch-X f-path && + test_merge X B && + + git commit-graph write --reachable +' + +test_expect_success 'merge-base --all excludes ancestor candidate' ' + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_expect_success 'merge-base (single) finds shallowest' ' + git rev-parse B >expected && + git merge-base A X >actual && + test_cmp expected actual +' + +# Without commit-graph: generation numbers are INFINITY, +# side-exhaustion optimization does not fire. +test_expect_success 'merge-base --all without commit-graph' ' + rm -f .git/objects/info/commit-graph && + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_done diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 775c077c87..f5560b0c1c 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' test_all_modes get_merge_bases_many ' +test_expect_success 'setup mixed finite/INFINITY topology' ' + # Create a commit outside all saved commit-graph files so it always + # has INFINITY generation, while its parent (ps-X) is in the graph + # with a finite generation. Use the ps-* orphan topology so we do + # not pollute the grid-based rev-list tests. + git checkout ps-X && + test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF +' + +test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' + # One tip (pm-INF) is outside the commit-graph with INFINITY + # generation; the other (ps-B) is in the graph with finite + # generation. The walk starts in the INFINITY region and crosses + # into the finite region where side-exhaustion can fire. + cat >input <<-\EOF && + A:pm-INF + X:ps-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-X + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests 2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget @ 2026-06-22 18:16 ` Derrick Stolee 0 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:16 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > Add t6099 to test the case where multiple merge-base candidates exist > and one is an ancestor of another. This exercises the side-exhaustion > optimization in paint_down_to_common together with the > remove_redundant safety net in get_merge_bases_many_0. Same as the previous patch: I'd like to see these before the code change. And if we trace a count of commits walked, we'd be able to see the number change in this specific case. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget ` (4 preceding siblings ...) 2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 ` Kristofer Karlsson via GitGitGadget 2026-06-22 18:21 ` Derrick Stolee 2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget 7 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-20 10:36 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a technical document describing the paint_down_to_common() algorithm used for merge-base computation. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 130 ++++++++++++++++++ 3 files changed, 132 insertions(+) create mode 100644 Documentation/technical/paint-down-to-common.adoc diff --git a/Documentation/Makefile b/Documentation/Makefile index 2699f0b24a..f8dea4b395 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol TECH_DOCS += technical/multi-pack-index TECH_DOCS += technical/packfile-uri TECH_DOCS += technical/pack-heuristics +TECH_DOCS += technical/paint-down-to-common TECH_DOCS += technical/parallel-checkout TECH_DOCS += technical/partial-clone TECH_DOCS += technical/platform-support diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build index ec07088c57..9ce11d5e48 100644 --- a/Documentation/technical/meson.build +++ b/Documentation/technical/meson.build @@ -18,6 +18,7 @@ articles = [ 'multi-pack-index.adoc', 'packfile-uri.adoc', 'pack-heuristics.adoc', + 'paint-down-to-common.adoc', 'parallel-checkout.adoc', 'partial-clone.adoc', 'platform-support.adoc', diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc new file mode 100644 index 0000000000..e677cce84d --- /dev/null +++ b/Documentation/technical/paint-down-to-common.adoc @@ -0,0 +1,130 @@ +Merge-Base Computation and paint_down_to_common() +================================================== + +The function `paint_down_to_common()` in `commit-reach.c` computes merge +bases by walking the commit graph backwards from two sets of tips and +finding where their ancestry meets. + +Use cases +--------- + +Computing merge bases is used in two different ways: + + 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`, + `merge`, `rebase`). A merge base is a common ancestor that is + not itself an ancestor of another common ancestor. + + 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base + --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A + an ancestor of commit B?" If a common ancestor equals one of the + inputs, that input is necessarily the only merge base -- no other + common ancestor can be both as recent and not an ancestor of it. + +Both use cases share the same algorithm and implementation. + +Algorithm +--------- + +Given a commit `one` and a set of commits `twos[]`, the walk paints +commits with two colors: + + - PARENT1: reachable from `one` + - PARENT2: reachable from any commit in `twos[]` + +The walk uses a priority queue ordered by generation number (falling +back to commit date when generation numbers are unavailable). Each +step dequeues the highest-priority commit (this is when we say a +commit is "visited") and propagates its paint flags to its parents, +enqueuing them if they gained new flags. When a commit receives +both PARENT1 and PARENT2, it is a merge-base candidate. A candidate +gains the STALE flag so its ancestors propagate staleness -- any +deeper common ancestor is necessarily redundant. + +INFINITY and finite generation regions +-------------------------------------- + +The commit-graph stores a generation number for each commit. Commits +not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The +graph is closed under reachability: if a commit is in the graph, all +its ancestors are too. This partitions the commit graph into two regions: + +.... + +---------------------------------------+ + | INFINITY region | + | generation = INFINITY | + | queue order: heuristic (commit date) | + +---------------------------------------+ + | + v + +---------------------------------------+ + | Finite region | + | generation = finite | + | queue order: topological | + +---------------------------------------+ +.... + +When the commit-graph is enabled, the INFINITY region is typically +very small -- it only contains commits added since the last +commit-graph refresh. + +All reachable INFINITY-generation commits are visited before any +finite-generation commit, because INFINITY is larger than any finite +value. Once the walk crosses into the finite region, it stays there. + +In the finite region, generation ordering guarantees topological +traversal: children are always visited before their parents. This +means that paint on already-visited commits is final -- no future +traversal step can add paint to them. + +In the INFINITY region, commit-date ordering can violate this: a +parent with a later date can be visited before a child with an earlier +date. Paint flags are therefore NOT final at visit time, and a +commit visited with only one side's paint may later gain the other. + +Paint flags are only added, never removed. Since each flag can be set +at most once per commit, the number of times a commit can be +re-enqueued is bounded by the number of flag transitions. + +Termination +----------- + +Termination happens when we can prove that no extra progress is +possible. We are done with the main loop when one of the following +conditions holds: + + 1. The queue is empty. + 2. The queue only contains STALE entries. + 3. Side-exhaustion: the walk has reached the finite region and one + of the sides is fully exhausted. + +The loop waits for all pending merge-base candidates to be popped +and recorded before any early exit fires, so no separate drain phase +is needed after termination. + +Stale entry condition +~~~~~~~~~~~~~~~~~~~~~ +If all entries are stale we cannot find any new merge bases since +that requires at least one enqueued side node meeting the other side. +However, we could still invalidate merge bases (if there are more +than one). This is unnecessary since `remove_redundant()` will clean +that up as a post-process step. + +Side-exhaustion +~~~~~~~~~~~~~~~ +A commit is *exclusive* to one side if it carries that side's paint +but not the other (e.g. PARENT1 without PARENT2). + +If we have reached the finite region of the graph, no future +traversal step can add paint to an already-visited commit. Thus if +there are no exclusive PARENT2 commits in the queue, no additional +PARENT2 paint can be introduced into the walk. Even if exclusive +PARENT1 commits remain, no new merge-base candidates can be +discovered. The same holds symmetrically for PARENT1. + +This invariant is only valid in the finite region of the graph. + +Related documentation +--------------------- + + - `Documentation/technical/commit-graph.adoc` -- generation numbers + and the reachability closure property. -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc 2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget @ 2026-06-22 18:21 ` Derrick Stolee 2026-06-22 19:30 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:21 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > Add a technical document describing the paint_down_to_common() > algorithm used for merge-base computation. I like the idea of documenting this so it's easier to understand. There is risk of drift from the actual implementation. You may want to add a comment to the method in commit-reach.c to indicate that any change should be reflected in this document. > +Termination > +----------- > + > +Termination happens when we can prove that no extra progress is > +possible. We are done with the main loop when one of the following > +conditions holds: > + > + 1. The queue is empty. > + 2. The queue only contains STALE entries. > + 3. Side-exhaustion: the walk has reached the finite region and one > + of the sides is fully exhausted. It could be an interesting exercise, but potentially wasteful, to add this document as a Patch 1, but reflecting the old algorithm and then to update the document at the same time as you update the code. The changes in your patch 2 would impact this doc in terms of the data being tracked by the paint_queue data structure instead of the nonstale_queue structure (though those details are not currently handled in the current version). The change to the termination condition would come along with patch 3. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc 2026-06-22 18:21 ` Derrick Stolee @ 2026-06-22 19:30 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-22 19:30 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Mon, 22 Jun 2026 at 20:21, Derrick Stolee <stolee@gmail.com> wrote: > > I like the idea of documenting this so it's easier to understand. Yes I was myself thinking that I can prove it to myself now that it works, and anyone else could also prove it to themselves, but having it explicit here is even better. I found the other documents (i.e. commit-graph) to be a good source of inspiration here. > There is risk of drift from the actual implementation. You may want > to add a comment to the method in commit-reach.c to indicate that > any change should be reflected in this document. Good idea, will add that. > > +Termination > > +----------- > > + > > +Termination happens when we can prove that no extra progress is > > +possible. We are done with the main loop when one of the following > > +conditions holds: > > + > > + 1. The queue is empty. > > + 2. The queue only contains STALE entries. > > + 3. Side-exhaustion: the walk has reached the finite region and one > > + of the sides is fully exhausted. > It could be an interesting exercise, but potentially wasteful, to > add this document as a Patch 1, but reflecting the old algorithm > and then to update the document at the same time as you update the > code. I did consider that initially but I was worried it would be considered noisy. I am quite happy to rework it in a way that first explains the status quo. That would make the document diff more interesting. Agreed that should become the first patch, and the patch that changes the algorithm should include the documentation change. > The changes in your patch 2 would impact this doc in terms of the > data being tracked by the paint_queue data structure instead of the > nonstale_queue structure (though those details are not currently > handled in the current version). The change to the termination > condition would come along with patch 3. Agreed, I would need to rephrase from tracking non-stale to tracking counts of p1 and p2 (and pending merge bases) commits, but I think that would be a small tweak and well worth doing. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget ` (5 preceding siblings ...) 2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget @ 2026-06-22 18:22 ` Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget 7 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-22 18:22 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/20/2026 6:36 AM, Kristofer Karlsson via GitGitGadget wrote: > Hi, > > This follows up on my RFC [1] with a concrete proposal. I expect the design > to still be scrutinized, but that may be easier with actual code to look at. > > I tried to make this easier to review by splitting into atomic patches. The > first two patches are the meatiest parts, though they are pure refactoring. > The behavior change is in patch 3 and is in itself quite small. The last > patch adds technical documentation to support future development. Thanks for putting this together carefully. I gave some feedback on the specific code and the patch organization. Overall, I believe that this implementation is functionally correct and everything I have to say is about presentation and data gathering. I look forward to a non-RFC v2. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget ` (6 preceding siblings ...) 2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget ` (9 more replies) 7 siblings, 10 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson commit-reach: terminate merge-base walk when one side is exhausted Optimize paint_down_to_common() for merge-base queries that hit large one-sided histories. When the walk from one side reaches a commit with a very low generation number that the other side never paints, the walk is forced to drain most of the graph. A common trigger is a repository import that grafts a separate history with its own root, but any merge that introduces a low-generation commit never painted by the other side has the same effect. A new merge-base candidate can only be discovered when exclusive PARENT1 and PARENT2 paint meet. This series teaches paint_down_to_common() to stop as soon as one side has no exclusive commits left in the queue; once one side is exhausted, no further candidates can appear. origin/HEAD o o PR HEAD | | (import) o : / \ / | o merge-base | | : : (~2.5M commits) | | import root main root In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample that sharpened the halt condition, and Elijah Newren independently discovered the same optimization and shared an implementation in PR #2150 [2]. Patches 2-3 incorporate test cases from Elijah's branch. This series implements the optimization only after the walk enters the finite-generation region, where generation ordering guarantees that paint on visited commits is final. Patch layout: 1/7 Documentation/technical: add paint-down-to-common doc 2/7 t6600: add test cases for side-exhaustion edge cases 3/7 t6099, t6600: add side-exhaustion regression tests 4/7 commit-reach: add trace2 instrumentation to paint_down_to_common() 5/7 commit-reach: introduce struct paint_state with per-side counters 6/7 commit-reach: remove unused nonstale_queue dedup wrappers 7/7 commit-reach: terminate merge-base walk when one paint side is exhausted Benchmarks Step counts are deterministic (measured via trace2_data_intmax added in patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set to performance. 2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1): steps wall-clock merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms merge-tree (across import) - 8.11s -> 100ms git.git (88k commits, commit-graph): steps wall-clock merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms merge-base HEAD HEAD~1000 - 9ms -> 9ms merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms Changes since v1: * Reordered patches: documentation first (describing the existing algorithm), tests before code changes, so they demonstrate passing with old logic first. * Dropped the ahead_behind decoupling patch. paint_state is now a NEW struct alongside nonstale_queue instead of replacing it. ahead_behind() is completely untouched. * Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead code after the conversion) in a separate commit. * Renamed: struct paint_queue -> paint_state, field pq -> queue, paint_count_add/remove -> paint_count_update (single function with signed delta parameter). * Split the old paint_count_transition (which handled both old and new flags in one call) into separate remove/add calls with a signed delta. This eliminates the need for the case 0 handler (which tracked "not in the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 | STALE) that documents all valid flag combinations, with BUG() in default. * Moved all termination conditions into paint_queue_get(). The all-zero check and the side-exhaustion check are merged under a shared !pending_merge_bases guard. paint_queue_get() derives the generation from the dequeued commit itself, so no extra parameter is needed. * Added trace2_data_intmax() instrumentation to report the number of commits visited per paint walk (separate commit), with deterministic step-count assertions in t6600. * Expanded switch statements to multi-line format per .clang-format. * Used !count style throughout instead of count == 0. * Updated technical documentation alongside code changes. * Added benchmark data (both git-bench wall-clock and trace2 step counts) to commit messages. [1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u [2] https://github.com/gitgitgadget/git/pull/2150 Elijah Newren (1): t6600: add test cases for side-exhaustion edge cases Kristofer Karlsson (6): Documentation/technical: add paint-down-to-common doc t6099, t6600: add side-exhaustion regression tests commit-reach: add trace2 instrumentation to paint_down_to_common() commit-reach: introduce struct paint_state with per-side counters commit-reach: remove unused nonstale_queue dedup wrappers commit-reach: terminate merge-base walk when one paint side is exhausted Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 128 ++++++++++++++ commit-reach.c | 119 ++++++++++--- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++ t/t6600-test-reach.sh | 157 ++++++++++++++++++ 7 files changed, 464 insertions(+), 25 deletions(-) create mode 100644 Documentation/technical/paint-down-to-common.adoc create mode 100755 t/t6099-merge-base-side-exhaustion.sh base-commit: ab776a62a78576513ee121424adb19597fbb7613 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/2149 Range-diff vs v1: 1: 5492acda0a < -: ---------- commit-reach: decouple ahead_behind from nonstale_queue 6: 9cbfc67d72 ! 1: 19ed743bd1 Documentation/technical: add paint-down-to-common doc @@ Commit message Documentation/technical: add paint-down-to-common doc Add a technical document describing the paint_down_to_common() - algorithm used for merge-base computation. + algorithm used for merge-base computation, covering the paint + walk, generation number regions, and termination conditions. Signed-off-by: Kristofer Karlsson <krka@spotify.com> @@ Documentation/technical/paint-down-to-common.adoc (new) +Termination +----------- + -+Termination happens when we can prove that no extra progress is -+possible. We are done with the main loop when one of the following -+conditions holds: ++The walk uses a `nonstale_queue` wrapper around `prio_queue` that ++tracks `max_nonstale`: the lowest-priority non-stale commit enqueued ++so far. Once that commit is dequeued, every remaining entry is known ++to be STALE and the loop terminates. Specifically, the main loop ++ends when one of the following conditions holds: + + 1. The queue is empty. -+ 2. The queue only contains STALE entries. -+ 3. Side-exhaustion: the walk has reached the finite region and one -+ of the sides is fully exhausted. -+ -+The loop waits for all pending merge-base candidates to be popped -+and recorded before any early exit fires, so no separate drain phase -+is needed after termination. ++ 2. `max_nonstale` has been dequeued, meaning the queue only contains ++ STALE entries. + +Stale entry condition +~~~~~~~~~~~~~~~~~~~~~ -+If all entries are stale we cannot find any new merge bases since -+that requires at least one enqueued side node meeting the other side. -+However, we could still invalidate merge bases (if there are more -+than one). This is unnecessary since `remove_redundant()` will clean -+that up as a post-process step. -+ -+Side-exhaustion -+~~~~~~~~~~~~~~~ -+A commit is *exclusive* to one side if it carries that side's paint -+but not the other (e.g. PARENT1 without PARENT2). -+ -+If we have reached the finite region of the graph, no future -+traversal step can add paint to an already-visited commit. Thus if -+there are no exclusive PARENT2 commits in the queue, no additional -+PARENT2 paint can be introduced into the walk. Even if exclusive -+PARENT1 commits remain, no new merge-base candidates can be -+discovered. The same holds symmetrically for PARENT1. -+ -+This invariant is only valid in the finite region of the graph. ++Once all queued entries are stale, no new merge-base candidates can ++be discovered -- that requires at least one non-stale commit from ++each side meeting. Continuing the walk could still invalidate ++existing candidates by proving one is an ancestor of another, but ++`remove_redundant()` handles that as a post-processing step, so it ++is safe to exit early. + +Related documentation +--------------------- + + - `Documentation/technical/commit-graph.adoc` -- generation numbers + and the reachability closure property. + + ## commit-reach.c ## +@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) + return commit; + } + +-/* all input commits in one and twos[] must have been parsed! */ ++/* ++ * See Documentation/technical/paint-down-to-common.adoc ++ * ++ * All input commits in one and twos[] must have been parsed! ++ */ + static int paint_down_to_common(struct repository *r, + struct commit *one, int n, + struct commit **twos, 4: 91372b975f ! 2: 6151b8e0a3 t6600: add test cases for side-exhaustion edge cases @@ t/t6600-test-reach.sh: test_expect_success 'setup' ' + # ps-T1 ps-T2 + # + # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so -+ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions ++ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions + # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued; + # then the STALE-walk from ps-B transitions ps-X to + # (PARENT1|PARENT2|STALE). @@ t/t6600-test-reach.sh: test_expect_success 'setup' ' + + # Build a side topology that lives entirely outside the half + # commit-graph and has non-monotonic commit dates, to exercise the -+ # INFINITY-gate in paint_down_to_common. With both tips outside ++ # INFINITY-gate in paint_down_to_common. With both tips outside + # the graph, generation is INFINITY and the queue falls back to + # commit-date order, which here is non-monotonic. + # @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' ' + +test_expect_success 'get_merge_bases_many:pending-stale' ' + # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in -+ # paint_down_to_common(). See the topology comment in the setup test. ++ # paint_down_to_common(). See the topology comment in the setup test. + cat >input <<-\EOF && + A:ps-T1 + X:ps-T2 @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' ' +' + +test_expect_success 'get_merge_bases_many:infinity-both-sides' ' -+ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See ++ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See + # the pi-* topology comment in the setup test. + cat >input <<-\EOF && + A:pi-X 5: faf5bc98ed ! 3: 90f09ecb5c t6099, t6600: add side-exhaustion regression tests @@ t/t6099-merge-base-side-exhaustion.sh (new) + +Test that merge-base --all correctly handles cases where +multiple merge-base candidates exist and one is an ancestor -+of another. The side-exhaustion optimization in ++of another. The side-exhaustion optimization in +paint_down_to_common may exit before STALE propagation +removes the ancestor, but remove_redundant catches it. + @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s +test_expect_success 'setup mixed finite/INFINITY topology' ' + # Create a commit outside all saved commit-graph files so it always + # has INFINITY generation, while its parent (ps-X) is in the graph -+ # with a finite generation. Use the ps-* orphan topology so we do ++ # with a finite generation. Use the ps-* orphan topology so we do + # not pollute the grid-based rev-list tests. + git checkout ps-X && + test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF @@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s +test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' + # One tip (pm-INF) is outside the commit-graph with INFINITY + # generation; the other (ps-B) is in the graph with finite -+ # generation. The walk starts in the INFINITY region and crosses ++ # generation. The walk starts in the INFINITY region and crosses + # into the finite region where side-exhaustion can fire. + cat >input <<-\EOF && + A:pm-INF -: ---------- > 4: 6ade4df2ed commit-reach: add trace2 instrumentation to paint_down_to_common() 2: 316e4dfe26 ! 5: f24edd45f0 commit-reach: introduce struct paint_queue with per-side counters @@ Metadata Author: Kristofer Karlsson <krka@spotify.com> ## Commit message ## - commit-reach: introduce struct paint_queue with per-side counters + commit-reach: introduce struct paint_state with per-side counters - Replace the nonstale_queue abstraction in paint_down_to_common() with - a new paint_queue struct that tracks per-side commit counts. Each - non-stale queued commit occupies exactly one counter bucket based on - its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending + Add a paint_state struct for use by paint_down_to_common() that + wraps a prio_queue with per-side commit counters. Each non-stale + queued commit occupies exactly one counter bucket based on its + paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). - The counters are maintained by paint_count_transition() which handles - all flag changes as bucket transfers: remove from the old bucket, add - to the new one. Either step is a no-op when the respective state has - no bucket (stale or zero). + The counters are maintained by paint_count_update() which adjusts + the appropriate bucket by a signed delta. An exhaustive switch on + the paint+stale bits documents all valid flag combinations in one + place. - The loop now drains the queue via paint_queue_get() and breaks when - all counters reach zero, replacing the old pointer-based termination - (max_nonstale). This is equivalent behavior. + Convert paint_down_to_common() to use paint_state. The loop now + drains the queue via paint_queue_get() which returns NULL when all + counters reach zero, replacing the old pointer-based termination + (max_nonstale). This is equivalent behavior -- both conditions + detect that no non-stale entries remain. + + The existing nonstale_queue is left in place for ahead_behind(). + + Step counts (via trace2 from the previous commit) are identical + before and after this refactoring, confirming no behavioral change. Signed-off-by: Kristofer Karlsson <krka@spotify.com> + ## Documentation/technical/paint-down-to-common.adoc ## +@@ Documentation/technical/paint-down-to-common.adoc: re-enqueued is bounded by the number of flag transitions. + Termination + ----------- + +-The walk uses a `nonstale_queue` wrapper around `prio_queue` that +-tracks `max_nonstale`: the lowest-priority non-stale commit enqueued +-so far. Once that commit is dequeued, every remaining entry is known +-to be STALE and the loop terminates. Specifically, the main loop ++The walk tracks the number of commits of each type in the queue ++(PARENT1-only, PARENT2-only, pending merge-base). The main loop + ends when one of the following conditions holds: + + 1. The queue is empty. +- 2. `max_nonstale` has been dequeued, meaning the queue only contains +- STALE entries. ++ 2. The queue contains only stale entries. + + Stale entry condition + ~~~~~~~~~~~~~~~~~~~~~ + ## commit-reach.c ## -@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b) +@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) + return commit; } - /* -- * A prio_queue with O(1) termination check. 'max_nonstale' tracks -- * the lowest-priority non-stale commit enqueued so far; once it is -- * popped, every remaining entry is known to be STALE. ++/* + * Priority queue with per-side commit counters for paint_down_to_common(). + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, + * PARENT2-only, or both (a pending merge-base candidate). - */ --struct nonstale_queue { -+struct paint_queue { - struct prio_queue pq; -- struct commit *max_nonstale; ++ */ ++struct paint_state { ++ struct prio_queue queue; + int p1_count; + int p2_count; + int pending_merge_bases; - }; - --static void nonstale_queue_put(struct nonstale_queue *queue, -- struct commit *c) -+/* -+ * Adjust per-side counters for a paint-state transition. Non-stale -+ * commits are counted in one of three counters: PARENT1-only, -+ * PARENT2-only, or both. Zero means "not in the queue" (used on -+ * enqueue/dequeue); stale commits are not counted at all. -+ */ -+static void paint_count_transition(struct paint_queue *queue, -+ unsigned old_flags, unsigned new_flags) - { -- struct commit *old = queue->max_nonstale; -+ unsigned old_paint = old_flags & (PARENT1 | PARENT2 | STALE); -+ unsigned new_paint = new_flags & (PARENT1 | PARENT2 | STALE); - -- prio_queue_put(&queue->pq, c); -- if (c->object.flags & STALE) -+ if (old_paint == new_paint) - return; -- if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0) -- queue->max_nonstale = c; --} -- --static struct commit *nonstale_queue_get(struct nonstale_queue *queue) --{ -- struct commit *commit = prio_queue_get(&queue->pq); - -- if (commit == queue->max_nonstale) -- queue->max_nonstale = NULL; -- -- return commit; -+ if (!(old_paint & STALE)) { -+ switch (old_paint & (PARENT1 | PARENT2)) { -+ case 0: break; -+ case PARENT1: queue->p1_count--; break; -+ case PARENT2: queue->p2_count--; break; -+ case PARENT1 | PARENT2: queue->pending_merge_bases--; break; -+ default: BUG("unexpected paint state"); -+ } -+ } -+ if (!(new_paint & STALE)) { -+ switch (new_paint & (PARENT1 | PARENT2)) { -+ case 0: break; -+ case PARENT1: queue->p1_count++; break; -+ case PARENT2: queue->p2_count++; break; -+ case PARENT1 | PARENT2: queue->pending_merge_bases++; break; -+ default: BUG("unexpected paint state"); -+ } ++}; ++ ++static void paint_count_update(struct paint_state *state, ++ unsigned flags, int delta) ++{ ++ switch (flags & (PARENT1 | PARENT2 | STALE)) { ++ case PARENT1: ++ state->p1_count += delta; ++ break; ++ ++ case PARENT2: ++ state->p2_count += delta; ++ break; ++ ++ case PARENT1 | PARENT2: ++ state->pending_merge_bases += delta; ++ break; ++ ++ case PARENT1 | PARENT2 | STALE: ++ break; ++ ++ default: ++ BUG("unexpected paint state"); + } - } - --static void clear_nonstale_queue(struct nonstale_queue *queue) -+static void paint_queue_put(struct paint_queue *queue, ++} ++ ++static void paint_queue_put(struct paint_state *state, + struct commit *c, unsigned add_flags) - { -- clear_prio_queue(&queue->pq); -- queue->max_nonstale = NULL; --} ++{ + unsigned old_flags = c->object.flags; + c->object.flags |= add_flags; - --static void nonstale_queue_put_dedup(struct nonstale_queue *queue, -- struct commit *c) --{ -- if (c->object.flags & ENQUEUED) -- return; -- c->object.flags |= ENQUEUED; -- nonstale_queue_put(queue, c); ++ + if (old_flags & ENQUEUED) { -+ paint_count_transition(queue, old_flags, c->object.flags); ++ paint_count_update(state, old_flags, -1); ++ paint_count_update(state, c->object.flags, 1); + } else { + c->object.flags |= ENQUEUED; -+ prio_queue_put(&queue->pq, c); -+ paint_count_transition(queue, 0, c->object.flags); ++ prio_queue_put(&state->queue, c); ++ paint_count_update(state, c->object.flags, 1); + } - } - --static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) -+static struct commit *paint_queue_get(struct paint_queue *queue) - { -- struct commit *commit = nonstale_queue_get(queue); -+ struct commit *commit = prio_queue_get(&queue->pq); - -- if (commit) ++} ++ ++static struct commit *paint_queue_get(struct paint_state *state) ++{ ++ struct commit *commit; ++ ++ if (!state->p1_count && !state->p2_count && ++ !state->pending_merge_bases) ++ return NULL; ++ ++ commit = prio_queue_get(&state->queue); + if (commit) { - commit->object.flags &= ~ENQUEUED; -+ paint_count_transition(queue, commit->object.flags, 0); ++ commit->object.flags &= ~ENQUEUED; ++ paint_count_update(state, commit->object.flags, -1); + } - return commit; - } - ++ return commit; ++} ++ + /* + * See Documentation/technical/paint-down-to-common.adoc + * @@ commit-reach.c: static int paint_down_to_common(struct repository *r, enum merge_base_flags mb_flags, struct commit_list **result) { - struct nonstale_queue queue = { - { compare_commits_by_gen_then_commit_date } -+ struct paint_queue queue = { -+ .pq = { compare_commits_by_gen_then_commit_date } ++ struct paint_state state = { ++ .queue = { compare_commits_by_gen_then_commit_date } }; + struct commit *commit; int i; + int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; -@@ commit-reach.c: static int paint_down_to_common(struct repository *r, + + if (!min_generation && !corrected_commit_dates_enabled(r)) +- queue.pq.compare = compare_commits_by_commit_date; ++ state.queue.compare = compare_commits_by_commit_date; + + one->object.flags |= PARENT1; + if (!n) { commit_list_append(one, result); return 0; } - nonstale_queue_put_dedup(&queue, one); -+ paint_queue_put(&queue, one, 0); ++ paint_queue_put(&state, one, 0); - for (i = 0; i < n; i++) { - twos[i]->object.flags |= PARENT2; - nonstale_queue_put_dedup(&queue, twos[i]); - } + for (i = 0; i < n; i++) -+ paint_queue_put(&queue, twos[i], PARENT2); ++ paint_queue_put(&state, twos[i], PARENT2); - while (queue.max_nonstale) { - struct commit *commit = nonstale_queue_get_dedup(&queue); -+ while ((commit = paint_queue_get(&queue))) { ++ while ((commit = paint_queue_get(&state))) { struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); @@ commit-reach.c: static int paint_down_to_common(struct repository *r, continue; if (repo_parse_commit(r, p)) { - clear_nonstale_queue(&queue); -+ clear_prio_queue(&queue.pq); ++ clear_prio_queue(&state.queue); commit_list_free(*result); *result = NULL; /* @@ commit-reach.c: static int paint_down_to_common(struct repository *r, } - p->object.flags |= flags; - nonstale_queue_put_dedup(&queue, p); -+ paint_queue_put(&queue, p, flags); ++ paint_queue_put(&state, p, flags); } -+ -+ if (queue.p1_count + queue.p2_count + -+ queue.pending_merge_bases == 0) -+ break; } - clear_nonstale_queue(&queue); -+ clear_prio_queue(&queue.pq); ++ clear_prio_queue(&state.queue); + trace2_data_intmax("paint_down_to_common", r, + "steps", steps); commit_list_sort_by_date(result); - return 0; - } -: ---------- > 6: 8c72f01083 commit-reach: remove unused nonstale_queue dedup wrappers 3: ed12a5cb5b ! 7: d84b932e5b commit-reach: terminate merge-base walk when one paint side is exhausted @@ Commit message commit-reach: terminate merge-base walk when one paint side is exhausted Add an early termination check to paint_down_to_common() using the - per-side counters introduced in the previous commit. Once the walk - enters the finite-generation region, terminate early when one side's - exclusive count drops to zero -- no new merge-base can form without - both paint sides meeting. + per-side counters introduced earlier. Once the walk enters the + finite-generation region, terminate early when one side's exclusive + count drops to zero -- no new merge-base can form without both paint + sides meeting. The check also waits for pending_merge_bases to reach zero, ensuring - all merge-base candidates have been popped and recorded before + all merge-base candidates have been dequeued and recorded before exiting. The INFINITY gate ensures correctness: commits without a commit-graph @@ Commit message once the walk enters the finite-generation region where ordering guarantees hold. - On large repositories with commit-graph, this yields 100-1000x - speedups for merge-base queries where one side (e.g. a PR branch) is - much smaller than the other. + Step counts measured with trace2 on git.git with commit-graph: + + merge-base --all v2.0.0 v2.55.0-rc1: + before: 72264 steps after: 44589 steps + + merge-base --all v2.55.0-rc1 v2.55.0-rc1~5: + before: 110 steps after: 7 steps Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> + ## Documentation/technical/paint-down-to-common.adoc ## +@@ Documentation/technical/paint-down-to-common.adoc: ends when one of the following conditions holds: + + 1. The queue is empty. + 2. The queue contains only stale entries. ++ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits ++ remain in the queue, no pending merge-base candidates exist, ++ and the walk has entered the finite-generation region. + + Stale entry condition + ~~~~~~~~~~~~~~~~~~~~~ +@@ Documentation/technical/paint-down-to-common.adoc: existing candidates by proving one is an ancestor of another, but + `remove_redundant()` handles that as a post-processing step, so it + is safe to exit early. + ++Side-exhaustion condition ++~~~~~~~~~~~~~~~~~~~~~~~~~ ++A new merge-base requires commits from both sides to meet. When one ++side's exclusive counter reaches zero and there are no pending ++merge-base candidates, no future traversal step can produce a new ++candidate. ++ ++This optimization only activates in the finite-generation region ++where topological ordering holds. In that region, children are ++always visited before parents, so paint flags are final at visit ++time and an exhausted side cannot reappear. In the INFINITY region, ++commit-date ordering can violate this guarantee, so the check is ++skipped. ++ + Related documentation + --------------------- + + ## commit-reach.c ## -@@ commit-reach.c: static int paint_down_to_common(struct repository *r, - if (queue.p1_count + queue.p2_count + - queue.pending_merge_bases == 0) - break; +@@ commit-reach.c: static void paint_queue_put(struct paint_state *state, + + static struct commit *paint_queue_get(struct paint_state *state) + { +- struct commit *commit; ++ struct commit *commit = prio_queue_get(&state->queue); + +- if (!state->p1_count && !state->p2_count && +- !state->pending_merge_bases) ++ if (!commit) + return NULL; + +- commit = prio_queue_get(&state->queue); +- if (commit) { +- commit->object.flags &= ~ENQUEUED; +- paint_count_update(state, commit->object.flags, -1); ++ commit->object.flags &= ~ENQUEUED; + ++ if (!state->pending_merge_bases) { ++ if (!state->p1_count && !state->p2_count) ++ return NULL; + /* + * Side exhaustion: a new merge-base can only form + * when both PARENT1-only and PARENT2-only commits -+ * remain in the queue. In the finite-generation ++ * remain in the queue. In the finite-generation + * region the queue is ordered topologically, so + * no future step can add paint to visited commits + * and an exhausted side cannot reappear. + */ -+ if (generation < GENERATION_NUMBER_INFINITY && -+ queue.pending_merge_bases == 0 && -+ (queue.p1_count == 0 || queue.p2_count == 0)) -+ break; ++ if ((!state->p1_count || !state->p2_count) && ++ commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) ++ return NULL; } ++ ++ paint_count_update(state, commit->object.flags, -1); + return commit; + } + + + ## t/t6600-test-reach.sh ## +@@ t/t6600-test-reach.sh: test_expect_success 'merge-base --all commit-walk steps' ' + cp commit-graph-full .git/objects/info/commit-graph && + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ + git merge-base --all commit-9-9 commit-9-1 >actual && +- test_trace2_data paint_down_to_common steps 80 <trace-full.txt && ++ test_trace2_data paint_down_to_common steps 9 <trace-full.txt && + + cp commit-graph-half .git/objects/info/commit-graph && + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ + git merge-base --all commit-9-9 commit-9-1 >actual && +- test_trace2_data paint_down_to_common steps 81 <trace-half.txt ++ test_trace2_data paint_down_to_common steps 57 <trace-half.txt + ' - clear_prio_queue(&queue.pq); + test_expect_success 'reduce_heads' ' -- gitgitgadget ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 17:09 ` Junio C Hamano 2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget ` (8 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a technical document describing the paint_down_to_common() algorithm used for merge-base computation, covering the paint walk, generation number regions, and termination conditions. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 114 ++++++++++++++++++ commit-reach.c | 6 +- 4 files changed, 121 insertions(+), 1 deletion(-) create mode 100644 Documentation/technical/paint-down-to-common.adoc diff --git a/Documentation/Makefile b/Documentation/Makefile index 2699f0b24a..f8dea4b395 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol TECH_DOCS += technical/multi-pack-index TECH_DOCS += technical/packfile-uri TECH_DOCS += technical/pack-heuristics +TECH_DOCS += technical/paint-down-to-common TECH_DOCS += technical/parallel-checkout TECH_DOCS += technical/partial-clone TECH_DOCS += technical/platform-support diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build index ec07088c57..9ce11d5e48 100644 --- a/Documentation/technical/meson.build +++ b/Documentation/technical/meson.build @@ -18,6 +18,7 @@ articles = [ 'multi-pack-index.adoc', 'packfile-uri.adoc', 'pack-heuristics.adoc', + 'paint-down-to-common.adoc', 'parallel-checkout.adoc', 'partial-clone.adoc', 'platform-support.adoc', diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc new file mode 100644 index 0000000000..c10d5d2887 --- /dev/null +++ b/Documentation/technical/paint-down-to-common.adoc @@ -0,0 +1,114 @@ +Merge-Base Computation and paint_down_to_common() +================================================== + +The function `paint_down_to_common()` in `commit-reach.c` computes merge +bases by walking the commit graph backwards from two sets of tips and +finding where their ancestry meets. + +Use cases +--------- + +Computing merge bases is used in two different ways: + + 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`, + `merge`, `rebase`). A merge base is a common ancestor that is + not itself an ancestor of another common ancestor. + + 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base + --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A + an ancestor of commit B?" If a common ancestor equals one of the + inputs, that input is necessarily the only merge base -- no other + common ancestor can be both as recent and not an ancestor of it. + +Both use cases share the same algorithm and implementation. + +Algorithm +--------- + +Given a commit `one` and a set of commits `twos[]`, the walk paints +commits with two colors: + + - PARENT1: reachable from `one` + - PARENT2: reachable from any commit in `twos[]` + +The walk uses a priority queue ordered by generation number (falling +back to commit date when generation numbers are unavailable). Each +step dequeues the highest-priority commit (this is when we say a +commit is "visited") and propagates its paint flags to its parents, +enqueuing them if they gained new flags. When a commit receives +both PARENT1 and PARENT2, it is a merge-base candidate. A candidate +gains the STALE flag so its ancestors propagate staleness -- any +deeper common ancestor is necessarily redundant. + +INFINITY and finite generation regions +-------------------------------------- + +The commit-graph stores a generation number for each commit. Commits +not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The +graph is closed under reachability: if a commit is in the graph, all +its ancestors are too. This partitions the commit graph into two regions: + +.... + +---------------------------------------+ + | INFINITY region | + | generation = INFINITY | + | queue order: heuristic (commit date) | + +---------------------------------------+ + | + v + +---------------------------------------+ + | Finite region | + | generation = finite | + | queue order: topological | + +---------------------------------------+ +.... + +When the commit-graph is enabled, the INFINITY region is typically +very small -- it only contains commits added since the last +commit-graph refresh. + +All reachable INFINITY-generation commits are visited before any +finite-generation commit, because INFINITY is larger than any finite +value. Once the walk crosses into the finite region, it stays there. + +In the finite region, generation ordering guarantees topological +traversal: children are always visited before their parents. This +means that paint on already-visited commits is final -- no future +traversal step can add paint to them. + +In the INFINITY region, commit-date ordering can violate this: a +parent with a later date can be visited before a child with an earlier +date. Paint flags are therefore NOT final at visit time, and a +commit visited with only one side's paint may later gain the other. + +Paint flags are only added, never removed. Since each flag can be set +at most once per commit, the number of times a commit can be +re-enqueued is bounded by the number of flag transitions. + +Termination +----------- + +The walk uses a `nonstale_queue` wrapper around `prio_queue` that +tracks `max_nonstale`: the lowest-priority non-stale commit enqueued +so far. Once that commit is dequeued, every remaining entry is known +to be STALE and the loop terminates. Specifically, the main loop +ends when one of the following conditions holds: + + 1. The queue is empty. + 2. `max_nonstale` has been dequeued, meaning the queue only contains + STALE entries. + +Stale entry condition +~~~~~~~~~~~~~~~~~~~~~ +Once all queued entries are stale, no new merge-base candidates can +be discovered -- that requires at least one non-stale commit from +each side meeting. Continuing the walk could still invalidate +existing candidates by proving one is an ancestor of another, but +`remove_redundant()` handles that as a post-processing step, so it +is safe to exit early. + +Related documentation +--------------------- + + - `Documentation/technical/commit-graph.adoc` -- generation numbers + and the reachability closure property. diff --git a/commit-reach.c b/commit-reach.c index 5df471a313..a9483759e0 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -96,7 +96,11 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) return commit; } -/* all input commits in one and twos[] must have been parsed! */ +/* + * See Documentation/technical/paint-down-to-common.adoc + * + * All input commits in one and twos[] must have been parsed! + */ static int paint_down_to_common(struct repository *r, struct commit *one, int n, struct commit **twos, -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc 2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget @ 2026-06-24 17:09 ` Junio C Hamano 0 siblings, 0 replies; 84+ messages in thread From: Junio C Hamano @ 2026-06-24 17:09 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget Cc: git, Derrick Stolee, Elijah Newren, Kristofer Karlsson "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kristofer Karlsson <krka@spotify.com> > > Add a technical document describing the paint_down_to_common() > algorithm used for merge-base computation, covering the paint > walk, generation number regions, and termination conditions. > > Signed-off-by: Kristofer Karlsson <krka@spotify.com> > --- > Documentation/Makefile | 1 + > Documentation/technical/meson.build | 1 + > .../technical/paint-down-to-common.adoc | 114 ++++++++++++++++++ > commit-reach.c | 6 +- > 4 files changed, 121 insertions(+), 1 deletion(-) > create mode 100644 Documentation/technical/paint-down-to-common.adoc Great write-up that very clearly and concisely explains what goes on inside the merge-base computation. Thanks for a pleasant read. ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 ` Elijah Newren via GitGitGadget 2026-06-24 13:43 ` Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget ` (7 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Elijah Newren via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren From: Elijah Newren <newren@gmail.com> Add test cases to t6600-test-reach.sh that exercise edge cases in the side-exhaustion optimization for paint_down_to_common(): - in_merge_bases_many:self: commit is both A and one of the X inputs - get_merge_bases_many:duplicate-twos: duplicate entries in X list - get_merge_bases_many:pending-stale: STALE transition on an already-painted commit (ps-* diamond topology) - get_merge_bases_many:infinity-both-sides: both tips outside the commit-graph with non-monotonic dates (pi-* topology) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b5b314e570..c2e091aad1 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -49,6 +49,62 @@ test_expect_success 'setup' ' git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1 done done && + + # Build a small side topology to exercise the (PARENT1|PARENT2) -> + # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the + # 10x10 grid above does not exercise it because no merge-base candidate + # there is a descendant of another, so STALE never reaches a + # still-pending candidate. + # + # ps-X + # /|\ + # / | \ + # ps-Z ps-B ps-W + # | / \ | + # | / \ | + # |/ \| + # ps-T1 ps-T2 + # + # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so + # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions + # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued; + # then the STALE-walk from ps-B transitions ps-X to + # (PARENT1|PARENT2|STALE). + git checkout --orphan ps-orphan && + test_commit ps-X && + git checkout -b ps-B-br ps-X && test_commit ps-B && + git checkout -b ps-Z-br ps-X && test_commit ps-Z && + git checkout -b ps-W-br ps-X && test_commit ps-W && + git checkout -b ps-T1 ps-Z && + git merge --no-ff -m ps-T1 ps-B && + git checkout -b ps-T2 ps-W && + git merge --no-ff -m ps-T2 ps-B && + + # Build a side topology that lives entirely outside the half + # commit-graph and has non-monotonic commit dates, to exercise the + # INFINITY-gate in paint_down_to_common. With both tips outside + # the graph, generation is INFINITY and the queue falls back to + # commit-date order, which here is non-monotonic. + # + # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D + # pi-D (date 480) --> pi-C + # pi-C (date 200) --> pi-B + # pi-B (date 100, PARENT2 tip) --> pi-P + # pi-P (date 450, root) + # + # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is + # itself one of the queried tips). + git checkout --orphan pi-orphan && + test_commit --date "@450 +0000" pi-P && + test_commit --date "@100 +0000" pi-B && + test_commit --date "@200 +0000" pi-C && + test_commit --date "@480 +0000" pi-D && + GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \ + git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid && + pi_x="$(cat pi-X-oid)" && + git branch -f pi-X-br "$pi_x" && + git tag pi-X "$pi_x" && + git commit-graph write --reachable && mv .git/objects/info/commit-graph commit-graph-full && chmod u+w commit-graph-full && @@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' test_all_modes in_merge_bases_many ' +test_expect_success 'in_merge_bases_many:self' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-5-9 + X:commit-6-8 + EOF + echo "in_merge_bases_many(A,X):1" >expect && + test_all_modes in_merge_bases_many +' + test_expect_success 'is_descendant_of:hit' ' cat >input <<-\EOF && A:commit-5-7 @@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' ' test_all_modes get_merge_bases_many ' +test_expect_success 'get_merge_bases_many:duplicate-twos' ' + cat >input <<-\EOF && + A:commit-5-7 + X:commit-4-8 + X:commit-4-8 + X:commit-6-6 + X:commit-6-6 + X:commit-8-3 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse commit-5-6 \ + commit-4-7 | sort + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:pending-stale' ' + # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in + # paint_down_to_common(). See the topology comment in the setup test. + cat >input <<-\EOF && + A:ps-T1 + X:ps-T2 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-B + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:infinity-both-sides' ' + # Exercises the push-time INFINITY-gate in paint_down_to_common(). See + # the pi-* topology comment in the setup test. + cat >input <<-\EOF && + A:pi-X + X:pi-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse pi-B + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases 2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget @ 2026-06-24 13:43 ` Derrick Stolee 2026-06-24 14:33 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 13:43 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > Add test cases to t6600-test-reach.sh that exercise edge cases in the > side-exhaustion optimization for paint_down_to_common(): > > - in_merge_bases_many:self: commit is both A and one of the X inputs > - get_merge_bases_many:duplicate-twos: duplicate entries in X list > - get_merge_bases_many:pending-stale: STALE transition on an > already-painted commit (ps-* diamond topology) > - get_merge_bases_many:infinity-both-sides: both tips outside the > commit-graph with non-monotonic dates (pi-* topology) I'm happy that these cases now exist. > +test_expect_success 'in_merge_bases_many:self' ' > + cat >input <<-\EOF && > + A:commit-6-8 > + X:commit-5-9 > + X:commit-6-8 > + EOF > + echo "in_merge_bases_many(A,X):1" >expect && > + test_all_modes in_merge_bases_many > +' and using 'test_all_modes' is great to get coverage of all the different commit-graph states. In reply to patch v2 4/7 I ask to see the results of the traces in these kinds of test cases, but each of these modes will have different values. One way to make these tests have potential to check exact stats without too much extra work would be to update 'test_all_modes' to run each command with GIT_TRACE2_EVENT set to a known trace file (reset each time) that can then be checked after verifying that the results of each command is the same. Then, these tests could have lines such as test_trace2_data paint_down_to_common steps 20 <trace-full.txt && test_trace2_data paint_down_to_common steps 30 <trace-half.txt && test_trace2_data paint_down_to_common steps 40 <trace-none.txt after the test_all_modes line. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases 2026-06-24 13:43 ` Derrick Stolee @ 2026-06-24 14:33 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-24 14:33 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren On Wed, 24 Jun 2026 at 15:43, Derrick Stolee <stolee@gmail.com> wrote: > > One way to make these tests have potential to check exact stats > without too much extra work would be to update 'test_all_modes' > to run each command with GIT_TRACE2_EVENT set to a known trace > file (reset each time) that can then be checked after verifying > that the results of each command is the same. > > Then, these tests could have lines such as > > test_trace2_data paint_down_to_common steps 20 <trace-full.txt && > test_trace2_data paint_down_to_common steps 30 <trace-half.txt && > test_trace2_data paint_down_to_common steps 40 <trace-none.txt > > after the test_all_modes line. That does indeed look quite clean, I will try to massage the tests to utilize that, though I haven't fully worked it through in my head yet so I am not sure if/where I would get stuck. :) Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget ` (6 subsequent siblings) 9 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add t6099 to test the case where multiple merge-base candidates exist and one is an ancestor of another. This exercises the side-exhaustion optimization in paint_down_to_common together with the remove_redundant safety net in get_merge_bases_many_0. Add a mixed finite/INFINITY test to t6600 where one tip is outside the commit-graph (INFINITY generation) and the other is inside. This exercises the region transition: the walk starts in the INFINITY region where side-exhaustion is disabled, then crosses into the finite region where it can fire. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++ t/t6600-test-reach.sh | 25 ++++++++ 3 files changed, 108 insertions(+) create mode 100755 t/t6099-merge-base-side-exhaustion.sh diff --git a/t/meson.build b/t/meson.build index 3219264fe7..ee6ebdffb9 100644 --- a/t/meson.build +++ b/t/meson.build @@ -786,6 +786,7 @@ integration_tests = [ 't6041-bisect-submodule.sh', 't6050-replace.sh', 't6060-merge-index.sh', + 't6099-merge-base-side-exhaustion.sh', 't6100-rev-list-in-order.sh', 't6101-rev-parse-parents.sh', 't6102-rev-list-unexpected-objects.sh', diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh new file mode 100755 index 0000000000..4f1e0d50ef --- /dev/null +++ b/t/t6099-merge-base-side-exhaustion.sh @@ -0,0 +1,82 @@ +#!/bin/sh + +test_description='merge-base with ancestor among merge-base candidates + +Test that merge-base --all correctly handles cases where +multiple merge-base candidates exist and one is an ancestor +of another. The side-exhaustion optimization in +paint_down_to_common may exit before STALE propagation +removes the ancestor, but remove_redundant catches it. + +Graph shape (parents are below children): + + A ----------- X + |\ /| + | B---------/ | + | | | + e2 \ f2 + | | | + e1 d1 f1 + \ | / + \ | / + \| / + C + +A and X are the two tips. +B and C are both reachable from A and X. +B reaches C through d1. +Only B should appear in merge-base --all output. +' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'setup ancestor merge-base candidate' ' + test_commit C && + + git checkout -b d-chain HEAD && + test_commit d1 && + test_commit B && + + git checkout -b e-path C && + test_commit e1 && + test_commit e2 && + + git checkout -b f-path C && + test_commit f1 && + test_commit f2 && + + git checkout -b branch-A e-path && + test_merge A B && + + git checkout -b branch-X f-path && + test_merge X B && + + git commit-graph write --reachable +' + +test_expect_success 'merge-base --all excludes ancestor candidate' ' + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_expect_success 'merge-base (single) finds shallowest' ' + git rev-parse B >expected && + git merge-base A X >actual && + test_cmp expected actual +' + +# Without commit-graph: generation numbers are INFINITY, +# side-exhaustion optimization does not fire. +test_expect_success 'merge-base --all without commit-graph' ' + rm -f .git/objects/info/commit-graph && + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_done diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index c2e091aad1..4b771b4c58 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' test_all_modes get_merge_bases_many ' +test_expect_success 'setup mixed finite/INFINITY topology' ' + # Create a commit outside all saved commit-graph files so it always + # has INFINITY generation, while its parent (ps-X) is in the graph + # with a finite generation. Use the ps-* orphan topology so we do + # not pollute the grid-based rev-list tests. + git checkout ps-X && + test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF +' + +test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' + # One tip (pm-INF) is outside the commit-graph with INFINITY + # generation; the other (ps-B) is in the graph with finite + # generation. The walk starts in the INFINITY region and crosses + # into the finite region where side-exhaustion can fire. + cat >input <<-\EOF && + A:pm-INF + X:ps-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-X + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (2 preceding siblings ...) 2026-06-24 12:14 ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 13:41 ` Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget ` (5 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a step counter and trace2_data_intmax() call so that the number of commits visited during the paint walk is observable via GIT_TRACE2_PERF. This provides a way to measure the impact of future optimizations without relying on wall-clock benchmarks alone. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 5 +++++ t/t6600-test-reach.sh | 21 +++++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/commit-reach.c b/commit-reach.c index a9483759e0..f6a438550b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -11,6 +11,7 @@ #include "tag.h" #include "commit-reach.h" #include "ewah/ewok.h" +#include "trace2.h" /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -112,6 +113,7 @@ static int paint_down_to_common(struct repository *r, { compare_commits_by_gen_then_commit_date } }; int i; + int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; @@ -135,6 +137,7 @@ static int paint_down_to_common(struct repository *r, struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); + steps++; if (min_generation && generation > last_gen) BUG("bad generation skip %"PRItime" > %"PRItime" at %s", @@ -190,6 +193,8 @@ static int paint_down_to_common(struct repository *r, } clear_nonstale_queue(&queue); + trace2_data_intmax("paint_down_to_common", r, + "steps", steps); commit_list_sort_by_date(result); return 0; } diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 4b771b4c58..c1109fb42f 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -319,6 +319,27 @@ test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' test_all_modes get_merge_bases_many ' +test_expect_success 'merge-base --all commit-walk steps' ' + test_when_finished rm -rf .git/objects/info/commit-graph \ + .git/objects/info/commit-graphs && + rm -rf .git/objects/info/commit-graph \ + .git/objects/info/commit-graphs && + + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \ + git merge-base --all commit-9-9 commit-9-1 >actual && + test_trace2_data paint_down_to_common steps 81 <trace-none.txt && + + cp commit-graph-full .git/objects/info/commit-graph && + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ + git merge-base --all commit-9-9 commit-9-1 >actual && + test_trace2_data paint_down_to_common steps 80 <trace-full.txt && + + cp commit-graph-half .git/objects/info/commit-graph && + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ + git merge-base --all commit-9-9 commit-9-1 >actual && + test_trace2_data paint_down_to_common steps 81 <trace-half.txt +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget @ 2026-06-24 13:41 ` Derrick Stolee 2026-06-24 14:31 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 13:41 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > Add a step counter and trace2_data_intmax() call so that the number > of commits visited during the paint walk is observable via > GIT_TRACE2_PERF. This provides a way to measure the impact of > future optimizations without relying on wall-clock benchmarks alone. > + trace2_data_intmax("paint_down_to_common", r, > + "steps", steps); This is great data. Very clearly marked for what we should be doing here. > +test_expect_success 'merge-base --all commit-walk steps' ' > + test_when_finished rm -rf .git/objects/info/commit-graph \ > + .git/objects/info/commit-graphs && (highlighting this chunk) > + rm -rf .git/objects/info/commit-graph \ > + .git/objects/info/commit-graphs && > + > + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \ > + git merge-base --all commit-9-9 commit-9-1 >actual && > + test_trace2_data paint_down_to_common steps 81 <trace-none.txt && I'd rather see the whitespace line before the `rm` to make it more obvious that it's setting up the "none" case. > + > + cp commit-graph-full .git/objects/info/commit-graph && > + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ > + git merge-base --all commit-9-9 commit-9-1 >actual && > + test_trace2_data paint_down_to_common steps 80 <trace-full.txt && > + > + cp commit-graph-half .git/objects/info/commit-graph && > + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ > + git merge-base --all commit-9-9 commit-9-1 >actual && > + test_trace2_data paint_down_to_common steps 81 <trace-half.txt > +' > + This test is a great example. I look forward to seeing that it updates in the future. One thing I was hoping to see was that your side-exhaustion tests (from patch v2 2/7) would also include these checks so they are more obviously updating when the implementation updates later. One way to accomplish that is to reorder this patch before adding those tests so their first version includes these checks and then the values update when changing the implementation. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-24 13:41 ` Derrick Stolee @ 2026-06-24 14:31 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-24 14:31 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Wed, 24 Jun 2026 at 15:41, Derrick Stolee <stolee@gmail.com> wrote: > > (highlighting this chunk) > > > + rm -rf .git/objects/info/commit-graph \ > > + .git/objects/info/commit-graphs && > > + > > + GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \ > > + git merge-base --all commit-9-9 commit-9-1 >actual && > > + test_trace2_data paint_down_to_common steps 81 <trace-none.txt && > > I'd rather see the whitespace line before the `rm` to make it > more obvious that it's setting up the "none" case. Ah yes, good point, will fix. > > + cp commit-graph-full .git/objects/info/commit-graph && > > + GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ > > + git merge-base --all commit-9-9 commit-9-1 >actual && > > + test_trace2_data paint_down_to_common steps 80 <trace-full.txt && > > + > > + cp commit-graph-half .git/objects/info/commit-graph && > > + GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ > > + git merge-base --all commit-9-9 commit-9-1 >actual && > > + test_trace2_data paint_down_to_common steps 81 <trace-half.txt > > +' > > + > > This test is a great example. I look forward to seeing that it > updates in the future. > > One thing I was hoping to see was that your side-exhaustion tests > (from patch v2 2/7) would also include these checks so they are > more obviously updating when the implementation updates later. I was internally contemplating how much I should introduce the steps validation to existing tests. My worry was that it might make tests fragile - for example I repeatedly got some off-by-one changes after refactoring the halt condition slightly (differs depending on adding the halts solely within paint_queue_get or having it at the end of the loop) and I think potentially other future work could affect it. But I'm happy to attach the steps checks for more relevant tests, it's not much work to change. > One way to accomplish that is to reorder this patch before adding > those tests so their first version includes these checks and then > the values update when changing the implementation. I was thinking I could keep the same order, but the patch to introduce the trace could also modify the tests at the same time - that would perhaps make it even more clear. Also this means I could avoid making changes to Elijah's commit that I already _partly_ butchered (extracted the test change as-is, but dropped the other file changes) and I don't want to make that one more unclean. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (3 preceding siblings ...) 2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 13:54 ` Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget ` (4 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a paint_state struct for use by paint_down_to_common() that wraps a prio_queue with per-side commit counters. Each non-stale queued commit occupies exactly one counter bucket based on its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). The counters are maintained by paint_count_update() which adjusts the appropriate bucket by a signed delta. An exhaustive switch on the paint+stale bits documents all valid flag combinations in one place. Convert paint_down_to_common() to use paint_state. The loop now drains the queue via paint_queue_get() which returns NULL when all counters reach zero, replacing the old pointer-based termination (max_nonstale). This is equivalent behavior -- both conditions detect that no non-stale entries remain. The existing nonstale_queue is left in place for ahead_behind(). Step counts (via trace2 from the previous commit) are identical before and after this refactoring, confirming no behavioral change. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 9 +- commit-reach.c | 93 ++++++++++++++++--- 2 files changed, 82 insertions(+), 20 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index c10d5d2887..0f4e1892a5 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -88,15 +88,12 @@ re-enqueued is bounded by the number of flag transitions. Termination ----------- -The walk uses a `nonstale_queue` wrapper around `prio_queue` that -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued -so far. Once that commit is dequeued, every remaining entry is known -to be STALE and the loop terminates. Specifically, the main loop +The walk tracks the number of commits of each type in the queue +(PARENT1-only, PARENT2-only, pending merge-base). The main loop ends when one of the following conditions holds: 1. The queue is empty. - 2. `max_nonstale` has been dequeued, meaning the queue only contains - STALE entries. + 2. The queue contains only stale entries. Stale entry condition ~~~~~~~~~~~~~~~~~~~~~ diff --git a/commit-reach.c b/commit-reach.c index f6a438550b..bf102f5e28 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -97,6 +97,74 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) return commit; } +/* + * Priority queue with per-side commit counters for paint_down_to_common(). + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, + * PARENT2-only, or both (a pending merge-base candidate). + */ +struct paint_state { + struct prio_queue queue; + int p1_count; + int p2_count; + int pending_merge_bases; +}; + +static void paint_count_update(struct paint_state *state, + unsigned flags, int delta) +{ + switch (flags & (PARENT1 | PARENT2 | STALE)) { + case PARENT1: + state->p1_count += delta; + break; + + case PARENT2: + state->p2_count += delta; + break; + + case PARENT1 | PARENT2: + state->pending_merge_bases += delta; + break; + + case PARENT1 | PARENT2 | STALE: + break; + + default: + BUG("unexpected paint state"); + } +} + +static void paint_queue_put(struct paint_state *state, + struct commit *c, unsigned add_flags) +{ + unsigned old_flags = c->object.flags; + c->object.flags |= add_flags; + + if (old_flags & ENQUEUED) { + paint_count_update(state, old_flags, -1); + paint_count_update(state, c->object.flags, 1); + } else { + c->object.flags |= ENQUEUED; + prio_queue_put(&state->queue, c); + paint_count_update(state, c->object.flags, 1); + } +} + +static struct commit *paint_queue_get(struct paint_state *state) +{ + struct commit *commit; + + if (!state->p1_count && !state->p2_count && + !state->pending_merge_bases) + return NULL; + + commit = prio_queue_get(&state->queue); + if (commit) { + commit->object.flags &= ~ENQUEUED; + paint_count_update(state, commit->object.flags, -1); + } + return commit; +} + /* * See Documentation/technical/paint-down-to-common.adoc * @@ -109,31 +177,29 @@ static int paint_down_to_common(struct repository *r, enum merge_base_flags mb_flags, struct commit_list **result) { - struct nonstale_queue queue = { - { compare_commits_by_gen_then_commit_date } + struct paint_state state = { + .queue = { compare_commits_by_gen_then_commit_date } }; + struct commit *commit; int i; int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; if (!min_generation && !corrected_commit_dates_enabled(r)) - queue.pq.compare = compare_commits_by_commit_date; + state.queue.compare = compare_commits_by_commit_date; one->object.flags |= PARENT1; if (!n) { commit_list_append(one, result); return 0; } - nonstale_queue_put_dedup(&queue, one); + paint_queue_put(&state, one, 0); - for (i = 0; i < n; i++) { - twos[i]->object.flags |= PARENT2; - nonstale_queue_put_dedup(&queue, twos[i]); - } + for (i = 0; i < n; i++) + paint_queue_put(&state, twos[i], PARENT2); - while (queue.max_nonstale) { - struct commit *commit = nonstale_queue_get_dedup(&queue); + while ((commit = paint_queue_get(&state))) { struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); @@ -172,7 +238,7 @@ static int paint_down_to_common(struct repository *r, if ((p->object.flags & flags) == flags) continue; if (repo_parse_commit(r, p)) { - clear_nonstale_queue(&queue); + clear_prio_queue(&state.queue); commit_list_free(*result); *result = NULL; /* @@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r, return error(_("could not parse commit %s"), oid_to_hex(&p->object.oid)); } - p->object.flags |= flags; - nonstale_queue_put_dedup(&queue, p); + paint_queue_put(&state, p, flags); } } - clear_nonstale_queue(&queue); + clear_prio_queue(&state.queue); trace2_data_intmax("paint_down_to_common", r, "steps", steps); commit_list_sort_by_date(result); -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters 2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-24 13:54 ` Derrick Stolee 2026-06-24 14:38 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 13:54 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote: > Termination > ----------- > > -The walk uses a `nonstale_queue` wrapper around `prio_queue` that > -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued > -so far. Once that commit is dequeued, every remaining entry is known > -to be STALE and the loop terminates. Specifically, the main loop > +The walk tracks the number of commits of each type in the queue > +(PARENT1-only, PARENT2-only, pending merge-base). The main loop > ends when one of the following conditions holds: > > 1. The queue is empty. > - 2. `max_nonstale` has been dequeued, meaning the queue only contains > - STALE entries. > + 2. The queue contains only stale entries. I'm grateful to see these changes happening to the doc in real- time. I know it was extra work, but I'm grateful right now. Hopefully future historians will also benefit from this effort. > +static void paint_count_update(struct paint_state *state, > + unsigned flags, int delta) > +{ > + switch (flags & (PARENT1 | PARENT2 | STALE)) { > + case PARENT1: > + state->p1_count += delta; > + break; > + > + case PARENT2: > + state->p2_count += delta; > + break; > + > + case PARENT1 | PARENT2: > + state->pending_merge_bases += delta; > + break; > + > + case PARENT1 | PARENT2 | STALE: > + break; > + > + default: > + BUG("unexpected paint state"); > + } > +} I like the use of 'delta' to allow reuse of this switch. > + > +static void paint_queue_put(struct paint_state *state, > + struct commit *c, unsigned add_flags) > +{ > + unsigned old_flags = c->object.flags; > + c->object.flags |= add_flags; > + > + if (old_flags & ENQUEUED) { > + paint_count_update(state, old_flags, -1); > + paint_count_update(state, c->object.flags, 1); > + } else { > + c->object.flags |= ENQUEUED; > + prio_queue_put(&state->queue, c); > + paint_count_update(state, c->object.flags, 1); > + } > +} ok: if we are already in the queue then we have old flags and may need to subtract their values because they were counted already. Otherwise, we need to queue it for the first time and only add the values. Makes sense. > + > +static struct commit *paint_queue_get(struct paint_state *state) > +{ Since we are going to make this a more complete termination condition, we may want to make that very explicit with a doc- comment. Something along the lines of "dequeue a commit when possible, but also signal termination of the walk when we conclude that no more merge bases will be discovered due to internal state." > @@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r, > return error(_("could not parse commit %s"), > oid_to_hex(&p->object.oid)); > } > - p->object.flags |= flags; > - nonstale_queue_put_dedup(&queue, p); > + paint_queue_put(&state, p, flags); I like how this simplifies the flag-assignment logic somewhat. You mentioned in your cover letter how the min_generation value can add extra termination conditions. It may be a good idea to insert min_generation into the paint_queue struct and make it a termination condition for paint_queue_get(). If you consider this direction, then I'd make it a separate patch on top of this one _before_ adding the one-sided change. The extra tests that cover the exact number of walked commits can help to guarantee the same behavior, assuming that some of those tests check a non-zero min_generation input. (It may be good to add such trace tests in an earlier patch to help confidence in this case.) Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters 2026-06-24 13:54 ` Derrick Stolee @ 2026-06-24 14:38 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-24 14:38 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Wed, 24 Jun 2026 at 15:54, Derrick Stolee <stolee@gmail.com> wrote: > > I'm grateful to see these changes happening to the doc in real- > time. I know it was extra work, but I'm grateful right now. > > Hopefully future historians will also benefit from this effort. It was honestly not bad at all, and I agree it felt quite nice to see how the doc naturally changed along with the implementation. > > +static struct commit *paint_queue_get(struct paint_state *state) > > +{ > > Since we are going to make this a more complete termination > condition, we may want to make that very explicit with a doc- > comment. Something along the lines of "dequeue a commit when > possible, but also signal termination of the walk when we > conclude that no more merge bases will be discovered due to > internal state." Yes, I'll make sure to clean that part up more, maybe also rename the function to be more descriptive. > You mentioned in your cover letter how the min_generation value > can add extra termination conditions. It may be a good idea to > insert min_generation into the paint_queue struct and make it a > termination condition for paint_queue_get(). If you consider this > direction, then I'd make it a separate patch on top of this one > _before_ adding the one-sided change. The extra tests that cover > the exact number of walked commits can help to guarantee the same > behavior, assuming that some of those tests check a non-zero > min_generation input. (It may be good to add such trace tests in > an earlier patch to help confidence in this case.) I think I might wait with this - the patch series already feels quite big, and I think it has a natural progression and finish now. But I can definitely commit to following up later -- it would be a smaller series that is easier to reason about, likely a single commit. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (4 preceding siblings ...) 2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 13:55 ` Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget ` (3 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became unused after the previous commit. The core nonstale_queue functions remain in use by ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index bf102f5e28..e0d9874f99 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue) queue->max_nonstale = NULL; } -static void nonstale_queue_put_dedup(struct nonstale_queue *queue, - struct commit *c) -{ - if (c->object.flags & ENQUEUED) - return; - c->object.flags |= ENQUEUED; - nonstale_queue_put(queue, c); -} - -static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) -{ - struct commit *commit = nonstale_queue_get(queue); - - if (commit) - commit->object.flags &= ~ENQUEUED; - return commit; -} - /* * Priority queue with per-side commit counters for paint_down_to_common(). * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers 2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget @ 2026-06-24 13:55 ` Derrick Stolee 0 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 13:55 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became > unused after the previous commit. The core nonstale_queue functions > remain in use by ahead_behind(). This is a nice cleanup that makes the previous diff easier to read. Thanks! ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (5 preceding siblings ...) 2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 ` Kristofer Karlsson via GitGitGadget 2026-06-24 14:02 ` Derrick Stolee 2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee ` (2 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add an early termination check to paint_down_to_common() using the per-side counters introduced earlier. Once the walk enters the finite-generation region, terminate early when one side's exclusive count drops to zero -- no new merge-base can form without both paint sides meeting. The check also waits for pending_merge_bases to reach zero, ensuring all merge-base candidates have been dequeued and recorded before exiting. The INFINITY gate ensures correctness: commits without a commit-graph entry have GENERATION_NUMBER_INFINITY and are ordered by commit date, which is not topologically reliable. The optimization only fires once the walk enters the finite-generation region where ordering guarantees hold. Step counts measured with trace2 on git.git with commit-graph: merge-base --all v2.0.0 v2.55.0-rc1: before: 72264 steps after: 44589 steps merge-base --all v2.55.0-rc1 v2.55.0-rc1~5: before: 110 steps after: 7 steps Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 17 ++++++++++++ commit-reach.c | 27 ++++++++++++++----- t/t6600-test-reach.sh | 4 +-- 3 files changed, 39 insertions(+), 9 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index 0f4e1892a5..983dfcf233 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -94,6 +94,9 @@ ends when one of the following conditions holds: 1. The queue is empty. 2. The queue contains only stale entries. + 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits + remain in the queue, no pending merge-base candidates exist, + and the walk has entered the finite-generation region. Stale entry condition ~~~~~~~~~~~~~~~~~~~~~ @@ -104,6 +107,20 @@ existing candidates by proving one is an ancestor of another, but `remove_redundant()` handles that as a post-processing step, so it is safe to exit early. +Side-exhaustion condition +~~~~~~~~~~~~~~~~~~~~~~~~~ +A new merge-base requires commits from both sides to meet. When one +side's exclusive counter reaches zero and there are no pending +merge-base candidates, no future traversal step can produce a new +candidate. + +This optimization only activates in the finite-generation region +where topological ordering holds. In that region, children are +always visited before parents, so paint flags are final at visit +time and an exhausted side cannot reappear. In the INFINITY region, +commit-date ordering can violate this guarantee, so the check is +skipped. + Related documentation --------------------- diff --git a/commit-reach.c b/commit-reach.c index e0d9874f99..f79d0b64d6 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state, static struct commit *paint_queue_get(struct paint_state *state) { - struct commit *commit; + struct commit *commit = prio_queue_get(&state->queue); - if (!state->p1_count && !state->p2_count && - !state->pending_merge_bases) + if (!commit) return NULL; - commit = prio_queue_get(&state->queue); - if (commit) { - commit->object.flags &= ~ENQUEUED; - paint_count_update(state, commit->object.flags, -1); + commit->object.flags &= ~ENQUEUED; + + if (!state->pending_merge_bases) { + if (!state->p1_count && !state->p2_count) + return NULL; + /* + * Side exhaustion: a new merge-base can only form + * when both PARENT1-only and PARENT2-only commits + * remain in the queue. In the finite-generation + * region the queue is ordered topologically, so + * no future step can add paint to visited commits + * and an exhausted side cannot reappear. + */ + if ((!state->p1_count || !state->p2_count) && + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + return NULL; } + + paint_count_update(state, commit->object.flags, -1); return commit; } diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index c1109fb42f..03175befb3 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' ' cp commit-graph-full .git/objects/info/commit-graph && GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ git merge-base --all commit-9-9 commit-9-1 >actual && - test_trace2_data paint_down_to_common steps 80 <trace-full.txt && + test_trace2_data paint_down_to_common steps 9 <trace-full.txt && cp commit-graph-half .git/objects/info/commit-graph && GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ git merge-base --all commit-9-9 commit-9-1 >actual && - test_trace2_data paint_down_to_common steps 81 <trace-half.txt + test_trace2_data paint_down_to_common steps 57 <trace-half.txt ' test_expect_success 'reduce_heads' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-24 14:02 ` Derrick Stolee 2026-06-24 14:47 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 14:02 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > Add an early termination check to paint_down_to_common() using the > per-side counters introduced earlier. Once the walk enters the > finite-generation region, terminate early when one side's exclusive > count drops to zero -- no new merge-base can form without both paint > sides meeting. Having this as the last patch is truly a nice climax moment for the patch series! > @@ -94,6 +94,9 @@ ends when one of the following conditions holds: > > 1. The queue is empty. > 2. The queue contains only stale entries. > + 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits > + remain in the queue, no pending merge-base candidates exist, > + and the walk has entered the finite-generation region. ...> +Side-exhaustion condition > +~~~~~~~~~~~~~~~~~~~~~~~~~ > +A new merge-base requires commits from both sides to meet. When one > +side's exclusive counter reaches zero and there are no pending > +merge-base candidates, no future traversal step can produce a new > +candidate. > + > +This optimization only activates in the finite-generation region > +where topological ordering holds. In that region, children are > +always visited before parents, so paint flags are final at visit > +time and an exhausted side cannot reappear. In the INFINITY region, > +commit-date ordering can violate this guarantee, so the check is > +skipped. > + And these doc updates inline make me happy. > Related documentation > --------------------- > > diff --git a/commit-reach.c b/commit-reach.c > index e0d9874f99..f79d0b64d6 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state, > > static struct commit *paint_queue_get(struct paint_state *state) > { > - struct commit *commit; > + struct commit *commit = prio_queue_get(&state->queue); > > - if (!state->p1_count && !state->p2_count && > - !state->pending_merge_bases) > + if (!commit) > return NULL; I see how the previous implementation has a termination condition before calling prio_queue_get(), which is technically more efficient. It does make this initial diff a bit more complicated because we are moving the prio_queue_get() line. If the introduction of the method in patch 5/7 looked like this: +static struct commit *paint_queue_get(struct paint_state *state) +{ + struct commit *commit = prio_queue_get(&state->queue); + + if (!commit) + return NULL; + + if (!state->p1_count && !state->p2_count && + !state->pending_merge_bases) + return NULL; + + commit->object.flags &= ~ENQUEUED; + paint_count_update(state, commit->object.flags, -1); + return commit; +} Then this diff would look cleaner. (This is the nittiest of nitpicks so feel free to ignore if this doesn't bother you at all.) > - commit = prio_queue_get(&state->queue); > - if (commit) { > - commit->object.flags &= ~ENQUEUED; > - paint_count_update(state, commit->object.flags, -1); > + commit->object.flags &= ~ENQUEUED; > + > + if (!state->pending_merge_bases) { > + if (!state->p1_count && !state->p2_count) > + return NULL; > + /* > + * Side exhaustion: a new merge-base can only form > + * when both PARENT1-only and PARENT2-only commits > + * remain in the queue. In the finite-generation > + * region the queue is ordered topologically, so > + * no future step can add paint to visited commits > + * and an exhausted side cannot reappear. > + */ > + if ((!state->p1_count || !state->p2_count) && > + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) > + return NULL; > } > + > + paint_count_update(state, commit->object.flags, -1); > return commit; > } I like how the crux of this implementation is entirely within paint_queue_get() now. > diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh > index c1109fb42f..03175befb3 100755 > --- a/t/t6600-test-reach.sh > +++ b/t/t6600-test-reach.sh > @@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' ' > cp commit-graph-full .git/objects/info/commit-graph && > GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ > git merge-base --all commit-9-9 commit-9-1 >actual && > - test_trace2_data paint_down_to_common steps 80 <trace-full.txt && > + test_trace2_data paint_down_to_common steps 9 <trace-full.txt && > > cp commit-graph-half .git/objects/info/commit-graph && > GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ > git merge-base --all commit-9-9 commit-9-1 >actual && > - test_trace2_data paint_down_to_common steps 81 <trace-half.txt > + test_trace2_data paint_down_to_common steps 57 <trace-half.txt > ' I love to see these steps change. If you take my suggestion to update more tests with these checks, then this diff will get bigger (but in a deserved way). Also, when I suggested that 'test_all_modes' creates the trace files on our behalf, I forgot to mention that this specific test that you added in patch 4/7 simplifies by running the merge-base check under 'test_all_modes' and then checking the trace2 data on the three well-known files afterwards. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-24 14:02 ` Derrick Stolee @ 2026-06-24 14:47 ` Kristofer Karlsson 2026-06-24 15:07 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-24 14:47 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Wed, 24 Jun 2026 at 16:02, Derrick Stolee <stolee@gmail.com> wrote: > > I see how the previous implementation has a termination condition > before calling prio_queue_get(), which is technically more > efficient. It does make this initial diff a bit more complicated > because we are moving the prio_queue_get() line. I was thinking the efficiency here does not matter in practice - prio_queue_get() only returns NULL once, and all other times where we keep looping we do need the value. I agree it does get a bit complex though. > If the introduction of the method in patch 5/7 looked like this: > > +static struct commit *paint_queue_get(struct paint_state *state) > +{ > + struct commit *commit = prio_queue_get(&state->queue); > + > + if (!commit) > + return NULL; > + > + if (!state->p1_count && !state->p2_count && > + !state->pending_merge_bases) > + return NULL; > + > + commit->object.flags &= ~ENQUEUED; > + paint_count_update(state, commit->object.flags, -1); > + return commit; > +} > > Then this diff would look cleaner. > > (This is the nittiest of nitpicks so feel free to ignore if this > doesn't bother you at all.) That's a good point. It doesn't technically bother me, but it would be cleaner. The refactor commit would effectively be looking into the future and prepare for it. I can change it for the next version - my only thinking was that the current refactor patch matched my original idea for how to best handle the halt condition, but that did indeed change after this discussion. > > - test_trace2_data paint_down_to_common steps 81 <trace-half.txt > > + test_trace2_data paint_down_to_common steps 57 <trace-half.txt > > ' > I love to see these steps change. If you take my suggestion to > update more tests with these checks, then this diff will get bigger > (but in a deserved way). I will try to add them to some (but not all) tests since it's more closely related to performance than correctness and I want to avoid making too many tests overly fragile. > Also, when I suggested that 'test_all_modes' creates the trace > files on our behalf, I forgot to mention that this specific test > that you added in patch 4/7 simplifies by running the merge-base > check under 'test_all_modes' and then checking the trace2 data > on the three well-known files afterwards. That's a nice bonus, I will try to see if I can manage to utilize it. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-24 14:47 ` Kristofer Karlsson @ 2026-06-24 15:07 ` Derrick Stolee 0 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 15:07 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/24/2026 10:47 AM, Kristofer Karlsson wrote: > On Wed, 24 Jun 2026 at 16:02, Derrick Stolee <stolee@gmail.com> wrote: >>> - test_trace2_data paint_down_to_common steps 81 <trace-half.txt >>> + test_trace2_data paint_down_to_common steps 57 <trace-half.txt >>> ' >> I love to see these steps change. If you take my suggestion to >> update more tests with these checks, then this diff will get bigger >> (but in a deserved way). > > I will try to add them to some (but not all) tests since it's more > closely related to performance than correctness and I want to > avoid making too many tests overly fragile. In this case, I think it's more about protecting all of our special- cased termination conditions. The rigidity means that it is hard to accidentally change the behavior. It does have the downside that more tests need to change if there is an intentional change, but it also gives the same _evidence_ that the change has the intended impact. We are definitely leaning into personal preferences, though. There is no hard rule one way or another. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (6 preceding siblings ...) 2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-24 13:34 ` Derrick Stolee 2026-06-24 14:25 ` Kristofer Karlsson 2026-06-24 14:09 ` Derrick Stolee 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget 9 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 13:34 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote: > Benchmarks > > Step counts are deterministic (measured via trace2_data_intmax added in > patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set > to performance. > > 2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1): > > steps wall-clock > merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms > merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms > merge-tree (across import) - 8.11s -> 100ms > > > git.git (88k commits, commit-graph): > > steps wall-clock > merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms > merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms > merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms > merge-base HEAD HEAD~1000 - 9ms -> 9ms > merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms I like seeing these updates including the deterministic steps. Is there a reason you don't include the step data for 'merge-tree (across import)' in your monorepo case? The wall-clock is substantial, so it's not like the last two examples in git.git where there may not be any difference. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee @ 2026-06-24 14:25 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-24 14:25 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Wed, 24 Jun 2026 at 15:34, Derrick Stolee <stolee@gmail.com> wrote: > > I like seeing these updates including the deterministic steps. Is there > a reason you don't include the step data for 'merge-tree (across import)' > in your monorepo case? The wall-clock is substantial, so it's not like the > last two examples in git.git where there may not be any difference. I will have to attribute to laziness I suppose :) I ran the initial benchmarks before adding the trace, and I didn't update all of them, just enough to show the improvement and value of the trace data. I will ensure that I include all of it in the next version though (maybe 1-2 days from now?) or maybe drop some of the benchmarks to not overload with partly redundant information. (merge-tree benchmarks doesn't perhaps add much significance on top of merge-base in practice). Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (7 preceding siblings ...) 2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee @ 2026-06-24 14:09 ` Derrick Stolee 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget 9 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-24 14:09 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/24/2026 8:14 AM, Kristofer Karlsson via GitGitGadget wrote: > commit-reach: terminate merge-base walk when one side is exhausted > > Optimize paint_down_to_common() for merge-base queries that hit large > one-sided histories. I completed my review of this version. All of my comments are around either making the commit history a little cleaner or expanding the tests that use the trace2 data. I believe that this code is _correct_ and could be shipped as-is. My comments are focused on making it the best that it could be, with an eye towards a cleaner final result or a more robust test setup. The most actionable things are: 1. You can add tracing before the new tests, allowing the new tests to also check the step counts in their first versions and then get updated in the final patch to demonstrate how they change with that behavior change. 2. The t6600 helper 'test_all_modes' could set GIT_TRACE2_EVENT for each mode into a different trace file that can be scanned later. This will simplify your current tracing tests but also unlock easier tracing like this in the future. 3. The termination condition depending on min_generation could be refactored into paint_queue_get() to help make things even more obvious as to when we terminate. This should help with your concerns that you mentioned in response to patch 2/6 of the previous version: > I am not 100% happy with the halt-condition placement yet -- > the existing loop in master already has several exit paths > (while condition, min_generation break, FIND_ALL break) and I > think there is an opportunity to consolidate them. But that is > a separate discussion and I do not want to derail this series. > I can propose some alternatives in a follow-up after this > lands. I then have some super minor comments around making the diffs even easier to read, but they could be ignored as they are very nit-picky. It's the kind of detail that I would try to resolve if I was the author, but I'm _not_. You are. Your time is valuable so make your own conclusions as to whether you want to go down that road. You've already entertained my ideas around updating the docs as the implementation changes. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget ` (8 preceding siblings ...) 2026-06-24 14:09 ` Derrick Stolee @ 2026-06-26 13:07 ` Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget ` (9 more replies) 9 siblings, 10 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:07 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson commit-reach: terminate merge-base walk when one paint side is exhausted Optimize paint_down_to_common() for merge-base queries that hit large one-sided histories. When the walk from one side reaches a commit with a very low generation number that the other side never paints, the walk is forced to drain most of the graph. A common trigger is a repository import that grafts a separate history with its own root, but any merge that introduces a low-generation commit never painted by the other side has the same effect. A new merge-base candidate can only be discovered when exclusive PARENT1 and PARENT2 paint meet. This series teaches paint_down_to_common() to stop as soon as one side has no exclusive commits left in the queue; once one side is exhausted, no further candidates can appear. origin/HEAD o o PR HEAD | | (import) o : / \ / | o merge-base | | : : (~2.5M commits) | | import root main root In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample that sharpened the halt condition, and Elijah Newren independently discovered the same optimization and shared an implementation in PR #2150 [2]. Patches 2-3 incorporate test cases from Elijah's branch. This series implements the optimization only after the walk enters the finite-generation region, where generation ordering guarantees that paint on visited commits is final. Patch layout: 1/8 Documentation/technical: add paint-down-to-common doc 2/8 t6600: add test cases for side-exhaustion edge cases 3/8 t6099, t6600: add side-exhaustion regression tests 4/8 commit-reach: add trace2 instrumentation to paint_down_to_common() 5/8 commit-reach: introduce struct paint_state with per-side counters 6/8 commit-reach: remove unused nonstale_queue dedup wrappers 7/8 commit-reach: terminate merge-base walk when one paint side is exhausted 8/8 commit-reach: move min_generation check into paint_queue_get() Benchmarks Step counts are deterministic (measured via trace2_data_intmax added in patch 4). Wall-clock times are best-of-11 runs. 2.6M-commit monorepo with commit-graph: steps wall-clock merge-base --all (across import) 2143438 -> 3 3.67s -> 5ms merge-base --all (1000 apart) 2692915 -> 1035 4.41s -> 7ms merge-base --all (5000 apart) 2692915 -> 6401 4.45s -> 13ms merge-base --all (HEAD vs import) 2698872 -> 45960 4.50s -> 79ms merge-tree (across import) 2143438 -> 3 4.42s -> 11ms git.git (88k commits, commit-graph): steps wall-clock merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 110ms -> 68ms merge-base --all HEAD HEAD~1000 9891 -> 3828 18ms -> 10ms merge-base --all HEAD HEAD~10000 72303 -> 41487 101ms -> 50ms Changes since v2: * New patch 8/8: moved the min_generation termination check and the last_gen monotonicity assertion into paint_queue_get(), consolidating halt conditions. commit_graph_generation() is now called once per dequeued commit and shared across all checks. * Widened the generation-monotonicity BUG assertion to fire unconditionally, not only when min_generation is set. The side-exhaustion optimization depends on correct generation ordering, so the assertion should always be active. This is a behavior change: the BUG() now fires for any generation ordering violation, regardless of the caller. * Moved all halt conditions inside paint_queue_get() with the "pop first" form: pop, check, then decrement counters. This keeps the optimization commit's diff minimal (just inserting the new checks between pop and decrement). * Shortened the doc comment on paint_queue_get() to describe what it does rather than how. Inline comments on each return NULL explain the specific halt condition. * Replaced the manual commit-graph setup in the step-count test with run_all_modes, which now sets GIT_TRACE2_EVENT per mode and produces trace-mode-{none,full,half,no-gdat}.txt files. * Added a test_paint_down_steps helper for concise 4-mode step assertions with diagnostic output on mismatch (prints "expected X, got Y" instead of a silent grep failure). * Added step-count assertions to the single-walk edge-case tests: in_merge_bases_many:self, pending-stale, infinity-both-sides, mixed-finite-infinity. * Included step counts alongside wall-clock times in the benchmark tables. Changes since v1: * Reordered patches: documentation first (describing the existing algorithm), tests before code changes, so they demonstrate passing with old logic first. * Dropped the ahead_behind decoupling patch. paint_state is now a NEW struct alongside nonstale_queue instead of replacing it. ahead_behind() is completely untouched. * Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead code after the conversion) in a separate commit. * Renamed: struct paint_queue -> paint_state, field pq -> queue, paint_count_add/remove -> paint_count_update (single function with signed delta parameter). * Split the old paint_count_transition (which handled both old and new flags in one call) into separate remove/add calls with a signed delta. This eliminates the need for the case 0 handler (which tracked "not in the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 | STALE) that documents all valid flag combinations, with BUG() in default. * Added trace2_data_intmax() instrumentation to report the number of commits visited per paint walk (separate commit), with deterministic step-count assertions in t6600. * Expanded switch statements to multi-line format per .clang-format. * Used !count style throughout instead of count == 0. * Updated technical documentation alongside code changes. [1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u [2] https://github.com/gitgitgadget/git/pull/2150 Elijah Newren (1): t6600: add test cases for side-exhaustion edge cases Kristofer Karlsson (7): Documentation/technical: add paint-down-to-common doc t6099, t6600: add side-exhaustion regression tests commit-reach: add trace2 instrumentation to paint_down_to_common() commit-reach: introduce struct paint_state with per-side counters commit-reach: remove unused nonstale_queue dedup wrappers commit-reach: terminate merge-base walk when one paint side is exhausted commit-reach: move min_generation check into paint_queue_get() Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 137 +++++++++++++ commit-reach.c | 147 ++++++++++---- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 ++++++++ t/t6600-test-reach.sh | 181 ++++++++++++++++-- 7 files changed, 501 insertions(+), 49 deletions(-) create mode 100644 Documentation/technical/paint-down-to-common.adoc create mode 100755 t/t6099-merge-base-side-exhaustion.sh base-commit: 6c3d7b73556db708feb3b16232fab1efc4353428 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/2149 Range-diff vs v2: 1: 19ed743bd1 = 1: 2593866bce Documentation/technical: add paint-down-to-common doc 2: 6151b8e0a3 = 2: 9efc084850 t6600: add test cases for side-exhaustion edge cases 3: 90f09ecb5c = 3: 14b0d86b93 t6099, t6600: add side-exhaustion regression tests 4: 6ade4df2ed ! 4: 2592264cda commit-reach: add trace2 instrumentation to paint_down_to_common() @@ Commit message Add a step counter and trace2_data_intmax() call so that the number of commits visited during the paint walk is observable via - GIT_TRACE2_PERF. This provides a way to measure the impact of + GIT_TRACE2_EVENT. This provides a way to measure the impact of future optimizations without relying on wall-clock benchmarks alone. Signed-off-by: Kristofer Karlsson <krka@spotify.com> @@ commit-reach.c: static int paint_down_to_common(struct repository *r, } ## t/t6600-test-reach.sh ## -@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' - test_all_modes get_merge_bases_many +@@ t/t6600-test-reach.sh: test_expect_success 'setup' ' ' -+test_expect_success 'merge-base --all commit-walk steps' ' -+ test_when_finished rm -rf .git/objects/info/commit-graph \ -+ .git/objects/info/commit-graphs && -+ rm -rf .git/objects/info/commit-graph \ -+ .git/objects/info/commit-graphs && -+ -+ GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \ -+ git merge-base --all commit-9-9 commit-9-1 >actual && -+ test_trace2_data paint_down_to_common steps 81 <trace-none.txt && + run_all_modes () { +- test_when_finished rm -rf .git/objects/info/commit-graph && +- "$@" <input >actual && +- test_cmp expect actual && +- cp commit-graph-full .git/objects/info/commit-graph && +- "$@" <input >actual && +- test_cmp expect actual && +- cp commit-graph-half .git/objects/info/commit-graph && +- "$@" <input >actual && +- test_cmp expect actual && +- cp commit-graph-no-gdat .git/objects/info/commit-graph && +- "$@" <input >actual && +- test_cmp expect actual ++ graph=.git/objects/info/commit-graph && ++ test_when_finished rm -rf "$graph" "${graph}s" && ++ rm -f trace-mode-*.txt && + -+ cp commit-graph-full .git/objects/info/commit-graph && -+ GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ -+ git merge-base --all commit-9-9 commit-9-1 >actual && -+ test_trace2_data paint_down_to_common steps 80 <trace-full.txt && ++ for mode in none full half no-gdat ++ do ++ rm -rf "$graph" "${graph}s" && ++ cp "commit-graph-${mode}" "$graph" 2>/dev/null || ++ true && ++ GIT_TRACE2_EVENT="$(pwd)/trace-mode-${mode}.txt" \ ++ "$@" <input >actual && ++ test_cmp expect actual || return 1 ++ done + } + + test_all_modes () { + run_all_modes test-tool reach "$@" + } + ++test_paint_down_steps () { ++ for mode in none full half no-gdat ++ do ++ test_trace2_data paint_down_to_common steps "$1" \ ++ <"trace-mode-${mode}.txt" || return 1 ++ shift ++ done ++} + -+ cp commit-graph-half .git/objects/info/commit-graph && -+ GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ -+ git merge-base --all commit-9-9 commit-9-1 >actual && -+ test_trace2_data paint_down_to_common steps 81 <trace-half.txt + test_expect_success 'ref_newer:miss' ' + cat >input <<-\EOF && + A:commit-5-7 +@@ t/t6600-test-reach.sh: test_expect_success 'in_merge_bases_many:self' ' + X:commit-6-8 + EOF + echo "in_merge_bases_many(A,X):1" >expect && +- test_all_modes in_merge_bases_many ++ test_all_modes in_merge_bases_many && ++ test_paint_down_steps 45 2 25 3 + ' + + test_expect_success 'is_descendant_of:hit' ' +@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:pending-stale' ' + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-B + } >expect && +- test_all_modes get_merge_bases_many ++ test_all_modes get_merge_bases_many && ++ test_paint_down_steps 6 6 6 6 + ' + + test_expect_success 'get_merge_bases_many:infinity-both-sides' ' +@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-sides' ' + echo "get_merge_bases_many(A,X):" && + git rev-parse pi-B + } >expect && +- test_all_modes get_merge_bases_many ++ test_all_modes get_merge_bases_many && ++ test_paint_down_steps 5 5 5 5 + ' + + test_expect_success 'setup mixed finite/INFINITY topology' ' +@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-X + } >expect && +- test_all_modes get_merge_bases_many ++ test_all_modes get_merge_bases_many && ++ test_paint_down_steps 3 3 3 3 +' + ++test_expect_success 'merge-base --all commit-walk steps' ' ++ >input && ++ git rev-parse commit-9-1 >expect && ++ run_all_modes git merge-base --all commit-9-9 commit-9-1 && ++ test_paint_down_steps 81 80 81 81 + ' + test_expect_success 'reduce_heads' ' - cat >input <<-\EOF && - X:commit-1-10 5: f24edd45f0 ! 5: e82e0c72b6 commit-reach: introduce struct paint_state with per-side counters @@ Commit message (max_nonstale). This is equivalent behavior -- both conditions detect that no non-stale entries remain. - The existing nonstale_queue is left in place for ahead_behind(). + paint_queue_get() uses a "pop first" form: it dequeues a commit, + then checks the counters. This means the loop exits one iteration + earlier than the old code in some topologies (the popped stale + commit is never processed), so a few step counts drop by one. - Step counts (via trace2 from the previous commit) are identical - before and after this refactoring, confirming no behavioral change. + The existing nonstale_queue is left in place for ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com> @@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_q + +static struct commit *paint_queue_get(struct paint_state *state) +{ -+ struct commit *commit; ++ struct commit *commit = prio_queue_get(&state->queue); ++ ++ if (!commit) ++ return NULL; ++ ++ commit->object.flags &= ~ENQUEUED; + + if (!state->p1_count && !state->p2_count && + !state->pending_merge_bases) + return NULL; + -+ commit = prio_queue_get(&state->queue); -+ if (commit) { -+ commit->object.flags &= ~ENQUEUED; -+ paint_count_update(state, commit->object.flags, -1); -+ } ++ paint_count_update(state, commit->object.flags, -1); + return commit; +} + @@ commit-reach.c: static int paint_down_to_common(struct repository *r, trace2_data_intmax("paint_down_to_common", r, "steps", steps); commit_list_sort_by_date(result); + + ## t/t6600-test-reach.sh ## +@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:pending-stale' ' + git rev-parse ps-B + } >expect && + test_all_modes get_merge_bases_many && +- test_paint_down_steps 6 6 6 6 ++ test_paint_down_steps 5 5 5 5 + ' + + test_expect_success 'get_merge_bases_many:infinity-both-sides' ' +@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-sides' ' + git rev-parse pi-B + } >expect && + test_all_modes get_merge_bases_many && +- test_paint_down_steps 5 5 5 5 ++ test_paint_down_steps 5 4 5 5 + ' + + test_expect_success 'setup mixed finite/INFINITY topology' ' 6: 8c72f01083 = 6: e6181bf3c1 commit-reach: remove unused nonstale_queue dedup wrappers 7: d84b932e5b ! 7: f3572a8a89 commit-reach: terminate merge-base walk when one paint side is exhausted @@ Commit message once the walk enters the finite-generation region where ordering guarantees hold. + Widen the existing generation-monotonicity BUG assertion to fire + unconditionally, not only when min_generation is set. The + side-exhaustion optimization depends on correct generation ordering, + so the assertion should always be active. + Step counts measured with trace2 on git.git with commit-graph: merge-base --all v2.0.0 v2.55.0-rc1: @@ Documentation/technical/paint-down-to-common.adoc: existing candidates by provin ## commit-reach.c ## @@ commit-reach.c: static void paint_queue_put(struct paint_state *state, + } + } ++/* ++ * Dequeue the next commit for the paint walk, or return NULL when ++ * no more merge bases can be discovered. ++ */ static struct commit *paint_queue_get(struct paint_state *state) { -- struct commit *commit; -+ struct commit *commit = prio_queue_get(&state->queue); + struct commit *commit = prio_queue_get(&state->queue); +@@ commit-reach.c: static struct commit *paint_queue_get(struct paint_state *state) + + commit->object.flags &= ~ENQUEUED; - if (!state->p1_count && !state->p2_count && - !state->pending_merge_bases) -+ if (!commit) - return NULL; - -- commit = prio_queue_get(&state->queue); -- if (commit) { -- commit->object.flags &= ~ENQUEUED; -- paint_count_update(state, commit->object.flags, -1); -+ commit->object.flags &= ~ENQUEUED; -+ +- return NULL; + if (!state->pending_merge_bases) { ++ /* only stale entries remain */ + if (!state->p1_count && !state->p2_count) + return NULL; -+ /* -+ * Side exhaustion: a new merge-base can only form -+ * when both PARENT1-only and PARENT2-only commits -+ * remain in the queue. In the finite-generation -+ * region the queue is ordered topologically, so -+ * no future step can add paint to visited commits -+ * and an exhausted side cannot reappear. -+ */ ++ ++ /* one side is exhausted */ + if ((!state->p1_count || !state->p2_count) && + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + return NULL; - } -+ -+ paint_count_update(state, commit->object.flags, -1); ++ } + + paint_count_update(state, commit->object.flags, -1); return commit; - } +@@ commit-reach.c: static int paint_down_to_common(struct repository *r, + timestamp_t generation = commit_graph_generation(commit); + steps++; +- if (min_generation && generation > last_gen) ++ if (generation > last_gen) + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", + generation, last_gen, + oid_to_hex(&commit->object.oid)); ## t/t6600-test-reach.sh ## -@@ t/t6600-test-reach.sh: test_expect_success 'merge-base --all commit-walk steps' ' - cp commit-graph-full .git/objects/info/commit-graph && - GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \ - git merge-base --all commit-9-9 commit-9-1 >actual && -- test_trace2_data paint_down_to_common steps 80 <trace-full.txt && -+ test_trace2_data paint_down_to_common steps 9 <trace-full.txt && +@@ t/t6600-test-reach.sh: test_expect_success 'in_merge_bases_many:self' ' + EOF + echo "in_merge_bases_many(A,X):1" >expect && + test_all_modes in_merge_bases_many && +- test_paint_down_steps 45 2 25 3 ++ test_paint_down_steps 45 1 25 1 + ' - cp commit-graph-half .git/objects/info/commit-graph && - GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \ - git merge-base --all commit-9-9 commit-9-1 >actual && -- test_trace2_data paint_down_to_common steps 81 <trace-half.txt -+ test_trace2_data paint_down_to_common steps 57 <trace-half.txt + test_expect_success 'is_descendant_of:hit' ' +@@ t/t6600-test-reach.sh: test_expect_success 'merge-base --all commit-walk steps' ' + >input && + git rev-parse commit-9-1 >expect && + run_all_modes git merge-base --all commit-9-9 commit-9-1 && +- test_paint_down_steps 81 80 81 81 ++ test_paint_down_steps 81 9 57 10 ' test_expect_success 'reduce_heads' ' -: ---------- > 8: 4b9f192d98 commit-reach: move min_generation check into paint_queue_get() -- gitgitgadget ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v3 1/8] Documentation/technical: add paint-down-to-common doc 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:07 ` Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 2/8] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget ` (8 subsequent siblings) 9 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:07 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a technical document describing the paint_down_to_common() algorithm used for merge-base computation, covering the paint walk, generation number regions, and termination conditions. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 114 ++++++++++++++++++ commit-reach.c | 6 +- 4 files changed, 121 insertions(+), 1 deletion(-) create mode 100644 Documentation/technical/paint-down-to-common.adoc diff --git a/Documentation/Makefile b/Documentation/Makefile index 2699f0b24a..f8dea4b395 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol TECH_DOCS += technical/multi-pack-index TECH_DOCS += technical/packfile-uri TECH_DOCS += technical/pack-heuristics +TECH_DOCS += technical/paint-down-to-common TECH_DOCS += technical/parallel-checkout TECH_DOCS += technical/partial-clone TECH_DOCS += technical/platform-support diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build index ec07088c57..9ce11d5e48 100644 --- a/Documentation/technical/meson.build +++ b/Documentation/technical/meson.build @@ -18,6 +18,7 @@ articles = [ 'multi-pack-index.adoc', 'packfile-uri.adoc', 'pack-heuristics.adoc', + 'paint-down-to-common.adoc', 'parallel-checkout.adoc', 'partial-clone.adoc', 'platform-support.adoc', diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc new file mode 100644 index 0000000000..c10d5d2887 --- /dev/null +++ b/Documentation/technical/paint-down-to-common.adoc @@ -0,0 +1,114 @@ +Merge-Base Computation and paint_down_to_common() +================================================== + +The function `paint_down_to_common()` in `commit-reach.c` computes merge +bases by walking the commit graph backwards from two sets of tips and +finding where their ancestry meets. + +Use cases +--------- + +Computing merge bases is used in two different ways: + + 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`, + `merge`, `rebase`). A merge base is a common ancestor that is + not itself an ancestor of another common ancestor. + + 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base + --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A + an ancestor of commit B?" If a common ancestor equals one of the + inputs, that input is necessarily the only merge base -- no other + common ancestor can be both as recent and not an ancestor of it. + +Both use cases share the same algorithm and implementation. + +Algorithm +--------- + +Given a commit `one` and a set of commits `twos[]`, the walk paints +commits with two colors: + + - PARENT1: reachable from `one` + - PARENT2: reachable from any commit in `twos[]` + +The walk uses a priority queue ordered by generation number (falling +back to commit date when generation numbers are unavailable). Each +step dequeues the highest-priority commit (this is when we say a +commit is "visited") and propagates its paint flags to its parents, +enqueuing them if they gained new flags. When a commit receives +both PARENT1 and PARENT2, it is a merge-base candidate. A candidate +gains the STALE flag so its ancestors propagate staleness -- any +deeper common ancestor is necessarily redundant. + +INFINITY and finite generation regions +-------------------------------------- + +The commit-graph stores a generation number for each commit. Commits +not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The +graph is closed under reachability: if a commit is in the graph, all +its ancestors are too. This partitions the commit graph into two regions: + +.... + +---------------------------------------+ + | INFINITY region | + | generation = INFINITY | + | queue order: heuristic (commit date) | + +---------------------------------------+ + | + v + +---------------------------------------+ + | Finite region | + | generation = finite | + | queue order: topological | + +---------------------------------------+ +.... + +When the commit-graph is enabled, the INFINITY region is typically +very small -- it only contains commits added since the last +commit-graph refresh. + +All reachable INFINITY-generation commits are visited before any +finite-generation commit, because INFINITY is larger than any finite +value. Once the walk crosses into the finite region, it stays there. + +In the finite region, generation ordering guarantees topological +traversal: children are always visited before their parents. This +means that paint on already-visited commits is final -- no future +traversal step can add paint to them. + +In the INFINITY region, commit-date ordering can violate this: a +parent with a later date can be visited before a child with an earlier +date. Paint flags are therefore NOT final at visit time, and a +commit visited with only one side's paint may later gain the other. + +Paint flags are only added, never removed. Since each flag can be set +at most once per commit, the number of times a commit can be +re-enqueued is bounded by the number of flag transitions. + +Termination +----------- + +The walk uses a `nonstale_queue` wrapper around `prio_queue` that +tracks `max_nonstale`: the lowest-priority non-stale commit enqueued +so far. Once that commit is dequeued, every remaining entry is known +to be STALE and the loop terminates. Specifically, the main loop +ends when one of the following conditions holds: + + 1. The queue is empty. + 2. `max_nonstale` has been dequeued, meaning the queue only contains + STALE entries. + +Stale entry condition +~~~~~~~~~~~~~~~~~~~~~ +Once all queued entries are stale, no new merge-base candidates can +be discovered -- that requires at least one non-stale commit from +each side meeting. Continuing the walk could still invalidate +existing candidates by proving one is an ancestor of another, but +`remove_redundant()` handles that as a post-processing step, so it +is safe to exit early. + +Related documentation +--------------------- + + - `Documentation/technical/commit-graph.adoc` -- generation numbers + and the reachability closure property. diff --git a/commit-reach.c b/commit-reach.c index 5df471a313..a9483759e0 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -96,7 +96,11 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) return commit; } -/* all input commits in one and twos[] must have been parsed! */ +/* + * See Documentation/technical/paint-down-to-common.adoc + * + * All input commits in one and twos[] must have been parsed! + */ static int paint_down_to_common(struct repository *r, struct commit *one, int n, struct commit **twos, -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v3 2/8] t6600: add test cases for side-exhaustion edge cases 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:07 ` Elijah Newren via GitGitGadget 2026-06-26 13:08 ` [PATCH v3 3/8] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget ` (7 subsequent siblings) 9 siblings, 0 replies; 84+ messages in thread From: Elijah Newren via GitGitGadget @ 2026-06-26 13:07 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren From: Elijah Newren <newren@gmail.com> Add test cases to t6600-test-reach.sh that exercise edge cases in the side-exhaustion optimization for paint_down_to_common(): - in_merge_bases_many:self: commit is both A and one of the X inputs - get_merge_bases_many:duplicate-twos: duplicate entries in X list - get_merge_bases_many:pending-stale: STALE transition on an already-painted commit (ps-* diamond topology) - get_merge_bases_many:infinity-both-sides: both tips outside the commit-graph with non-monotonic dates (pi-* topology) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b5b314e570..c2e091aad1 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -49,6 +49,62 @@ test_expect_success 'setup' ' git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1 done done && + + # Build a small side topology to exercise the (PARENT1|PARENT2) -> + # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the + # 10x10 grid above does not exercise it because no merge-base candidate + # there is a descendant of another, so STALE never reaches a + # still-pending candidate. + # + # ps-X + # /|\ + # / | \ + # ps-Z ps-B ps-W + # | / \ | + # | / \ | + # |/ \| + # ps-T1 ps-T2 + # + # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so + # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions + # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued; + # then the STALE-walk from ps-B transitions ps-X to + # (PARENT1|PARENT2|STALE). + git checkout --orphan ps-orphan && + test_commit ps-X && + git checkout -b ps-B-br ps-X && test_commit ps-B && + git checkout -b ps-Z-br ps-X && test_commit ps-Z && + git checkout -b ps-W-br ps-X && test_commit ps-W && + git checkout -b ps-T1 ps-Z && + git merge --no-ff -m ps-T1 ps-B && + git checkout -b ps-T2 ps-W && + git merge --no-ff -m ps-T2 ps-B && + + # Build a side topology that lives entirely outside the half + # commit-graph and has non-monotonic commit dates, to exercise the + # INFINITY-gate in paint_down_to_common. With both tips outside + # the graph, generation is INFINITY and the queue falls back to + # commit-date order, which here is non-monotonic. + # + # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D + # pi-D (date 480) --> pi-C + # pi-C (date 200) --> pi-B + # pi-B (date 100, PARENT2 tip) --> pi-P + # pi-P (date 450, root) + # + # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is + # itself one of the queried tips). + git checkout --orphan pi-orphan && + test_commit --date "@450 +0000" pi-P && + test_commit --date "@100 +0000" pi-B && + test_commit --date "@200 +0000" pi-C && + test_commit --date "@480 +0000" pi-D && + GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \ + git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid && + pi_x="$(cat pi-X-oid)" && + git branch -f pi-X-br "$pi_x" && + git tag pi-X "$pi_x" && + git commit-graph write --reachable && mv .git/objects/info/commit-graph commit-graph-full && chmod u+w commit-graph-full && @@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' test_all_modes in_merge_bases_many ' +test_expect_success 'in_merge_bases_many:self' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-5-9 + X:commit-6-8 + EOF + echo "in_merge_bases_many(A,X):1" >expect && + test_all_modes in_merge_bases_many +' + test_expect_success 'is_descendant_of:hit' ' cat >input <<-\EOF && A:commit-5-7 @@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' ' test_all_modes get_merge_bases_many ' +test_expect_success 'get_merge_bases_many:duplicate-twos' ' + cat >input <<-\EOF && + A:commit-5-7 + X:commit-4-8 + X:commit-4-8 + X:commit-6-6 + X:commit-6-6 + X:commit-8-3 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse commit-5-6 \ + commit-4-7 | sort + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:pending-stale' ' + # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in + # paint_down_to_common(). See the topology comment in the setup test. + cat >input <<-\EOF && + A:ps-T1 + X:ps-T2 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-B + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:infinity-both-sides' ' + # Exercises the push-time INFINITY-gate in paint_down_to_common(). See + # the pi-* topology comment in the setup test. + cat >input <<-\EOF && + A:pi-X + X:pi-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse pi-B + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v3 3/8] t6099, t6600: add side-exhaustion regression tests 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 2/8] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget @ 2026-06-26 13:08 ` Kristofer Karlsson via GitGitGadget 2026-06-26 13:08 ` [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget ` (6 subsequent siblings) 9 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add t6099 to test the case where multiple merge-base candidates exist and one is an ancestor of another. This exercises the side-exhaustion optimization in paint_down_to_common together with the remove_redundant safety net in get_merge_bases_many_0. Add a mixed finite/INFINITY test to t6600 where one tip is outside the commit-graph (INFINITY generation) and the other is inside. This exercises the region transition: the walk starts in the INFINITY region where side-exhaustion is disabled, then crosses into the finite region where it can fire. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++ t/t6600-test-reach.sh | 25 ++++++++ 3 files changed, 108 insertions(+) create mode 100755 t/t6099-merge-base-side-exhaustion.sh diff --git a/t/meson.build b/t/meson.build index 3219264fe7..ee6ebdffb9 100644 --- a/t/meson.build +++ b/t/meson.build @@ -786,6 +786,7 @@ integration_tests = [ 't6041-bisect-submodule.sh', 't6050-replace.sh', 't6060-merge-index.sh', + 't6099-merge-base-side-exhaustion.sh', 't6100-rev-list-in-order.sh', 't6101-rev-parse-parents.sh', 't6102-rev-list-unexpected-objects.sh', diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh new file mode 100755 index 0000000000..4f1e0d50ef --- /dev/null +++ b/t/t6099-merge-base-side-exhaustion.sh @@ -0,0 +1,82 @@ +#!/bin/sh + +test_description='merge-base with ancestor among merge-base candidates + +Test that merge-base --all correctly handles cases where +multiple merge-base candidates exist and one is an ancestor +of another. The side-exhaustion optimization in +paint_down_to_common may exit before STALE propagation +removes the ancestor, but remove_redundant catches it. + +Graph shape (parents are below children): + + A ----------- X + |\ /| + | B---------/ | + | | | + e2 \ f2 + | | | + e1 d1 f1 + \ | / + \ | / + \| / + C + +A and X are the two tips. +B and C are both reachable from A and X. +B reaches C through d1. +Only B should appear in merge-base --all output. +' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'setup ancestor merge-base candidate' ' + test_commit C && + + git checkout -b d-chain HEAD && + test_commit d1 && + test_commit B && + + git checkout -b e-path C && + test_commit e1 && + test_commit e2 && + + git checkout -b f-path C && + test_commit f1 && + test_commit f2 && + + git checkout -b branch-A e-path && + test_merge A B && + + git checkout -b branch-X f-path && + test_merge X B && + + git commit-graph write --reachable +' + +test_expect_success 'merge-base --all excludes ancestor candidate' ' + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_expect_success 'merge-base (single) finds shallowest' ' + git rev-parse B >expected && + git merge-base A X >actual && + test_cmp expected actual +' + +# Without commit-graph: generation numbers are INFINITY, +# side-exhaustion optimization does not fire. +test_expect_success 'merge-base --all without commit-graph' ' + rm -f .git/objects/info/commit-graph && + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_done diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index c2e091aad1..4b771b4c58 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' test_all_modes get_merge_bases_many ' +test_expect_success 'setup mixed finite/INFINITY topology' ' + # Create a commit outside all saved commit-graph files so it always + # has INFINITY generation, while its parent (ps-X) is in the graph + # with a finite generation. Use the ps-* orphan topology so we do + # not pollute the grid-based rev-list tests. + git checkout ps-X && + test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF +' + +test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' + # One tip (pm-INF) is outside the commit-graph with INFINITY + # generation; the other (ps-B) is in the graph with finite + # generation. The walk starts in the INFINITY region and crosses + # into the finite region where side-exhaustion can fire. + cat >input <<-\EOF && + A:pm-INF + X:ps-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-X + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (2 preceding siblings ...) 2026-06-26 13:08 ` [PATCH v3 3/8] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 ` Kristofer Karlsson via GitGitGadget 2026-06-26 14:31 ` Derrick Stolee 2026-06-26 13:08 ` [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget ` (5 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a step counter and trace2_data_intmax() call so that the number of commits visited during the paint walk is observable via GIT_TRACE2_EVENT. This provides a way to measure the impact of future optimizations without relying on wall-clock benchmarks alone. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 5 ++++ t/t6600-test-reach.sh | 53 ++++++++++++++++++++++++++++++------------- 2 files changed, 42 insertions(+), 16 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index a9483759e0..f6a438550b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -11,6 +11,7 @@ #include "tag.h" #include "commit-reach.h" #include "ewah/ewok.h" +#include "trace2.h" /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -112,6 +113,7 @@ static int paint_down_to_common(struct repository *r, { compare_commits_by_gen_then_commit_date } }; int i; + int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; @@ -135,6 +137,7 @@ static int paint_down_to_common(struct repository *r, struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); + steps++; if (min_generation && generation > last_gen) BUG("bad generation skip %"PRItime" > %"PRItime" at %s", @@ -190,6 +193,8 @@ static int paint_down_to_common(struct repository *r, } clear_nonstale_queue(&queue); + trace2_data_intmax("paint_down_to_common", r, + "steps", steps); commit_list_sort_by_date(result); return 0; } diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 4b771b4c58..b3a31b80ac 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -118,24 +118,34 @@ test_expect_success 'setup' ' ' run_all_modes () { - test_when_finished rm -rf .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual && - cp commit-graph-full .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual && - cp commit-graph-half .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual && - cp commit-graph-no-gdat .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual + graph=.git/objects/info/commit-graph && + test_when_finished rm -rf "$graph" "${graph}s" && + rm -f trace-mode-*.txt && + + for mode in none full half no-gdat + do + rm -rf "$graph" "${graph}s" && + cp "commit-graph-${mode}" "$graph" 2>/dev/null || + true && + GIT_TRACE2_EVENT="$(pwd)/trace-mode-${mode}.txt" \ + "$@" <input >actual && + test_cmp expect actual || return 1 + done } test_all_modes () { run_all_modes test-tool reach "$@" } +test_paint_down_steps () { + for mode in none full half no-gdat + do + test_trace2_data paint_down_to_common steps "$1" \ + <"trace-mode-${mode}.txt" || return 1 + shift + done +} + test_expect_success 'ref_newer:miss' ' cat >input <<-\EOF && A:commit-5-7 @@ -209,7 +219,8 @@ test_expect_success 'in_merge_bases_many:self' ' X:commit-6-8 EOF echo "in_merge_bases_many(A,X):1" >expect && - test_all_modes in_merge_bases_many + test_all_modes in_merge_bases_many && + test_paint_down_steps 45 2 25 3 ' test_expect_success 'is_descendant_of:hit' ' @@ -277,7 +288,8 @@ test_expect_success 'get_merge_bases_many:pending-stale' ' echo "get_merge_bases_many(A,X):" && git rev-parse ps-B } >expect && - test_all_modes get_merge_bases_many + test_all_modes get_merge_bases_many && + test_paint_down_steps 6 6 6 6 ' test_expect_success 'get_merge_bases_many:infinity-both-sides' ' @@ -291,7 +303,8 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' echo "get_merge_bases_many(A,X):" && git rev-parse pi-B } >expect && - test_all_modes get_merge_bases_many + test_all_modes get_merge_bases_many && + test_paint_down_steps 5 5 5 5 ' test_expect_success 'setup mixed finite/INFINITY topology' ' @@ -316,7 +329,15 @@ test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' echo "get_merge_bases_many(A,X):" && git rev-parse ps-X } >expect && - test_all_modes get_merge_bases_many + test_all_modes get_merge_bases_many && + test_paint_down_steps 3 3 3 3 +' + +test_expect_success 'merge-base --all commit-walk steps' ' + >input && + git rev-parse commit-9-1 >expect && + run_all_modes git merge-base --all commit-9-9 commit-9-1 && + test_paint_down_steps 81 80 81 81 ' test_expect_success 'reduce_heads' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-26 13:08 ` [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget @ 2026-06-26 14:31 ` Derrick Stolee 2026-06-26 14:35 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-26 14:31 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/26/2026 9:08 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > run_all_modes () { > - test_when_finished rm -rf .git/objects/info/commit-graph && > - "$@" <input >actual && > - test_cmp expect actual && > - cp commit-graph-full .git/objects/info/commit-graph && > - "$@" <input >actual && > - test_cmp expect actual && > - cp commit-graph-half .git/objects/info/commit-graph && > - "$@" <input >actual && > - test_cmp expect actual && > - cp commit-graph-no-gdat .git/objects/info/commit-graph && > - "$@" <input >actual && > - test_cmp expect actual > + graph=.git/objects/info/commit-graph && > + test_when_finished rm -rf "$graph" "${graph}s" && > + rm -f trace-mode-*.txt && > + > + for mode in none full half no-gdat > + do > + rm -rf "$graph" "${graph}s" && > + cp "commit-graph-${mode}" "$graph" 2>/dev/null || > + true && > + GIT_TRACE2_EVENT="$(pwd)/trace-mode-${mode}.txt" \ > + "$@" <input >actual && > + test_cmp expect actual || return 1 > + done > } Thank you for putting these traces into this helper AND for making it cleaner at the same time! > +test_paint_down_steps () { > + for mode in none full half no-gdat > + do > + test_trace2_data paint_down_to_common steps "$1" \ > + <"trace-mode-${mode}.txt" || return 1 > + shift > + done > +} > + > test_expect_success 'ref_newer:miss' ' > cat >input <<-\EOF && > A:commit-5-7 > @@ -209,7 +219,8 @@ test_expect_success 'in_merge_bases_many:self' ' > X:commit-6-8 > EOF > echo "in_merge_bases_many(A,X):1" >expect && > - test_all_modes in_merge_bases_many > + test_all_modes in_merge_bases_many && > + test_paint_down_steps 45 2 25 3 > ' oooh that's clean. Thanks! Way to over-achieve here. Thanks for going the extra mile with this patch. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-26 14:31 ` Derrick Stolee @ 2026-06-26 14:35 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 14:35 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Fri, 26 Jun 2026 at 16:31, Derrick Stolee <stolee@gmail.com> wrote: > > > test_expect_success 'ref_newer:miss' ' > > cat >input <<-\EOF && > > A:commit-5-7 > > @@ -209,7 +219,8 @@ test_expect_success 'in_merge_bases_many:self' ' > > X:commit-6-8 > > EOF > > echo "in_merge_bases_many(A,X):1" >expect && > > - test_all_modes in_merge_bases_many > > + test_all_modes in_merge_bases_many && > > + test_paint_down_steps 45 2 25 3 > > ' > > oooh that's clean. Thanks! > > Way to over-achieve here. Thanks for going the extra mile with > this patch. Thanks! I did not want to change it too much but this felt like a natural place to simplify it a bit. I also have a local branch now for adding a test_trace2_data_singular helper function that provides more diagnostics on failures (show expected vs actual similar to test_cmp) but I'll submit that separately later to limit the scope creep here. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (3 preceding siblings ...) 2026-06-26 13:08 ` [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 ` Kristofer Karlsson via GitGitGadget 2026-06-26 21:13 ` René Scharfe 2026-06-26 13:08 ` [PATCH v3 6/8] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget ` (4 subsequent siblings) 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a paint_state struct for use by paint_down_to_common() that wraps a prio_queue with per-side commit counters. Each non-stale queued commit occupies exactly one counter bucket based on its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). The counters are maintained by paint_count_update() which adjusts the appropriate bucket by a signed delta. An exhaustive switch on the paint+stale bits documents all valid flag combinations in one place. Convert paint_down_to_common() to use paint_state. The loop now drains the queue via paint_queue_get() which returns NULL when all counters reach zero, replacing the old pointer-based termination (max_nonstale). This is equivalent behavior -- both conditions detect that no non-stale entries remain. paint_queue_get() uses a "pop first" form: it dequeues a commit, then checks the counters. This means the loop exits one iteration earlier than the old code in some topologies (the popped stale commit is never processed), so a few step counts drop by one. The existing nonstale_queue is left in place for ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 9 +- commit-reach.c | 94 ++++++++++++++++--- t/t6600-test-reach.sh | 4 +- 3 files changed, 85 insertions(+), 22 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index c10d5d2887..0f4e1892a5 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -88,15 +88,12 @@ re-enqueued is bounded by the number of flag transitions. Termination ----------- -The walk uses a `nonstale_queue` wrapper around `prio_queue` that -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued -so far. Once that commit is dequeued, every remaining entry is known -to be STALE and the loop terminates. Specifically, the main loop +The walk tracks the number of commits of each type in the queue +(PARENT1-only, PARENT2-only, pending merge-base). The main loop ends when one of the following conditions holds: 1. The queue is empty. - 2. `max_nonstale` has been dequeued, meaning the queue only contains - STALE entries. + 2. The queue contains only stale entries. Stale entry condition ~~~~~~~~~~~~~~~~~~~~~ diff --git a/commit-reach.c b/commit-reach.c index f6a438550b..0f29b143bd 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -97,6 +97,75 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) return commit; } +/* + * Priority queue with per-side commit counters for paint_down_to_common(). + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, + * PARENT2-only, or both (a pending merge-base candidate). + */ +struct paint_state { + struct prio_queue queue; + int p1_count; + int p2_count; + int pending_merge_bases; +}; + +static void paint_count_update(struct paint_state *state, + unsigned flags, int delta) +{ + switch (flags & (PARENT1 | PARENT2 | STALE)) { + case PARENT1: + state->p1_count += delta; + break; + + case PARENT2: + state->p2_count += delta; + break; + + case PARENT1 | PARENT2: + state->pending_merge_bases += delta; + break; + + case PARENT1 | PARENT2 | STALE: + break; + + default: + BUG("unexpected paint state"); + } +} + +static void paint_queue_put(struct paint_state *state, + struct commit *c, unsigned add_flags) +{ + unsigned old_flags = c->object.flags; + c->object.flags |= add_flags; + + if (old_flags & ENQUEUED) { + paint_count_update(state, old_flags, -1); + paint_count_update(state, c->object.flags, 1); + } else { + c->object.flags |= ENQUEUED; + prio_queue_put(&state->queue, c); + paint_count_update(state, c->object.flags, 1); + } +} + +static struct commit *paint_queue_get(struct paint_state *state) +{ + struct commit *commit = prio_queue_get(&state->queue); + + if (!commit) + return NULL; + + commit->object.flags &= ~ENQUEUED; + + if (!state->p1_count && !state->p2_count && + !state->pending_merge_bases) + return NULL; + + paint_count_update(state, commit->object.flags, -1); + return commit; +} + /* * See Documentation/technical/paint-down-to-common.adoc * @@ -109,31 +178,29 @@ static int paint_down_to_common(struct repository *r, enum merge_base_flags mb_flags, struct commit_list **result) { - struct nonstale_queue queue = { - { compare_commits_by_gen_then_commit_date } + struct paint_state state = { + .queue = { compare_commits_by_gen_then_commit_date } }; + struct commit *commit; int i; int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; if (!min_generation && !corrected_commit_dates_enabled(r)) - queue.pq.compare = compare_commits_by_commit_date; + state.queue.compare = compare_commits_by_commit_date; one->object.flags |= PARENT1; if (!n) { commit_list_append(one, result); return 0; } - nonstale_queue_put_dedup(&queue, one); + paint_queue_put(&state, one, 0); - for (i = 0; i < n; i++) { - twos[i]->object.flags |= PARENT2; - nonstale_queue_put_dedup(&queue, twos[i]); - } + for (i = 0; i < n; i++) + paint_queue_put(&state, twos[i], PARENT2); - while (queue.max_nonstale) { - struct commit *commit = nonstale_queue_get_dedup(&queue); + while ((commit = paint_queue_get(&state))) { struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); @@ -172,7 +239,7 @@ static int paint_down_to_common(struct repository *r, if ((p->object.flags & flags) == flags) continue; if (repo_parse_commit(r, p)) { - clear_nonstale_queue(&queue); + clear_prio_queue(&state.queue); commit_list_free(*result); *result = NULL; /* @@ -187,12 +254,11 @@ static int paint_down_to_common(struct repository *r, return error(_("could not parse commit %s"), oid_to_hex(&p->object.oid)); } - p->object.flags |= flags; - nonstale_queue_put_dedup(&queue, p); + paint_queue_put(&state, p, flags); } } - clear_nonstale_queue(&queue); + clear_prio_queue(&state.queue); trace2_data_intmax("paint_down_to_common", r, "steps", steps); commit_list_sort_by_date(result); diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b3a31b80ac..51f3d70492 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -289,7 +289,7 @@ test_expect_success 'get_merge_bases_many:pending-stale' ' git rev-parse ps-B } >expect && test_all_modes get_merge_bases_many && - test_paint_down_steps 6 6 6 6 + test_paint_down_steps 5 5 5 5 ' test_expect_success 'get_merge_bases_many:infinity-both-sides' ' @@ -304,7 +304,7 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' git rev-parse pi-B } >expect && test_all_modes get_merge_bases_many && - test_paint_down_steps 5 5 5 5 + test_paint_down_steps 5 4 5 5 ' test_expect_success 'setup mixed finite/INFINITY topology' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters 2026-06-26 13:08 ` [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-26 21:13 ` René Scharfe 2026-06-26 21:57 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: René Scharfe @ 2026-06-26 21:13 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson On 6/26/26 3:08 PM, Kristofer Karlsson via GitGitGadget wrote: > > diff --git a/commit-reach.c b/commit-reach.c > index f6a438550b..0f29b143bd 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -97,6 +97,75 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) > return commit; > } > > +/* > + * Priority queue with per-side commit counters for paint_down_to_common(). > + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, > + * PARENT2-only, or both (a pending merge-base candidate). > + */ > +struct paint_state { > + struct prio_queue queue; > + int p1_count; > + int p2_count; > + int pending_merge_bases; > +}; Can they become negative? Wouldn't size_t be a more natural fit, matching nr from struct prio_queue? And some bikeshedding: Why abbreviate? parent1_count and parent2_count would be slightly easier to read and associate with PARENT1 and PARENT2. And pending_merge_bases is a counter as well. Why not call it like that, pending_merge_base_count? Well, that's pretty long. both_count? That's quite generic and nondescript. Call the other counters parents1 and parents2? Nah. Or parent1s and parent2s? Not sure why this inconsistency bothers me to begin with. René ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters 2026-06-26 21:13 ` René Scharfe @ 2026-06-26 21:57 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 21:57 UTC (permalink / raw) To: René Scharfe Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee, Elijah Newren On Fri, 26 Jun 2026 at 23:13, René Scharfe <l.s.r@web.de> wrote: > > > +struct paint_state { > > + struct prio_queue queue; > > + int p1_count; > > + int p2_count; > > + int pending_merge_bases; > > +}; > Can they become negative? Wouldn't size_t be a more natural fit, > matching nr from struct prio_queue? Negative would be a clear indication of a bug though that's not checked right now anyway. And since it's not checked we might as well use size_t instead - and it would technically be more correct though I struggle to imagine a case where the number of active elements in the frontier exceeds 2^31 or whatever a signed int would give. I am happy to change to size_t. > And some bikeshedding: > > Why abbreviate? parent1_count and parent2_count would be slightly > easier to read and associate with PARENT1 and PARENT2. > > And pending_merge_bases is a counter as well. Why not call it > like that, pending_merge_base_count? Well, that's pretty long. > both_count? That's quite generic and nondescript. Call the other > counters parents1 and parents2? Nah. Or parent1s and parent2s? > Not sure why this inconsistency bothers me to begin with. Fair point, I was thinking that the surrounding context is so small that the naming almost doesn't matter - the terms don't escape paint_down_to_common. I am happy to change to something like: parent1_count, parent2_count, mb_candidate_count to make it more consistent. It seems the mb_ prefix is already used for merge bases in some files - best example is perhaps builtin/diff.c I see in the codebase that we are using multiple styles, perhaps depending on specific context. - nr_ prefix: nr_objects, nr_paths_watching - num_ prefix: num_commits, num_hashes, num_workers - _count suffix: entry_count, max_count, skip_count so I think _count suffix is a good choice at least - it matches other usages where we typically just increment or decrement. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v3 6/8] commit-reach: remove unused nonstale_queue dedup wrappers 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (4 preceding siblings ...) 2026-06-26 13:08 ` [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 ` Kristofer Karlsson via GitGitGadget 2026-06-26 13:08 ` [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget ` (3 subsequent siblings) 9 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became unused after the previous commit. The core nonstale_queue functions remain in use by ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 0f29b143bd..ee0e0fdf6e 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue) queue->max_nonstale = NULL; } -static void nonstale_queue_put_dedup(struct nonstale_queue *queue, - struct commit *c) -{ - if (c->object.flags & ENQUEUED) - return; - c->object.flags |= ENQUEUED; - nonstale_queue_put(queue, c); -} - -static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) -{ - struct commit *commit = nonstale_queue_get(queue); - - if (commit) - commit->object.flags &= ~ENQUEUED; - return commit; -} - /* * Priority queue with per-side commit counters for paint_down_to_common(). * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (5 preceding siblings ...) 2026-06-26 13:08 ` [PATCH v3 6/8] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 ` Kristofer Karlsson via GitGitGadget 2026-06-26 14:29 ` Kristofer Karlsson 2026-06-26 14:35 ` Derrick Stolee 2026-06-26 13:08 ` [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget ` (2 subsequent siblings) 9 siblings, 2 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add an early termination check to paint_down_to_common() using the per-side counters introduced earlier. Once the walk enters the finite-generation region, terminate early when one side's exclusive count drops to zero -- no new merge-base can form without both paint sides meeting. The check also waits for pending_merge_bases to reach zero, ensuring all merge-base candidates have been dequeued and recorded before exiting. The INFINITY gate ensures correctness: commits without a commit-graph entry have GENERATION_NUMBER_INFINITY and are ordered by commit date, which is not topologically reliable. The optimization only fires once the walk enters the finite-generation region where ordering guarantees hold. Widen the existing generation-monotonicity BUG assertion to fire unconditionally, not only when min_generation is set. The side-exhaustion optimization depends on correct generation ordering, so the assertion should always be active. Step counts measured with trace2 on git.git with commit-graph: merge-base --all v2.0.0 v2.55.0-rc1: before: 72264 steps after: 44589 steps merge-base --all v2.55.0-rc1 v2.55.0-rc1~5: before: 110 steps after: 7 steps Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 17 +++++++++++++++++ commit-reach.c | 19 +++++++++++++++---- t/t6600-test-reach.sh | 4 ++-- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index 0f4e1892a5..983dfcf233 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -94,6 +94,9 @@ ends when one of the following conditions holds: 1. The queue is empty. 2. The queue contains only stale entries. + 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits + remain in the queue, no pending merge-base candidates exist, + and the walk has entered the finite-generation region. Stale entry condition ~~~~~~~~~~~~~~~~~~~~~ @@ -104,6 +107,20 @@ existing candidates by proving one is an ancestor of another, but `remove_redundant()` handles that as a post-processing step, so it is safe to exit early. +Side-exhaustion condition +~~~~~~~~~~~~~~~~~~~~~~~~~ +A new merge-base requires commits from both sides to meet. When one +side's exclusive counter reaches zero and there are no pending +merge-base candidates, no future traversal step can produce a new +candidate. + +This optimization only activates in the finite-generation region +where topological ordering holds. In that region, children are +always visited before parents, so paint flags are final at visit +time and an exhausted side cannot reappear. In the INFINITY region, +commit-date ordering can violate this guarantee, so the check is +skipped. + Related documentation --------------------- diff --git a/commit-reach.c b/commit-reach.c index ee0e0fdf6e..0248d6fedb 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -131,6 +131,10 @@ static void paint_queue_put(struct paint_state *state, } } +/* + * Dequeue the next commit for the paint walk, or return NULL when + * no more merge bases can be discovered. + */ static struct commit *paint_queue_get(struct paint_state *state) { struct commit *commit = prio_queue_get(&state->queue); @@ -140,9 +144,16 @@ static struct commit *paint_queue_get(struct paint_state *state) commit->object.flags &= ~ENQUEUED; - if (!state->p1_count && !state->p2_count && - !state->pending_merge_bases) - return NULL; + if (!state->pending_merge_bases) { + /* only stale entries remain */ + if (!state->p1_count && !state->p2_count) + return NULL; + + /* one side is exhausted */ + if ((!state->p1_count || !state->p2_count) && + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + return NULL; + } paint_count_update(state, commit->object.flags, -1); return commit; @@ -188,7 +199,7 @@ static int paint_down_to_common(struct repository *r, timestamp_t generation = commit_graph_generation(commit); steps++; - if (min_generation && generation > last_gen) + if (generation > last_gen) BUG("bad generation skip %"PRItime" > %"PRItime" at %s", generation, last_gen, oid_to_hex(&commit->object.oid)); diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 51f3d70492..6365007560 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -220,7 +220,7 @@ test_expect_success 'in_merge_bases_many:self' ' EOF echo "in_merge_bases_many(A,X):1" >expect && test_all_modes in_merge_bases_many && - test_paint_down_steps 45 2 25 3 + test_paint_down_steps 45 1 25 1 ' test_expect_success 'is_descendant_of:hit' ' @@ -337,7 +337,7 @@ test_expect_success 'merge-base --all commit-walk steps' ' >input && git rev-parse commit-9-1 >expect && run_all_modes git merge-base --all commit-9-9 commit-9-1 && - test_paint_down_steps 81 80 81 81 + test_paint_down_steps 81 9 57 10 ' test_expect_success 'reduce_heads' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-26 13:08 ` [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-26 14:29 ` Kristofer Karlsson 2026-06-26 14:32 ` Derrick Stolee 2026-06-26 14:35 ` Derrick Stolee 1 sibling, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 14:29 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget; +Cc: git, Derrick Stolee, Elijah Newren On Fri, 26 Jun 2026 at 15:08, Kristofer Karlsson via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Kristofer Karlsson <krka@spotify.com> > > - if (min_generation && generation > last_gen) > + if (generation > last_gen) I have to note that I accidentally pushed this version before noticing that it now fails for a subset of commit-graph modes. Apologies for that - I will rework the logic here later to preserve the behavior better. I think (and hope) the rest of the patch series is in good shape though and addressed the previous feedback, so any partial new review feedback would still be appreciated. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-26 14:29 ` Kristofer Karlsson @ 2026-06-26 14:32 ` Derrick Stolee 2026-06-26 16:41 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-26 14:32 UTC (permalink / raw) To: Kristofer Karlsson, Kristofer Karlsson via GitGitGadget Cc: git, Elijah Newren On 6/26/2026 10:29 AM, Kristofer Karlsson wrote: > On Fri, 26 Jun 2026 at 15:08, Kristofer Karlsson via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Kristofer Karlsson <krka@spotify.com> >> >> - if (min_generation && generation > last_gen) >> + if (generation > last_gen) > > I have to note that I accidentally pushed this version before noticing > that it now fails for a subset of commit-graph modes. > Apologies for that - I will rework the logic here later > to preserve the behavior better. And do we catch this with a test case? I'm hoping that you discovered this error through the test suite, even if you submitted the series a little early. > I think (and hope) the rest of the patch series is in good shape though > and addressed the previous feedback, so any partial new review > feedback would still be appreciated. Thanks for calling this out, as now I can avoid trying to understand this change during my review. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-26 14:32 ` Derrick Stolee @ 2026-06-26 16:41 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 16:41 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Fri, 26 Jun 2026 at 16:32, Derrick Stolee <stolee@gmail.com> wrote: > > > I have to note that I accidentally pushed this version before noticing > > that it now fails for a subset of commit-graph modes. > > Apologies for that - I will rework the logic here later > > to preserve the behavior better. > > And do we catch this with a test case? I'm hoping that you discovered > this error through the test suite, even if you submitted the series a > little early. (I missed replying to this message initially, sorry) Yes exactly - it was caught by t6600 but somehow I missed running it before submitting it (so I noticed it in the GGG CI instead) So the existing tests are good, I only wish I could be equally reliable as them. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-26 13:08 ` [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-26 14:29 ` Kristofer Karlsson @ 2026-06-26 14:35 ` Derrick Stolee 2026-06-26 14:39 ` Kristofer Karlsson 1 sibling, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-26 14:35 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/26/2026 9:08 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > @@ -140,9 +144,16 @@ static struct commit *paint_queue_get(struct paint_state *state) > > commit->object.flags &= ~ENQUEUED; > > - if (!state->p1_count && !state->p2_count && > - !state->pending_merge_bases) > - return NULL; > + if (!state->pending_merge_bases) { > + /* only stale entries remain */ > + if (!state->p1_count && !state->p2_count) > + return NULL; > + > + /* one side is exhausted */ > + if ((!state->p1_count || !state->p2_count) && > + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) > + return NULL; > + } This continues to look correct. > paint_count_update(state, commit->object.flags, -1); > return commit; > @@ -188,7 +199,7 @@ static int paint_down_to_common(struct repository *r, > timestamp_t generation = commit_graph_generation(commit); > steps++; > > - if (min_generation && generation > last_gen) > + if (generation > last_gen) > BUG("bad generation skip %"PRItime" > %"PRItime" at %s", > generation, last_gen, > oid_to_hex(&commit->object.oid)); You mention in your own reply that this is broken. This also looks like a stray change for this patch, so perhaps your end state is correct despite this patch causing failures. Will inspect soon. > - test_paint_down_steps 45 2 25 3 > + test_paint_down_steps 45 1 25 1 ...> - test_paint_down_steps 81 80 81 81 > + test_paint_down_steps 81 9 57 10 These diffs are satisfying. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-26 14:35 ` Derrick Stolee @ 2026-06-26 14:39 ` Kristofer Karlsson 0 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 14:39 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Fri, 26 Jun 2026 at 16:35, Derrick Stolee <stolee@gmail.com> wrote: > > > - if (min_generation && generation > last_gen) > > + if (generation > last_gen) > > BUG("bad generation skip %"PRItime" > %"PRItime" at %s", > > generation, last_gen, > > oid_to_hex(&commit->object.oid)); > > You mention in your own reply that this is broken. This also looks > like a stray change for this patch, so perhaps your end state is > correct despite this patch causing failures. Will inspect soon. I did not intend it to be a stray change, but rather a natural followup to the idea that we could fold all of the halt conditions into the same place. I am happy to either revert that part for v4 (to keep the change simpler, but not fully unified) or fix it properly - I think it should be easy since this was just human error, not a sign of a fundamentally tricky problem. > > - test_paint_down_steps 45 2 25 3 > > + test_paint_down_steps 45 1 25 1 > ...> - test_paint_down_steps 81 80 81 81 > > + test_paint_down_steps 81 9 57 10 > These diffs are satisfying. Agreed! It was nice to introduce the steps counter to the test suite, showing that the patch reached its intended goal which is clearer than just having benchmarks in the messages. Thanks again, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (6 preceding siblings ...) 2026-06-26 13:08 ` [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 ` Kristofer Karlsson via GitGitGadget 2026-06-26 14:42 ` Derrick Stolee 2026-06-26 16:36 ` [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted Junio C Hamano 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget 9 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-26 13:08 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Consolidate the min_generation termination condition into paint_queue_get(), alongside the existing stale-entry and side-exhaustion checks. Move last_gen into struct paint_state so that commit_graph_generation() is called exactly once per dequeued commit and the result is shared across all termination checks and the monotonicity BUG assertion. The loop body in paint_down_to_common() reads state.last_gen instead of recomputing the generation number. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 9 ++++++ commit-reach.c | 31 +++++++++++-------- 2 files changed, 27 insertions(+), 13 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index 983dfcf233..eef249f4a4 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -97,6 +97,8 @@ ends when one of the following conditions holds: 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits remain in the queue, no pending merge-base candidates exist, and the walk has entered the finite-generation region. + 4. Generation cutoff: the dequeued commit's generation is below + a caller-supplied `min_generation` threshold. Stale entry condition ~~~~~~~~~~~~~~~~~~~~~ @@ -121,6 +123,13 @@ time and an exhausted side cannot reappear. In the INFINITY region, commit-date ordering can violate this guarantee, so the check is skipped. +Generation cutoff +~~~~~~~~~~~~~~~~~ +Some callers (notably `remove_redundant()`) supply a `min_generation` +threshold -- the minimum generation of the input commits. No merge +base can have a generation below this threshold, so the walk +terminates as soon as it dequeues such a commit. + Related documentation --------------------- diff --git a/commit-reach.c b/commit-reach.c index 0248d6fedb..0cd785c31b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -89,6 +89,8 @@ struct paint_state { int p1_count; int p2_count; int pending_merge_bases; + timestamp_t min_generation; + timestamp_t last_gen; }; static void paint_count_update(struct paint_state *state, @@ -138,11 +140,23 @@ static void paint_queue_put(struct paint_state *state, static struct commit *paint_queue_get(struct paint_state *state) { struct commit *commit = prio_queue_get(&state->queue); + timestamp_t generation; if (!commit) return NULL; commit->object.flags &= ~ENQUEUED; + generation = commit_graph_generation(commit); + + if (generation > state->last_gen) + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", + generation, state->last_gen, + oid_to_hex(&commit->object.oid)); + state->last_gen = generation; + + /* generation cutoff */ + if (generation < state->min_generation) + return NULL; if (!state->pending_merge_bases) { /* only stale entries remain */ @@ -151,7 +165,7 @@ static struct commit *paint_queue_get(struct paint_state *state) /* one side is exhausted */ if ((!state->p1_count || !state->p2_count) && - commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + generation < GENERATION_NUMBER_INFINITY) return NULL; } @@ -177,9 +191,10 @@ static int paint_down_to_common(struct repository *r, struct commit *commit; int i; int steps = 0; - timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; + state.min_generation = min_generation; + state.last_gen = GENERATION_NUMBER_INFINITY; if (!min_generation && !corrected_commit_dates_enabled(r)) state.queue.compare = compare_commits_by_commit_date; @@ -196,18 +211,8 @@ static int paint_down_to_common(struct repository *r, while ((commit = paint_queue_get(&state))) { struct commit_list *parents; int flags; - timestamp_t generation = commit_graph_generation(commit); steps++; - if (generation > last_gen) - BUG("bad generation skip %"PRItime" > %"PRItime" at %s", - generation, last_gen, - oid_to_hex(&commit->object.oid)); - last_gen = generation; - - if (generation < min_generation) - break; - flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); if (flags == (PARENT1 | PARENT2)) { if (!(commit->object.flags & RESULT)) { @@ -219,7 +224,7 @@ static int paint_down_to_common(struct repository *r, * descendant of this one. */ if (!(mb_flags & MERGE_BASE_FIND_ALL) && - generation < GENERATION_NUMBER_INFINITY) + state.last_gen < GENERATION_NUMBER_INFINITY) break; } /* Mark parents of a found merge stale */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* Re: [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() 2026-06-26 13:08 ` [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget @ 2026-06-26 14:42 ` Derrick Stolee 2026-06-26 14:53 ` Kristofer Karlsson 0 siblings, 1 reply; 84+ messages in thread From: Derrick Stolee @ 2026-06-26 14:42 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget, git Cc: Elijah Newren, Kristofer Karlsson On 6/26/2026 9:08 AM, Kristofer Karlsson via GitGitGadget wrote: > From: Kristofer Karlsson <krka@spotify.com> > > Consolidate the min_generation termination condition into > paint_queue_get(), alongside the existing stale-entry and > side-exhaustion checks. > > Move last_gen into struct paint_state so that > commit_graph_generation() is called exactly once per dequeued commit > and the result is shared across all termination checks and the > monotonicity BUG assertion. The loop body in paint_down_to_common() > reads state.last_gen instead of recomputing the generation number. Thanks for incorporating this change into this version. > + 4. Generation cutoff: the dequeued commit's generation is below > + a caller-supplied `min_generation` threshold. Technically, this was always a termination condition of the walk, but now we are correcting the documentation to match. It was just not part of the termination in the dequeue method until now. > @@ -89,6 +89,8 @@ struct paint_state { > int p1_count; > int p2_count; > int pending_merge_bases; > + timestamp_t min_generation; > + timestamp_t last_gen; > }; I'm happy that these details are being imported into the struct. My first reaction is that last_gen shouldn't be here because we can see a generation from the dequeued commit. I'll read on to be sure. > static void paint_count_update(struct paint_state *state, > @@ -138,11 +140,23 @@ static void paint_queue_put(struct paint_state *state, > static struct commit *paint_queue_get(struct paint_state *state) > { > struct commit *commit = prio_queue_get(&state->queue); > + timestamp_t generation; > > if (!commit) > return NULL; > > commit->object.flags &= ~ENQUEUED; > + generation = commit_graph_generation(commit); > + > + if (generation > state->last_gen) > + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", > + generation, state->last_gen, > + oid_to_hex(&commit->object.oid)); Oh I see. It's just for this condition. Does this case still break without 'state->min_generation' in the condition? > + state->last_gen = generation; This is an appropriate use of this value. My concerns are no longer valid. Thanks for letting me think out loud. > + /* generation cutoff */ > + if (generation < state->min_generation) > + return NULL; And here's the crux. Again, impossible for this to halt when min_generation is zero. > if (!state->pending_merge_bases) { > /* only stale entries remain */ > @@ -151,7 +165,7 @@ static struct commit *paint_queue_get(struct paint_state *state) > > /* one side is exhausted */ > if ((!state->p1_count || !state->p2_count) && > - commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) > + generation < GENERATION_NUMBER_INFINITY) > return NULL; Good reuse of the value. > } > > @@ -177,9 +191,10 @@ static int paint_down_to_common(struct repository *r, > struct commit *commit; > int i; > int steps = 0; > - timestamp_t last_gen = GENERATION_NUMBER_INFINITY; > struct commit_list **tail = result; > > + state.min_generation = min_generation; > + state.last_gen = GENERATION_NUMBER_INFINITY; > if (!min_generation && !corrected_commit_dates_enabled(r)) > state.queue.compare = compare_commits_by_commit_date; > > @@ -196,18 +211,8 @@ static int paint_down_to_common(struct repository *r, > while ((commit = paint_queue_get(&state))) { > struct commit_list *parents; > int flags; > - timestamp_t generation = commit_graph_generation(commit); > steps++; > > - if (generation > last_gen) > - BUG("bad generation skip %"PRItime" > %"PRItime" at %s", > - generation, last_gen, > - oid_to_hex(&commit->object.oid)); > - last_gen = generation; > - > - if (generation < min_generation) > - break; > - I'm happy this is getting cleaned up. > flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); > if (flags == (PARENT1 | PARENT2)) { > if (!(commit->object.flags & RESULT)) { > @@ -219,7 +224,7 @@ static int paint_down_to_common(struct repository *r, > * descendant of this one. > */ > if (!(mb_flags & MERGE_BASE_FIND_ALL) && > - generation < GENERATION_NUMBER_INFINITY) > + state.last_gen < GENERATION_NUMBER_INFINITY) > break; > } > /* Mark parents of a found merge stale */ And here's another termination condition. We are now leaking the abstraction of the 'state.last_gen' which give me some bad feelings. We are getting to the point where I'd leave such a thing for a follow-up, but since you are needing to re-roll, then this is another case where we can move this into the paint_queue_get(). I don't think this is me "raising the bar" from earlier recommendations, because I was asking for all loop termination to be in the get() method, if possible. But also: I'm not looking at the full method right now to see if terminating _at this location in the loop_ is critical. So it may very well be impossible to move this into the get() call, in which case please ignore this suggestion and use state.last_gen. Thanks, -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() 2026-06-26 14:42 ` Derrick Stolee @ 2026-06-26 14:53 ` Kristofer Karlsson 2026-06-26 14:58 ` Derrick Stolee 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 14:53 UTC (permalink / raw) To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On Fri, 26 Jun 2026 at 16:42, Derrick Stolee <stolee@gmail.com> wrote: > > > + 4. Generation cutoff: the dequeued commit's generation is below > > + a caller-supplied `min_generation` threshold. > > Technically, this was always a termination condition of the walk, > but now we are correcting the documentation to match. It was just > not part of the termination in the dequeue method until now. You're right, I should perhaps fold it into the first patch instead, which would be logically more accurate. Would be an easy thing to fix for a v4. > > flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); > > if (flags == (PARENT1 | PARENT2)) { > > if (!(commit->object.flags & RESULT)) { > > @@ -219,7 +224,7 @@ static int paint_down_to_common(struct repository *r, > > * descendant of this one. > > */ > > if (!(mb_flags & MERGE_BASE_FIND_ALL) && > > - generation < GENERATION_NUMBER_INFINITY) > > + state.last_gen < GENERATION_NUMBER_INFINITY) > > break; > > } > > /* Mark parents of a found merge stale */ > > And here's another termination condition. We are now leaking the > abstraction of the 'state.last_gen' which give me some bad feelings. Yes, this is one of the minor annoyances I also noticed, but it's not too bad. I think a followup could be to either: 1. remove this optimization entirely (though I will have to spend some time reasoning if there are realistic use cases where this would trigger much earlier than side exhaustion. 2. tweak the logic to instead halting on exactly this commit, instead halt inside paint_queue_get if: generation < INFINITY && !FIND_ALL && num_results >= 1 This would change the semantics slightly (but for the better?) in the the found merge-base could be in the infinite region but near the finite region and thus would unlock the optimization as soon as we pass that boundary. But I did not want to include that change in this series, which is perhaps already getting too complex. > We are getting to the point where I'd leave such a thing for a > follow-up, but since you are needing to re-roll, then this is > another case where we can move this into the paint_queue_get(). I > don't think this is me "raising the bar" from earlier recommendations, > because I was asking for all loop termination to be in the get() > method, if possible. > > But also: I'm not looking at the full method right now to see if > terminating _at this location in the loop_ is critical. So it may > very well be impossible to move this into the get() call, in which > case please ignore this suggestion and use state.last_gen. I think it's not critical (as I mentioned above) and I think I will need to follow up on this later. Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() 2026-06-26 14:53 ` Kristofer Karlsson @ 2026-06-26 14:58 ` Derrick Stolee 0 siblings, 0 replies; 84+ messages in thread From: Derrick Stolee @ 2026-06-26 14:58 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren On 6/26/2026 10:53 AM, Kristofer Karlsson wrote: > On Fri, 26 Jun 2026 at 16:42, Derrick Stolee <stolee@gmail.com> wrote: >> >>> + 4. Generation cutoff: the dequeued commit's generation is below >>> + a caller-supplied `min_generation` threshold. >> >> Technically, this was always a termination condition of the walk, >> but now we are correcting the documentation to match. It was just >> not part of the termination in the dequeue method until now. > > You're right, I should perhaps fold it into the first patch instead, > which would be logically more accurate. Would be an easy thing > to fix for a v4. And the remaining condition exposed in this diff isn't included, either: >>> flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); >>> if (flags == (PARENT1 | PARENT2)) { >>> if (!(commit->object.flags & RESULT)) { >>> @@ -219,7 +224,7 @@ static int paint_down_to_common(struct repository *r, >>> * descendant of this one. >>> */ >>> if (!(mb_flags & MERGE_BASE_FIND_ALL) && >>> - generation < GENERATION_NUMBER_INFINITY) >>> + state.last_gen < GENERATION_NUMBER_INFINITY) >>> break; >>> } >>> /* Mark parents of a found merge stale */ >> >> And here's another termination condition. We are now leaking the >> abstraction of the 'state.last_gen' which give me some bad feelings. > > Yes, this is one of the minor annoyances I also noticed, > but it's not too bad. I think a followup could be to either: > > 1. remove this optimization entirely (though I will have to spend > some time reasoning if there are realistic use cases where this > would trigger much earlier than side exhaustion. > > 2. tweak the logic to instead halting on exactly this commit, > instead halt inside paint_queue_get if: > generation < INFINITY && !FIND_ALL && num_results >= 1 > This would change the semantics slightly (but for the better?) > in the the found merge-base could be in the infinite region but > near the finite region and thus would unlock the optimization > as soon as we pass that boundary. But I did not want to include > that change in this series, which is perhaps already getting > too complex. I'm happy to keep this one out of the series. Perhaps it would be good for you to finish this series with the current scope and then for another contributor (me, probably) to do another round of cleanup/reaction on top. I only say "another contributor" because new eyes can help to see new things, outside of a patch diff. >> We are getting to the point where I'd leave such a thing for a >> follow-up, but since you are needing to re-roll, then this is >> another case where we can move this into the paint_queue_get(). I >> don't think this is me "raising the bar" from earlier recommendations, >> because I was asking for all loop termination to be in the get() >> method, if possible. >> >> But also: I'm not looking at the full method right now to see if >> terminating _at this location in the loop_ is critical. So it may >> very well be impossible to move this into the get() call, in which >> case please ignore this suggestion and use state.last_gen. > > I think it's not critical (as I mentioned above) and I think I will > need to follow up on this later. Sounds good. -Stolee ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (7 preceding siblings ...) 2026-06-26 13:08 ` [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget @ 2026-06-26 16:36 ` Junio C Hamano 2026-06-26 16:43 ` Kristofer Karlsson 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget 9 siblings, 1 reply; 84+ messages in thread From: Junio C Hamano @ 2026-06-26 16:36 UTC (permalink / raw) To: Kristofer Karlsson via GitGitGadget Cc: git, Derrick Stolee, Elijah Newren, Kristofer Karlsson "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com> writes: > Changes since v2: > > * New patch 8/8: moved the min_generation termination check and the > last_gen monotonicity assertion into paint_queue_get(), consolidating > halt conditions. commit_graph_generation() is now called once per > dequeued commit and shared across all checks. > > * Widened the generation-monotonicity BUG assertion to fire > unconditionally, not only when min_generation is set. The side-exhaustion > optimization depends on correct generation ordering, so the assertion > should always be active. This is a behavior change: the BUG() now fires > for any generation ordering violation, regardless of the caller. > > * Moved all halt conditions inside paint_queue_get() with the "pop first" > form: pop, check, then decrement counters. This keeps the optimization > commit's diff minimal (just inserting the new checks between pop and > decrement). > > * Shortened the doc comment on paint_queue_get() to describe what it does > rather than how. Inline comments on each return NULL explain the specific > halt condition. > > * Replaced the manual commit-graph setup in the step-count test with > run_all_modes, which now sets GIT_TRACE2_EVENT per mode and produces > trace-mode-{none,full,half,no-gdat}.txt files. > > * Added a test_paint_down_steps helper for concise 4-mode step assertions > with diagnostic output on mismatch (prints "expected X, got Y" instead of > a silent grep failure). > > * Added step-count assertions to the single-walk edge-case tests: > in_merge_bases_many:self, pending-stale, infinity-both-sides, > mixed-finite-infinity. > > * Included step counts alongside wall-clock times in the benchmark tables. I am getting this failure standalone, when applied on the same base as where v2 was applied earlier. For now I'll eject it from 'seen'. expecting success of 6600.12 'get_merge_bases_many': cat >input <<-\EOF && A:commit-5-7 X:commit-4-8 X:commit-6-6 X:commit-8-3 EOF { echo "get_merge_bases_many(A,X):" && git rev-parse commit-5-6 \ commit-4-7 | sort } >expect && test_all_modes get_merge_bases_many BUG: commit-reach.c:152: bad generation skip 9 > 8 at 772be737f220103f706a39013c0d115009feefec not ok 12 - get_merge_bases_many # # cat >input <<-\EOF && # A:commit-5-7 # X:commit-4-8 ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-26 16:36 ` [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted Junio C Hamano @ 2026-06-26 16:43 ` Kristofer Karlsson 2026-06-26 18:43 ` Junio C Hamano 0 siblings, 1 reply; 84+ messages in thread From: Kristofer Karlsson @ 2026-06-26 16:43 UTC (permalink / raw) To: Junio C Hamano Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee, Elijah Newren On Fri, 26 Jun 2026 at 18:36, Junio C Hamano <gitster@pobox.com> wrote: > > I am getting this failure standalone, when applied on the same base > as where v2 was applied earlier. For now I'll eject it from 'seen'. > Yes, I am sorry about that. I submitted it after having missed running t6600 and accidentally having introduced a bug. I tried to self-report it to avoid wasting your time, but I only did so in the relevant v3 patch and not this main message. I will be more careful for v4 (and onwards) Thanks, Kristofer ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-26 16:43 ` Kristofer Karlsson @ 2026-06-26 18:43 ` Junio C Hamano 0 siblings, 0 replies; 84+ messages in thread From: Junio C Hamano @ 2026-06-26 18:43 UTC (permalink / raw) To: Kristofer Karlsson Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee, Elijah Newren Kristofer Karlsson <krka@spotify.com> writes: > On Fri, 26 Jun 2026 at 18:36, Junio C Hamano <gitster@pobox.com> wrote: >> >> I am getting this failure standalone, when applied on the same base >> as where v2 was applied earlier. For now I'll eject it from 'seen'. >> > > Yes, I am sorry about that. I submitted it after having missed running t6600 > and accidentally having introduced a bug. I tried to self-report it to avoid > wasting your time, but I only did so in the relevant v3 patch and > not this main message. Ah, I see. Mistakes happen, and do not need to rush, as collaboration is asynchronous around here anyway, and we may read our e-mails in different order ;-) Thanks. It would have been a more troubling experience if only my set-up were seeing the issue, but I am glad that is not the case. ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v4 0/8] commit-reach: terminate merge-base walk when one side is exhausted 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget ` (8 preceding siblings ...) 2026-06-26 16:36 ` [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted Junio C Hamano @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget ` (7 more replies) 9 siblings, 8 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson commit-reach: terminate merge-base walk when one paint side is exhausted Optimize paint_down_to_common() for merge-base queries that hit large one-sided histories. When the walk from one side reaches a commit with a very low generation number that the other side never paints, the walk is forced to drain most of the graph. A common trigger is a repository import that grafts a separate history with its own root, but any merge that introduces a low-generation commit never painted by the other side has the same effect. A new merge-base candidate can only be discovered when exclusive PARENT1 and PARENT2 paint meet. This series teaches paint_down_to_common() to stop as soon as one side has no exclusive commits left in the queue; once one side is exhausted, no further candidates can appear. origin/HEAD o o PR HEAD | | (import) o : / \ / | o merge-base | | : : (~2.5M commits) | | import root main root In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample that sharpened the halt condition, and Elijah Newren independently discovered the same optimization and shared an implementation in PR #2150 [2]. Patches 2-3 incorporate test cases from Elijah's branch. This series implements the optimization only after the walk enters the finite-generation region, where generation ordering guarantees that paint on visited commits is final. Patch layout: 1/8 Documentation/technical: add paint-down-to-common doc 2/8 t6600: add test cases for side-exhaustion edge cases 3/8 t6099, t6600: add side-exhaustion regression tests 4/8 commit-reach: add trace2 instrumentation to paint_down_to_common() 5/8 commit-reach: introduce struct paint_state with per-side counters 6/8 commit-reach: remove unused nonstale_queue dedup wrappers 7/8 commit-reach: terminate merge-base walk when one paint side is exhausted 8/8 commit-reach: move min_generation check into paint_queue_get() Benchmarks Step counts are deterministic (measured via trace2_data_intmax added in patch 4). Wall-clock times are best-of-11 runs. 2.6M-commit monorepo with commit-graph: steps wall-clock merge-base --all (across import) 2143438 -> 3 3.67s -> 5ms merge-base --all (1000 apart) 2692915 -> 1035 4.41s -> 7ms merge-base --all (5000 apart) 2692915 -> 6401 4.45s -> 13ms merge-base --all (HEAD vs import) 2698872 -> 45960 4.50s -> 79ms merge-tree (across import) 2143438 -> 3 4.42s -> 11ms git.git (88k commits, commit-graph): steps wall-clock merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 110ms -> 68ms merge-base --all HEAD HEAD~1000 9891 -> 3828 18ms -> 10ms merge-base --all HEAD HEAD~10000 72303 -> 41487 101ms -> 50ms Changes since v3: * Fixed BUG assertion that was accidentally made unconditional in v3: restored the min_generation guard so it only fires when generation-based ordering is active. * Moved generation cutoff and single-result termination conditions into the documentation in patch 1/8, since they describe existing behavior. * Renamed paint_state counter fields for clarity: p1_count -> parent1_count, p2_count -> parent2_count, pending_merge_bases -> mb_candidate_count. Changed counter types from int to size_t. (Suggested by Rene Scharfe.) Changes since v2: * New patch 8/8: moved the min_generation termination check and the last_gen monotonicity assertion into paint_queue_get(), consolidating halt conditions. commit_graph_generation() is now called once per dequeued commit and shared across all checks. * Moved all halt conditions inside paint_queue_get() with the "pop first" form: pop, check, then decrement counters. This keeps the optimization commit's diff minimal (just inserting the new checks between pop and decrement). * Shortened the doc comment on paint_queue_get() to describe what it does rather than how. Inline comments on each return NULL explain the specific halt condition. * Replaced the manual commit-graph setup in the step-count test with run_all_modes, which now sets GIT_TRACE2_EVENT per mode and produces trace-mode-{none,full,half,no-gdat}.txt files. * Added a test_paint_down_steps helper for concise 4-mode step assertions with diagnostic output on mismatch (prints "expected X, got Y" instead of a silent grep failure). * Added step-count assertions to the single-walk edge-case tests: in_merge_bases_many:self, pending-stale, infinity-both-sides, mixed-finite-infinity. * Included step counts alongside wall-clock times in the benchmark tables. Changes since v1: * Reordered patches: documentation first (describing the existing algorithm), tests before code changes, so they demonstrate passing with old logic first. * Dropped the ahead_behind decoupling patch. paint_state is now a NEW struct alongside nonstale_queue instead of replacing it. ahead_behind() is completely untouched. * Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead code after the conversion) in a separate commit. * Renamed: struct paint_queue -> paint_state, field pq -> queue, paint_count_add/remove -> paint_count_update (single function with signed delta parameter). * Split the old paint_count_transition (which handled both old and new flags in one call) into separate remove/add calls with a signed delta. This eliminates the need for the case 0 handler (which tracked "not in the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 | STALE) that documents all valid flag combinations, with BUG() in default. * Added trace2_data_intmax() instrumentation to report the number of commits visited per paint walk (separate commit), with deterministic step-count assertions in t6600. * Expanded switch statements to multi-line format per .clang-format. * Used !count style throughout instead of count == 0. * Updated technical documentation alongside code changes. [1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u [2] https://github.com/gitgitgadget/git/pull/2150 Elijah Newren (1): t6600: add test cases for side-exhaustion edge cases Kristofer Karlsson (7): Documentation/technical: add paint-down-to-common doc t6099, t6600: add side-exhaustion regression tests commit-reach: add trace2 instrumentation to paint_down_to_common() commit-reach: introduce struct paint_state with per-side counters commit-reach: remove unused nonstale_queue dedup wrappers commit-reach: terminate merge-base walk when one paint side is exhausted commit-reach: move min_generation check into paint_queue_get() Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 149 ++++++++++++++ commit-reach.c | 147 ++++++++++---- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 ++++++++ t/t6600-test-reach.sh | 181 ++++++++++++++++-- 7 files changed, 513 insertions(+), 49 deletions(-) create mode 100644 Documentation/technical/paint-down-to-common.adoc create mode 100755 t/t6099-merge-base-side-exhaustion.sh base-commit: 6c3d7b73556db708feb3b16232fab1efc4353428 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/2149 Range-diff vs v3: 1: 2593866bce ! 1: 3efb095b03 Documentation/technical: add paint-down-to-common doc @@ Documentation/technical/paint-down-to-common.adoc (new) + 1. The queue is empty. + 2. `max_nonstale` has been dequeued, meaning the queue only contains + STALE entries. ++ 3. Generation cutoff: the dequeued commit's generation is below ++ a caller-supplied `min_generation` threshold. ++ 4. Single result: the caller only needs one merge base, one has ++ been found, and the walk has entered the finite-generation ++ region. + +Stale entry condition +~~~~~~~~~~~~~~~~~~~~~ @@ Documentation/technical/paint-down-to-common.adoc (new) +`remove_redundant()` handles that as a post-processing step, so it +is safe to exit early. + ++Generation cutoff ++~~~~~~~~~~~~~~~~~ ++Some callers (notably `remove_redundant()`) supply a `min_generation` ++threshold -- the minimum generation of the input commits. No merge ++base can have a generation below this threshold, so the walk ++terminates as soon as it dequeues such a commit. ++ ++Single result ++~~~~~~~~~~~~~ ++When only one merge base is needed and the walk is in the ++finite-generation region, the first candidate found is necessarily ++the highest-generation common ancestor. No remaining commit in the ++queue can be a descendant of this candidate (generation ordering ++guarantees children are visited first), so it cannot be redundant ++and the walk can stop immediately. ++ +Related documentation +--------------------- + 2: 9efc084850 = 2: 1a0154b406 t6600: add test cases for side-exhaustion edge cases 3: 14b0d86b93 = 3: 017bf156c5 t6099, t6600: add side-exhaustion regression tests 4: 2592264cda = 4: df3b090a2b commit-reach: add trace2 instrumentation to paint_down_to_common() 5: e82e0c72b6 ! 5: fed9f2c368 commit-reach: introduce struct paint_state with per-side counters @@ Documentation/technical/paint-down-to-common.adoc: re-enqueued is bounded by the - 2. `max_nonstale` has been dequeued, meaning the queue only contains - STALE entries. + 2. The queue contains only stale entries. - - Stale entry condition - ~~~~~~~~~~~~~~~~~~~~~ + 3. Generation cutoff: the dequeued commit's generation is below + a caller-supplied `min_generation` threshold. + 4. Single result: the caller only needs one merge base, one has ## commit-reach.c ## @@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) @@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_q + */ +struct paint_state { + struct prio_queue queue; -+ int p1_count; -+ int p2_count; -+ int pending_merge_bases; ++ size_t parent1_count; ++ size_t parent2_count; ++ size_t mb_candidate_count; +}; + +static void paint_count_update(struct paint_state *state, @@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_q +{ + switch (flags & (PARENT1 | PARENT2 | STALE)) { + case PARENT1: -+ state->p1_count += delta; ++ state->parent1_count += delta; + break; + + case PARENT2: -+ state->p2_count += delta; ++ state->parent2_count += delta; + break; + + case PARENT1 | PARENT2: -+ state->pending_merge_bases += delta; ++ state->mb_candidate_count += delta; + break; + + case PARENT1 | PARENT2 | STALE: @@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_q + + commit->object.flags &= ~ENQUEUED; + -+ if (!state->p1_count && !state->p2_count && -+ !state->pending_merge_bases) ++ if (!state->parent1_count && !state->parent2_count && ++ !state->mb_candidate_count) + return NULL; + + paint_count_update(state, commit->object.flags, -1); 6: e6181bf3c1 = 6: 4db485b48a commit-reach: remove unused nonstale_queue dedup wrappers 7: f3572a8a89 ! 7: 4506780649 commit-reach: terminate merge-base walk when one paint side is exhausted @@ Commit message commit-reach: terminate merge-base walk when one paint side is exhausted Add an early termination check to paint_down_to_common() using the - per-side counters introduced earlier. Once the walk enters the + per-side counters introduced earlier. Once the walk enters the finite-generation region, terminate early when one side's exclusive count drops to zero -- no new merge-base can form without both paint sides meeting. @@ Commit message The INFINITY gate ensures correctness: commits without a commit-graph entry have GENERATION_NUMBER_INFINITY and are ordered by commit date, - which is not topologically reliable. The optimization only fires + which is not topologically reliable. The optimization only fires once the walk enters the finite-generation region where ordering guarantees hold. - Widen the existing generation-monotonicity BUG assertion to fire - unconditionally, not only when min_generation is set. The - side-exhaustion optimization depends on correct generation ordering, - so the assertion should always be active. - Step counts measured with trace2 on git.git with commit-graph: merge-base --all v2.0.0 v2.55.0-rc1: @@ Commit message ## Documentation/technical/paint-down-to-common.adoc ## @@ Documentation/technical/paint-down-to-common.adoc: ends when one of the following conditions holds: - - 1. The queue is empty. - 2. The queue contains only stale entries. -+ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits + 4. Single result: the caller only needs one merge base, one has + been found, and the walk has entered the finite-generation + region. ++ 5. Side exhaustion: no pure PARENT1 or pure PARENT2 commits + remain in the queue, no pending merge-base candidates exist, + and the walk has entered the finite-generation region. @@ Documentation/technical/paint-down-to-common.adoc: existing candidates by provin +commit-date ordering can violate this guarantee, so the check is +skipped. + - Related documentation - --------------------- - + Generation cutoff + ~~~~~~~~~~~~~~~~~ + Some callers (notably `remove_redundant()`) supply a `min_generation` ## commit-reach.c ## @@ commit-reach.c: static void paint_queue_put(struct paint_state *state, @@ commit-reach.c: static struct commit *paint_queue_get(struct paint_state *state) commit->object.flags &= ~ENQUEUED; -- if (!state->p1_count && !state->p2_count && -- !state->pending_merge_bases) +- if (!state->parent1_count && !state->parent2_count && +- !state->mb_candidate_count) - return NULL; -+ if (!state->pending_merge_bases) { ++ if (!state->mb_candidate_count) { + /* only stale entries remain */ -+ if (!state->p1_count && !state->p2_count) ++ if (!state->parent1_count && !state->parent2_count) + return NULL; + + /* one side is exhausted */ -+ if ((!state->p1_count || !state->p2_count) && ++ if ((!state->parent1_count || !state->parent2_count) && + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + return NULL; + } paint_count_update(state, commit->object.flags, -1); return commit; -@@ commit-reach.c: static int paint_down_to_common(struct repository *r, - timestamp_t generation = commit_graph_generation(commit); - steps++; - -- if (min_generation && generation > last_gen) -+ if (generation > last_gen) - BUG("bad generation skip %"PRItime" > %"PRItime" at %s", - generation, last_gen, - oid_to_hex(&commit->object.oid)); ## t/t6600-test-reach.sh ## @@ t/t6600-test-reach.sh: test_expect_success 'in_merge_bases_many:self' ' 8: 4b9f192d98 ! 8: 8dd15d44e6 commit-reach: move min_generation check into paint_queue_get() @@ Commit message Move last_gen into struct paint_state so that commit_graph_generation() is called exactly once per dequeued commit and the result is shared across all termination checks and the - monotonicity BUG assertion. The loop body in paint_down_to_common() - reads state.last_gen instead of recomputing the generation number. + monotonicity BUG assertion. Signed-off-by: Kristofer Karlsson <krka@spotify.com> - ## Documentation/technical/paint-down-to-common.adoc ## -@@ Documentation/technical/paint-down-to-common.adoc: ends when one of the following conditions holds: - 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits - remain in the queue, no pending merge-base candidates exist, - and the walk has entered the finite-generation region. -+ 4. Generation cutoff: the dequeued commit's generation is below -+ a caller-supplied `min_generation` threshold. - - Stale entry condition - ~~~~~~~~~~~~~~~~~~~~~ -@@ Documentation/technical/paint-down-to-common.adoc: time and an exhausted side cannot reappear. In the INFINITY region, - commit-date ordering can violate this guarantee, so the check is - skipped. - -+Generation cutoff -+~~~~~~~~~~~~~~~~~ -+Some callers (notably `remove_redundant()`) supply a `min_generation` -+threshold -- the minimum generation of the input commits. No merge -+base can have a generation below this threshold, so the walk -+terminates as soon as it dequeues such a commit. -+ - Related documentation - --------------------- - - ## commit-reach.c ## @@ commit-reach.c: struct paint_state { - int p1_count; - int p2_count; - int pending_merge_bases; + size_t parent1_count; + size_t parent2_count; + size_t mb_candidate_count; + timestamp_t min_generation; + timestamp_t last_gen; }; @@ commit-reach.c: static void paint_queue_put(struct paint_state *state, commit->object.flags &= ~ENQUEUED; + generation = commit_graph_generation(commit); + -+ if (generation > state->last_gen) ++ if (state->min_generation && generation > state->last_gen) + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", + generation, state->last_gen, + oid_to_hex(&commit->object.oid)); @@ commit-reach.c: static void paint_queue_put(struct paint_state *state, + if (generation < state->min_generation) + return NULL; - if (!state->pending_merge_bases) { + if (!state->mb_candidate_count) { /* only stale entries remain */ @@ commit-reach.c: static struct commit *paint_queue_get(struct paint_state *state) /* one side is exhausted */ - if ((!state->p1_count || !state->p2_count) && + if ((!state->parent1_count || !state->parent2_count) && - commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + generation < GENERATION_NUMBER_INFINITY) return NULL; @@ commit-reach.c: static int paint_down_to_common(struct repository *r, - timestamp_t generation = commit_graph_generation(commit); steps++; -- if (generation > last_gen) +- if (min_generation && generation > last_gen) - BUG("bad generation skip %"PRItime" > %"PRItime" at %s", - generation, last_gen, - oid_to_hex(&commit->object.oid)); -- gitgitgadget ^ permalink raw reply [flat|nested] 84+ messages in thread
* [PATCH v4 1/8] Documentation/technical: add paint-down-to-common doc 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 2/8] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget ` (6 subsequent siblings) 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a technical document describing the paint_down_to_common() algorithm used for merge-base computation, covering the paint walk, generation number regions, and termination conditions. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- Documentation/Makefile | 1 + Documentation/technical/meson.build | 1 + .../technical/paint-down-to-common.adoc | 135 ++++++++++++++++++ commit-reach.c | 6 +- 4 files changed, 142 insertions(+), 1 deletion(-) create mode 100644 Documentation/technical/paint-down-to-common.adoc diff --git a/Documentation/Makefile b/Documentation/Makefile index 2699f0b24a..f8dea4b395 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol TECH_DOCS += technical/multi-pack-index TECH_DOCS += technical/packfile-uri TECH_DOCS += technical/pack-heuristics +TECH_DOCS += technical/paint-down-to-common TECH_DOCS += technical/parallel-checkout TECH_DOCS += technical/partial-clone TECH_DOCS += technical/platform-support diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build index ec07088c57..9ce11d5e48 100644 --- a/Documentation/technical/meson.build +++ b/Documentation/technical/meson.build @@ -18,6 +18,7 @@ articles = [ 'multi-pack-index.adoc', 'packfile-uri.adoc', 'pack-heuristics.adoc', + 'paint-down-to-common.adoc', 'parallel-checkout.adoc', 'partial-clone.adoc', 'platform-support.adoc', diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc new file mode 100644 index 0000000000..a4dfcba038 --- /dev/null +++ b/Documentation/technical/paint-down-to-common.adoc @@ -0,0 +1,135 @@ +Merge-Base Computation and paint_down_to_common() +================================================== + +The function `paint_down_to_common()` in `commit-reach.c` computes merge +bases by walking the commit graph backwards from two sets of tips and +finding where their ancestry meets. + +Use cases +--------- + +Computing merge bases is used in two different ways: + + 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`, + `merge`, `rebase`). A merge base is a common ancestor that is + not itself an ancestor of another common ancestor. + + 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base + --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A + an ancestor of commit B?" If a common ancestor equals one of the + inputs, that input is necessarily the only merge base -- no other + common ancestor can be both as recent and not an ancestor of it. + +Both use cases share the same algorithm and implementation. + +Algorithm +--------- + +Given a commit `one` and a set of commits `twos[]`, the walk paints +commits with two colors: + + - PARENT1: reachable from `one` + - PARENT2: reachable from any commit in `twos[]` + +The walk uses a priority queue ordered by generation number (falling +back to commit date when generation numbers are unavailable). Each +step dequeues the highest-priority commit (this is when we say a +commit is "visited") and propagates its paint flags to its parents, +enqueuing them if they gained new flags. When a commit receives +both PARENT1 and PARENT2, it is a merge-base candidate. A candidate +gains the STALE flag so its ancestors propagate staleness -- any +deeper common ancestor is necessarily redundant. + +INFINITY and finite generation regions +-------------------------------------- + +The commit-graph stores a generation number for each commit. Commits +not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The +graph is closed under reachability: if a commit is in the graph, all +its ancestors are too. This partitions the commit graph into two regions: + +.... + +---------------------------------------+ + | INFINITY region | + | generation = INFINITY | + | queue order: heuristic (commit date) | + +---------------------------------------+ + | + v + +---------------------------------------+ + | Finite region | + | generation = finite | + | queue order: topological | + +---------------------------------------+ +.... + +When the commit-graph is enabled, the INFINITY region is typically +very small -- it only contains commits added since the last +commit-graph refresh. + +All reachable INFINITY-generation commits are visited before any +finite-generation commit, because INFINITY is larger than any finite +value. Once the walk crosses into the finite region, it stays there. + +In the finite region, generation ordering guarantees topological +traversal: children are always visited before their parents. This +means that paint on already-visited commits is final -- no future +traversal step can add paint to them. + +In the INFINITY region, commit-date ordering can violate this: a +parent with a later date can be visited before a child with an earlier +date. Paint flags are therefore NOT final at visit time, and a +commit visited with only one side's paint may later gain the other. + +Paint flags are only added, never removed. Since each flag can be set +at most once per commit, the number of times a commit can be +re-enqueued is bounded by the number of flag transitions. + +Termination +----------- + +The walk uses a `nonstale_queue` wrapper around `prio_queue` that +tracks `max_nonstale`: the lowest-priority non-stale commit enqueued +so far. Once that commit is dequeued, every remaining entry is known +to be STALE and the loop terminates. Specifically, the main loop +ends when one of the following conditions holds: + + 1. The queue is empty. + 2. `max_nonstale` has been dequeued, meaning the queue only contains + STALE entries. + 3. Generation cutoff: the dequeued commit's generation is below + a caller-supplied `min_generation` threshold. + 4. Single result: the caller only needs one merge base, one has + been found, and the walk has entered the finite-generation + region. + +Stale entry condition +~~~~~~~~~~~~~~~~~~~~~ +Once all queued entries are stale, no new merge-base candidates can +be discovered -- that requires at least one non-stale commit from +each side meeting. Continuing the walk could still invalidate +existing candidates by proving one is an ancestor of another, but +`remove_redundant()` handles that as a post-processing step, so it +is safe to exit early. + +Generation cutoff +~~~~~~~~~~~~~~~~~ +Some callers (notably `remove_redundant()`) supply a `min_generation` +threshold -- the minimum generation of the input commits. No merge +base can have a generation below this threshold, so the walk +terminates as soon as it dequeues such a commit. + +Single result +~~~~~~~~~~~~~ +When only one merge base is needed and the walk is in the +finite-generation region, the first candidate found is necessarily +the highest-generation common ancestor. No remaining commit in the +queue can be a descendant of this candidate (generation ordering +guarantees children are visited first), so it cannot be redundant +and the walk can stop immediately. + +Related documentation +--------------------- + + - `Documentation/technical/commit-graph.adoc` -- generation numbers + and the reachability closure property. diff --git a/commit-reach.c b/commit-reach.c index 5df471a313..a9483759e0 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -96,7 +96,11 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) return commit; } -/* all input commits in one and twos[] must have been parsed! */ +/* + * See Documentation/technical/paint-down-to-common.adoc + * + * All input commits in one and twos[] must have been parsed! + */ static int paint_down_to_common(struct repository *r, struct commit *one, int n, struct commit **twos, -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 2/8] t6600: add test cases for side-exhaustion edge cases 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Elijah Newren via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 3/8] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget ` (5 subsequent siblings) 7 siblings, 0 replies; 84+ messages in thread From: Elijah Newren via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren From: Elijah Newren <newren@gmail.com> Add test cases to t6600-test-reach.sh that exercise edge cases in the side-exhaustion optimization for paint_down_to_common(): - in_merge_bases_many:self: commit is both A and one of the X inputs - get_merge_bases_many:duplicate-twos: duplicate entries in X list - get_merge_bases_many:pending-stale: STALE transition on an already-painted commit (ps-* diamond topology) - get_merge_bases_many:infinity-both-sides: both tips outside the commit-graph with non-monotonic dates (pi-* topology) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b5b314e570..c2e091aad1 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -49,6 +49,62 @@ test_expect_success 'setup' ' git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1 done done && + + # Build a small side topology to exercise the (PARENT1|PARENT2) -> + # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the + # 10x10 grid above does not exercise it because no merge-base candidate + # there is a descendant of another, so STALE never reaches a + # still-pending candidate. + # + # ps-X + # /|\ + # / | \ + # ps-Z ps-B ps-W + # | / \ | + # | / \ | + # |/ \| + # ps-T1 ps-T2 + # + # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so + # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions + # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued; + # then the STALE-walk from ps-B transitions ps-X to + # (PARENT1|PARENT2|STALE). + git checkout --orphan ps-orphan && + test_commit ps-X && + git checkout -b ps-B-br ps-X && test_commit ps-B && + git checkout -b ps-Z-br ps-X && test_commit ps-Z && + git checkout -b ps-W-br ps-X && test_commit ps-W && + git checkout -b ps-T1 ps-Z && + git merge --no-ff -m ps-T1 ps-B && + git checkout -b ps-T2 ps-W && + git merge --no-ff -m ps-T2 ps-B && + + # Build a side topology that lives entirely outside the half + # commit-graph and has non-monotonic commit dates, to exercise the + # INFINITY-gate in paint_down_to_common. With both tips outside + # the graph, generation is INFINITY and the queue falls back to + # commit-date order, which here is non-monotonic. + # + # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D + # pi-D (date 480) --> pi-C + # pi-C (date 200) --> pi-B + # pi-B (date 100, PARENT2 tip) --> pi-P + # pi-P (date 450, root) + # + # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is + # itself one of the queried tips). + git checkout --orphan pi-orphan && + test_commit --date "@450 +0000" pi-P && + test_commit --date "@100 +0000" pi-B && + test_commit --date "@200 +0000" pi-C && + test_commit --date "@480 +0000" pi-D && + GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \ + git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid && + pi_x="$(cat pi-X-oid)" && + git branch -f pi-X-br "$pi_x" && + git tag pi-X "$pi_x" && + git commit-graph write --reachable && mv .git/objects/info/commit-graph commit-graph-full && chmod u+w commit-graph-full && @@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' test_all_modes in_merge_bases_many ' +test_expect_success 'in_merge_bases_many:self' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-5-9 + X:commit-6-8 + EOF + echo "in_merge_bases_many(A,X):1" >expect && + test_all_modes in_merge_bases_many +' + test_expect_success 'is_descendant_of:hit' ' cat >input <<-\EOF && A:commit-5-7 @@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' ' test_all_modes get_merge_bases_many ' +test_expect_success 'get_merge_bases_many:duplicate-twos' ' + cat >input <<-\EOF && + A:commit-5-7 + X:commit-4-8 + X:commit-4-8 + X:commit-6-6 + X:commit-6-6 + X:commit-8-3 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse commit-5-6 \ + commit-4-7 | sort + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:pending-stale' ' + # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in + # paint_down_to_common(). See the topology comment in the setup test. + cat >input <<-\EOF && + A:ps-T1 + X:ps-T2 + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-B + } >expect && + test_all_modes get_merge_bases_many +' + +test_expect_success 'get_merge_bases_many:infinity-both-sides' ' + # Exercises the push-time INFINITY-gate in paint_down_to_common(). See + # the pi-* topology comment in the setup test. + cat >input <<-\EOF && + A:pi-X + X:pi-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse pi-B + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 3/8] t6099, t6600: add side-exhaustion regression tests 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 2/8] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget ` (4 subsequent siblings) 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add t6099 to test the case where multiple merge-base candidates exist and one is an ancestor of another. This exercises the side-exhaustion optimization in paint_down_to_common together with the remove_redundant safety net in get_merge_bases_many_0. Add a mixed finite/INFINITY test to t6600 where one tip is outside the commit-graph (INFINITY generation) and the other is inside. This exercises the region transition: the walk starts in the INFINITY region where side-exhaustion is disabled, then crosses into the finite region where it can fire. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- t/meson.build | 1 + t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++ t/t6600-test-reach.sh | 25 ++++++++ 3 files changed, 108 insertions(+) create mode 100755 t/t6099-merge-base-side-exhaustion.sh diff --git a/t/meson.build b/t/meson.build index 3219264fe7..ee6ebdffb9 100644 --- a/t/meson.build +++ b/t/meson.build @@ -786,6 +786,7 @@ integration_tests = [ 't6041-bisect-submodule.sh', 't6050-replace.sh', 't6060-merge-index.sh', + 't6099-merge-base-side-exhaustion.sh', 't6100-rev-list-in-order.sh', 't6101-rev-parse-parents.sh', 't6102-rev-list-unexpected-objects.sh', diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh new file mode 100755 index 0000000000..4f1e0d50ef --- /dev/null +++ b/t/t6099-merge-base-side-exhaustion.sh @@ -0,0 +1,82 @@ +#!/bin/sh + +test_description='merge-base with ancestor among merge-base candidates + +Test that merge-base --all correctly handles cases where +multiple merge-base candidates exist and one is an ancestor +of another. The side-exhaustion optimization in +paint_down_to_common may exit before STALE propagation +removes the ancestor, but remove_redundant catches it. + +Graph shape (parents are below children): + + A ----------- X + |\ /| + | B---------/ | + | | | + e2 \ f2 + | | | + e1 d1 f1 + \ | / + \ | / + \| / + C + +A and X are the two tips. +B and C are both reachable from A and X. +B reaches C through d1. +Only B should appear in merge-base --all output. +' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'setup ancestor merge-base candidate' ' + test_commit C && + + git checkout -b d-chain HEAD && + test_commit d1 && + test_commit B && + + git checkout -b e-path C && + test_commit e1 && + test_commit e2 && + + git checkout -b f-path C && + test_commit f1 && + test_commit f2 && + + git checkout -b branch-A e-path && + test_merge A B && + + git checkout -b branch-X f-path && + test_merge X B && + + git commit-graph write --reachable +' + +test_expect_success 'merge-base --all excludes ancestor candidate' ' + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_expect_success 'merge-base (single) finds shallowest' ' + git rev-parse B >expected && + git merge-base A X >actual && + test_cmp expected actual +' + +# Without commit-graph: generation numbers are INFINITY, +# side-exhaustion optimization does not fire. +test_expect_success 'merge-base --all without commit-graph' ' + rm -f .git/objects/info/commit-graph && + git rev-parse B >expected && + git merge-base --all A X >actual && + test_cmp expected actual +' + +test_done diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index c2e091aad1..4b771b4c58 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' test_all_modes get_merge_bases_many ' +test_expect_success 'setup mixed finite/INFINITY topology' ' + # Create a commit outside all saved commit-graph files so it always + # has INFINITY generation, while its parent (ps-X) is in the graph + # with a finite generation. Use the ps-* orphan topology so we do + # not pollute the grid-based rev-list tests. + git checkout ps-X && + test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF +' + +test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' + # One tip (pm-INF) is outside the commit-graph with INFINITY + # generation; the other (ps-B) is in the graph with finite + # generation. The walk starts in the INFINITY region and crosses + # into the finite region where side-exhaustion can fire. + cat >input <<-\EOF && + A:pm-INF + X:ps-B + EOF + { + echo "get_merge_bases_many(A,X):" && + git rev-parse ps-X + } >expect && + test_all_modes get_merge_bases_many +' + test_expect_success 'reduce_heads' ' cat >input <<-\EOF && X:commit-1-10 -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget ` (2 preceding siblings ...) 2026-06-28 12:25 ` [PATCH v4 3/8] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget ` (3 subsequent siblings) 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a step counter and trace2_data_intmax() call so that the number of commits visited during the paint walk is observable via GIT_TRACE2_EVENT. This provides a way to measure the impact of future optimizations without relying on wall-clock benchmarks alone. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 5 ++++ t/t6600-test-reach.sh | 53 ++++++++++++++++++++++++++++++------------- 2 files changed, 42 insertions(+), 16 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index a9483759e0..f6a438550b 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -11,6 +11,7 @@ #include "tag.h" #include "commit-reach.h" #include "ewah/ewok.h" +#include "trace2.h" /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -112,6 +113,7 @@ static int paint_down_to_common(struct repository *r, { compare_commits_by_gen_then_commit_date } }; int i; + int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; @@ -135,6 +137,7 @@ static int paint_down_to_common(struct repository *r, struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); + steps++; if (min_generation && generation > last_gen) BUG("bad generation skip %"PRItime" > %"PRItime" at %s", @@ -190,6 +193,8 @@ static int paint_down_to_common(struct repository *r, } clear_nonstale_queue(&queue); + trace2_data_intmax("paint_down_to_common", r, + "steps", steps); commit_list_sort_by_date(result); return 0; } diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 4b771b4c58..b3a31b80ac 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -118,24 +118,34 @@ test_expect_success 'setup' ' ' run_all_modes () { - test_when_finished rm -rf .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual && - cp commit-graph-full .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual && - cp commit-graph-half .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual && - cp commit-graph-no-gdat .git/objects/info/commit-graph && - "$@" <input >actual && - test_cmp expect actual + graph=.git/objects/info/commit-graph && + test_when_finished rm -rf "$graph" "${graph}s" && + rm -f trace-mode-*.txt && + + for mode in none full half no-gdat + do + rm -rf "$graph" "${graph}s" && + cp "commit-graph-${mode}" "$graph" 2>/dev/null || + true && + GIT_TRACE2_EVENT="$(pwd)/trace-mode-${mode}.txt" \ + "$@" <input >actual && + test_cmp expect actual || return 1 + done } test_all_modes () { run_all_modes test-tool reach "$@" } +test_paint_down_steps () { + for mode in none full half no-gdat + do + test_trace2_data paint_down_to_common steps "$1" \ + <"trace-mode-${mode}.txt" || return 1 + shift + done +} + test_expect_success 'ref_newer:miss' ' cat >input <<-\EOF && A:commit-5-7 @@ -209,7 +219,8 @@ test_expect_success 'in_merge_bases_many:self' ' X:commit-6-8 EOF echo "in_merge_bases_many(A,X):1" >expect && - test_all_modes in_merge_bases_many + test_all_modes in_merge_bases_many && + test_paint_down_steps 45 2 25 3 ' test_expect_success 'is_descendant_of:hit' ' @@ -277,7 +288,8 @@ test_expect_success 'get_merge_bases_many:pending-stale' ' echo "get_merge_bases_many(A,X):" && git rev-parse ps-B } >expect && - test_all_modes get_merge_bases_many + test_all_modes get_merge_bases_many && + test_paint_down_steps 6 6 6 6 ' test_expect_success 'get_merge_bases_many:infinity-both-sides' ' @@ -291,7 +303,8 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' echo "get_merge_bases_many(A,X):" && git rev-parse pi-B } >expect && - test_all_modes get_merge_bases_many + test_all_modes get_merge_bases_many && + test_paint_down_steps 5 5 5 5 ' test_expect_success 'setup mixed finite/INFINITY topology' ' @@ -316,7 +329,15 @@ test_expect_success 'get_merge_bases_many:mixed-finite-infinity' ' echo "get_merge_bases_many(A,X):" && git rev-parse ps-X } >expect && - test_all_modes get_merge_bases_many + test_all_modes get_merge_bases_many && + test_paint_down_steps 3 3 3 3 +' + +test_expect_success 'merge-base --all commit-walk steps' ' + >input && + git rev-parse commit-9-1 >expect && + run_all_modes git merge-base --all commit-9-9 commit-9-1 && + test_paint_down_steps 81 80 81 81 ' test_expect_success 'reduce_heads' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 5/8] commit-reach: introduce struct paint_state with per-side counters 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget ` (3 preceding siblings ...) 2026-06-28 12:25 ` [PATCH v4 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget ` (2 subsequent siblings) 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add a paint_state struct for use by paint_down_to_common() that wraps a prio_queue with per-side commit counters. Each non-stale queued commit occupies exactly one counter bucket based on its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). The counters are maintained by paint_count_update() which adjusts the appropriate bucket by a signed delta. An exhaustive switch on the paint+stale bits documents all valid flag combinations in one place. Convert paint_down_to_common() to use paint_state. The loop now drains the queue via paint_queue_get() which returns NULL when all counters reach zero, replacing the old pointer-based termination (max_nonstale). This is equivalent behavior -- both conditions detect that no non-stale entries remain. paint_queue_get() uses a "pop first" form: it dequeues a commit, then checks the counters. This means the loop exits one iteration earlier than the old code in some topologies (the popped stale commit is never processed), so a few step counts drop by one. The existing nonstale_queue is left in place for ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 9 +- commit-reach.c | 94 ++++++++++++++++--- t/t6600-test-reach.sh | 4 +- 3 files changed, 85 insertions(+), 22 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index a4dfcba038..ac3e2b39a5 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -88,15 +88,12 @@ re-enqueued is bounded by the number of flag transitions. Termination ----------- -The walk uses a `nonstale_queue` wrapper around `prio_queue` that -tracks `max_nonstale`: the lowest-priority non-stale commit enqueued -so far. Once that commit is dequeued, every remaining entry is known -to be STALE and the loop terminates. Specifically, the main loop +The walk tracks the number of commits of each type in the queue +(PARENT1-only, PARENT2-only, pending merge-base). The main loop ends when one of the following conditions holds: 1. The queue is empty. - 2. `max_nonstale` has been dequeued, meaning the queue only contains - STALE entries. + 2. The queue contains only stale entries. 3. Generation cutoff: the dequeued commit's generation is below a caller-supplied `min_generation` threshold. 4. Single result: the caller only needs one merge base, one has diff --git a/commit-reach.c b/commit-reach.c index f6a438550b..9ae306f60c 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -97,6 +97,75 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) return commit; } +/* + * Priority queue with per-side commit counters for paint_down_to_common(). + * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, + * PARENT2-only, or both (a pending merge-base candidate). + */ +struct paint_state { + struct prio_queue queue; + size_t parent1_count; + size_t parent2_count; + size_t mb_candidate_count; +}; + +static void paint_count_update(struct paint_state *state, + unsigned flags, int delta) +{ + switch (flags & (PARENT1 | PARENT2 | STALE)) { + case PARENT1: + state->parent1_count += delta; + break; + + case PARENT2: + state->parent2_count += delta; + break; + + case PARENT1 | PARENT2: + state->mb_candidate_count += delta; + break; + + case PARENT1 | PARENT2 | STALE: + break; + + default: + BUG("unexpected paint state"); + } +} + +static void paint_queue_put(struct paint_state *state, + struct commit *c, unsigned add_flags) +{ + unsigned old_flags = c->object.flags; + c->object.flags |= add_flags; + + if (old_flags & ENQUEUED) { + paint_count_update(state, old_flags, -1); + paint_count_update(state, c->object.flags, 1); + } else { + c->object.flags |= ENQUEUED; + prio_queue_put(&state->queue, c); + paint_count_update(state, c->object.flags, 1); + } +} + +static struct commit *paint_queue_get(struct paint_state *state) +{ + struct commit *commit = prio_queue_get(&state->queue); + + if (!commit) + return NULL; + + commit->object.flags &= ~ENQUEUED; + + if (!state->parent1_count && !state->parent2_count && + !state->mb_candidate_count) + return NULL; + + paint_count_update(state, commit->object.flags, -1); + return commit; +} + /* * See Documentation/technical/paint-down-to-common.adoc * @@ -109,31 +178,29 @@ static int paint_down_to_common(struct repository *r, enum merge_base_flags mb_flags, struct commit_list **result) { - struct nonstale_queue queue = { - { compare_commits_by_gen_then_commit_date } + struct paint_state state = { + .queue = { compare_commits_by_gen_then_commit_date } }; + struct commit *commit; int i; int steps = 0; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; if (!min_generation && !corrected_commit_dates_enabled(r)) - queue.pq.compare = compare_commits_by_commit_date; + state.queue.compare = compare_commits_by_commit_date; one->object.flags |= PARENT1; if (!n) { commit_list_append(one, result); return 0; } - nonstale_queue_put_dedup(&queue, one); + paint_queue_put(&state, one, 0); - for (i = 0; i < n; i++) { - twos[i]->object.flags |= PARENT2; - nonstale_queue_put_dedup(&queue, twos[i]); - } + for (i = 0; i < n; i++) + paint_queue_put(&state, twos[i], PARENT2); - while (queue.max_nonstale) { - struct commit *commit = nonstale_queue_get_dedup(&queue); + while ((commit = paint_queue_get(&state))) { struct commit_list *parents; int flags; timestamp_t generation = commit_graph_generation(commit); @@ -172,7 +239,7 @@ static int paint_down_to_common(struct repository *r, if ((p->object.flags & flags) == flags) continue; if (repo_parse_commit(r, p)) { - clear_nonstale_queue(&queue); + clear_prio_queue(&state.queue); commit_list_free(*result); *result = NULL; /* @@ -187,12 +254,11 @@ static int paint_down_to_common(struct repository *r, return error(_("could not parse commit %s"), oid_to_hex(&p->object.oid)); } - p->object.flags |= flags; - nonstale_queue_put_dedup(&queue, p); + paint_queue_put(&state, p, flags); } } - clear_nonstale_queue(&queue); + clear_prio_queue(&state.queue); trace2_data_intmax("paint_down_to_common", r, "steps", steps); commit_list_sort_by_date(result); diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index b3a31b80ac..51f3d70492 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -289,7 +289,7 @@ test_expect_success 'get_merge_bases_many:pending-stale' ' git rev-parse ps-B } >expect && test_all_modes get_merge_bases_many && - test_paint_down_steps 6 6 6 6 + test_paint_down_steps 5 5 5 5 ' test_expect_success 'get_merge_bases_many:infinity-both-sides' ' @@ -304,7 +304,7 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' ' git rev-parse pi-B } >expect && test_all_modes get_merge_bases_many && - test_paint_down_steps 5 5 5 5 + test_paint_down_steps 5 4 5 5 ' test_expect_success 'setup mixed finite/INFINITY topology' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget ` (4 preceding siblings ...) 2026-06-28 12:25 ` [PATCH v4 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became unused after the previous commit. The core nonstale_queue functions remain in use by ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 9ae306f60c..176ffd68d0 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue) queue->max_nonstale = NULL; } -static void nonstale_queue_put_dedup(struct nonstale_queue *queue, - struct commit *c) -{ - if (c->object.flags & ENQUEUED) - return; - c->object.flags |= ENQUEUED; - nonstale_queue_put(queue, c); -} - -static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue) -{ - struct commit *commit = nonstale_queue_get(queue); - - if (commit) - commit->object.flags &= ~ENQUEUED; - return commit; -} - /* * Priority queue with per-side commit counters for paint_down_to_common(). * Each non-stale queued commit occupies exactly one bucket: PARENT1-only, -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget ` (5 preceding siblings ...) 2026-06-28 12:25 ` [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Add an early termination check to paint_down_to_common() using the per-side counters introduced earlier. Once the walk enters the finite-generation region, terminate early when one side's exclusive count drops to zero -- no new merge-base can form without both paint sides meeting. The check also waits for pending_merge_bases to reach zero, ensuring all merge-base candidates have been dequeued and recorded before exiting. The INFINITY gate ensures correctness: commits without a commit-graph entry have GENERATION_NUMBER_INFINITY and are ordered by commit date, which is not topologically reliable. The optimization only fires once the walk enters the finite-generation region where ordering guarantees hold. Step counts measured with trace2 on git.git with commit-graph: merge-base --all v2.0.0 v2.55.0-rc1: before: 72264 steps after: 44589 steps merge-base --all v2.55.0-rc1 v2.55.0-rc1~5: before: 110 steps after: 7 steps Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- .../technical/paint-down-to-common.adoc | 17 +++++++++++++++++ commit-reach.c | 17 ++++++++++++++--- t/t6600-test-reach.sh | 4 ++-- 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc index ac3e2b39a5..15adac7885 100644 --- a/Documentation/technical/paint-down-to-common.adoc +++ b/Documentation/technical/paint-down-to-common.adoc @@ -99,6 +99,9 @@ ends when one of the following conditions holds: 4. Single result: the caller only needs one merge base, one has been found, and the walk has entered the finite-generation region. + 5. Side exhaustion: no pure PARENT1 or pure PARENT2 commits + remain in the queue, no pending merge-base candidates exist, + and the walk has entered the finite-generation region. Stale entry condition ~~~~~~~~~~~~~~~~~~~~~ @@ -109,6 +112,20 @@ existing candidates by proving one is an ancestor of another, but `remove_redundant()` handles that as a post-processing step, so it is safe to exit early. +Side-exhaustion condition +~~~~~~~~~~~~~~~~~~~~~~~~~ +A new merge-base requires commits from both sides to meet. When one +side's exclusive counter reaches zero and there are no pending +merge-base candidates, no future traversal step can produce a new +candidate. + +This optimization only activates in the finite-generation region +where topological ordering holds. In that region, children are +always visited before parents, so paint flags are final at visit +time and an exhausted side cannot reappear. In the INFINITY region, +commit-date ordering can violate this guarantee, so the check is +skipped. + Generation cutoff ~~~~~~~~~~~~~~~~~ Some callers (notably `remove_redundant()`) supply a `min_generation` diff --git a/commit-reach.c b/commit-reach.c index 176ffd68d0..e174b219c6 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -131,6 +131,10 @@ static void paint_queue_put(struct paint_state *state, } } +/* + * Dequeue the next commit for the paint walk, or return NULL when + * no more merge bases can be discovered. + */ static struct commit *paint_queue_get(struct paint_state *state) { struct commit *commit = prio_queue_get(&state->queue); @@ -140,9 +144,16 @@ static struct commit *paint_queue_get(struct paint_state *state) commit->object.flags &= ~ENQUEUED; - if (!state->parent1_count && !state->parent2_count && - !state->mb_candidate_count) - return NULL; + if (!state->mb_candidate_count) { + /* only stale entries remain */ + if (!state->parent1_count && !state->parent2_count) + return NULL; + + /* one side is exhausted */ + if ((!state->parent1_count || !state->parent2_count) && + commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + return NULL; + } paint_count_update(state, commit->object.flags, -1); return commit; diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 51f3d70492..6365007560 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -220,7 +220,7 @@ test_expect_success 'in_merge_bases_many:self' ' EOF echo "in_merge_bases_many(A,X):1" >expect && test_all_modes in_merge_bases_many && - test_paint_down_steps 45 2 25 3 + test_paint_down_steps 45 1 25 1 ' test_expect_success 'is_descendant_of:hit' ' @@ -337,7 +337,7 @@ test_expect_success 'merge-base --all commit-walk steps' ' >input && git rev-parse commit-9-1 >expect && run_all_modes git merge-base --all commit-9-9 commit-9-1 && - test_paint_down_steps 81 80 81 81 + test_paint_down_steps 81 9 57 10 ' test_expect_success 'reduce_heads' ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v4 8/8] commit-reach: move min_generation check into paint_queue_get() 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget ` (6 preceding siblings ...) 2026-06-28 12:25 ` [PATCH v4 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 ` Kristofer Karlsson via GitGitGadget 7 siblings, 0 replies; 84+ messages in thread From: Kristofer Karlsson via GitGitGadget @ 2026-06-28 12:25 UTC (permalink / raw) To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Kristofer Karlsson From: Kristofer Karlsson <krka@spotify.com> Consolidate the min_generation termination condition into paint_queue_get(), alongside the existing stale-entry and side-exhaustion checks. Move last_gen into struct paint_state so that commit_graph_generation() is called exactly once per dequeued commit and the result is shared across all termination checks and the monotonicity BUG assertion. Signed-off-by: Kristofer Karlsson <krka@spotify.com> --- commit-reach.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index e174b219c6..5c5c54d66e 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -89,6 +89,8 @@ struct paint_state { size_t parent1_count; size_t parent2_count; size_t mb_candidate_count; + timestamp_t min_generation; + timestamp_t last_gen; }; static void paint_count_update(struct paint_state *state, @@ -138,11 +140,23 @@ static void paint_queue_put(struct paint_state *state, static struct commit *paint_queue_get(struct paint_state *state) { struct commit *commit = prio_queue_get(&state->queue); + timestamp_t generation; if (!commit) return NULL; commit->object.flags &= ~ENQUEUED; + generation = commit_graph_generation(commit); + + if (state->min_generation && generation > state->last_gen) + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", + generation, state->last_gen, + oid_to_hex(&commit->object.oid)); + state->last_gen = generation; + + /* generation cutoff */ + if (generation < state->min_generation) + return NULL; if (!state->mb_candidate_count) { /* only stale entries remain */ @@ -151,7 +165,7 @@ static struct commit *paint_queue_get(struct paint_state *state) /* one side is exhausted */ if ((!state->parent1_count || !state->parent2_count) && - commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY) + generation < GENERATION_NUMBER_INFINITY) return NULL; } @@ -177,9 +191,10 @@ static int paint_down_to_common(struct repository *r, struct commit *commit; int i; int steps = 0; - timestamp_t last_gen = GENERATION_NUMBER_INFINITY; struct commit_list **tail = result; + state.min_generation = min_generation; + state.last_gen = GENERATION_NUMBER_INFINITY; if (!min_generation && !corrected_commit_dates_enabled(r)) state.queue.compare = compare_commits_by_commit_date; @@ -196,18 +211,8 @@ static int paint_down_to_common(struct repository *r, while ((commit = paint_queue_get(&state))) { struct commit_list *parents; int flags; - timestamp_t generation = commit_graph_generation(commit); steps++; - if (min_generation && generation > last_gen) - BUG("bad generation skip %"PRItime" > %"PRItime" at %s", - generation, last_gen, - oid_to_hex(&commit->object.oid)); - last_gen = generation; - - if (generation < min_generation) - break; - flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); if (flags == (PARENT1 | PARENT2)) { if (!(commit->object.flags & RESULT)) { @@ -219,7 +224,7 @@ static int paint_down_to_common(struct repository *r, * descendant of this one. */ if (!(mb_flags & MERGE_BASE_FIND_ALL) && - generation < GENERATION_NUMBER_INFINITY) + state.last_gen < GENERATION_NUMBER_INFINITY) break; } /* Mark parents of a found merge stale */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 84+ messages in thread
end of thread, other threads:[~2026-06-28 12:26 UTC | newest] Thread overview: 84+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-20 10:36 [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-20 10:36 ` [PATCH/RFC 1/6] commit-reach: decouple ahead_behind from nonstale_queue Kristofer Karlsson via GitGitGadget 2026-06-22 18:00 ` Derrick Stolee 2026-06-22 18:53 ` Kristofer Karlsson 2026-06-20 10:36 ` [PATCH/RFC 2/6] commit-reach: introduce struct paint_queue with per-side counters Kristofer Karlsson via GitGitGadget 2026-06-22 18:10 ` Derrick Stolee 2026-06-22 19:14 ` Kristofer Karlsson 2026-06-22 20:23 ` Derrick Stolee 2026-06-23 10:13 ` Kristofer Karlsson 2026-06-23 13:50 ` Derrick Stolee 2026-06-23 14:09 ` Kristofer Karlsson 2026-06-23 14:17 ` Derrick Stolee 2026-06-24 11:25 ` Kristofer Karlsson 2026-06-20 10:36 ` [PATCH/RFC 3/6] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-22 18:12 ` Derrick Stolee 2026-06-22 19:19 ` Kristofer Karlsson 2026-06-22 20:26 ` Derrick Stolee 2026-06-22 21:03 ` Kristofer Karlsson 2026-06-23 13:40 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 4/6] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget 2026-06-22 18:15 ` Derrick Stolee 2026-06-22 19:25 ` Kristofer Karlsson 2026-06-22 20:28 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 5/6] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget 2026-06-22 18:16 ` Derrick Stolee 2026-06-20 10:36 ` [PATCH/RFC 6/6] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-22 18:21 ` Derrick Stolee 2026-06-22 19:30 ` Kristofer Karlsson 2026-06-22 18:22 ` [PATCH/RFC 0/6] commit-reach: terminate merge-base walk when one side is exhausted Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 0/7] " Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-24 17:09 ` Junio C Hamano 2026-06-24 12:14 ` [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget 2026-06-24 13:43 ` Derrick Stolee 2026-06-24 14:33 ` Kristofer Karlsson 2026-06-24 12:14 ` [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget 2026-06-24 12:14 ` [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget 2026-06-24 13:41 ` Derrick Stolee 2026-06-24 14:31 ` Kristofer Karlsson 2026-06-24 12:14 ` [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget 2026-06-24 13:54 ` Derrick Stolee 2026-06-24 14:38 ` Kristofer Karlsson 2026-06-24 12:14 ` [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget 2026-06-24 13:55 ` Derrick Stolee 2026-06-24 12:14 ` [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-24 14:02 ` Derrick Stolee 2026-06-24 14:47 ` Kristofer Karlsson 2026-06-24 15:07 ` Derrick Stolee 2026-06-24 13:34 ` [PATCH v2 0/7] commit-reach: terminate merge-base walk when one " Derrick Stolee 2026-06-24 14:25 ` Kristofer Karlsson 2026-06-24 14:09 ` Derrick Stolee 2026-06-26 13:07 ` [PATCH v3 0/8] " Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-26 13:07 ` [PATCH v3 2/8] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget 2026-06-26 13:08 ` [PATCH v3 3/8] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget 2026-06-26 13:08 ` [PATCH v3 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget 2026-06-26 14:31 ` Derrick Stolee 2026-06-26 14:35 ` Kristofer Karlsson 2026-06-26 13:08 ` [PATCH v3 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget 2026-06-26 21:13 ` René Scharfe 2026-06-26 21:57 ` Kristofer Karlsson 2026-06-26 13:08 ` [PATCH v3 6/8] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget 2026-06-26 13:08 ` [PATCH v3 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-26 14:29 ` Kristofer Karlsson 2026-06-26 14:32 ` Derrick Stolee 2026-06-26 16:41 ` Kristofer Karlsson 2026-06-26 14:35 ` Derrick Stolee 2026-06-26 14:39 ` Kristofer Karlsson 2026-06-26 13:08 ` [PATCH v3 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget 2026-06-26 14:42 ` Derrick Stolee 2026-06-26 14:53 ` Kristofer Karlsson 2026-06-26 14:58 ` Derrick Stolee 2026-06-26 16:36 ` [PATCH v3 0/8] commit-reach: terminate merge-base walk when one side is exhausted Junio C Hamano 2026-06-26 16:43 ` Kristofer Karlsson 2026-06-26 18:43 ` Junio C Hamano 2026-06-28 12:25 ` [PATCH v4 " Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 1/8] Documentation/technical: add paint-down-to-common doc Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 2/8] t6600: add test cases for side-exhaustion edge cases Elijah Newren via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 3/8] t6099, t6600: add side-exhaustion regression tests Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 4/8] commit-reach: add trace2 instrumentation to paint_down_to_common() Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 5/8] commit-reach: introduce struct paint_state with per-side counters Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 7/8] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson via GitGitGadget 2026-06-28 12:25 ` [PATCH v4 8/8] commit-reach: move min_generation check into paint_queue_get() Kristofer Karlsson via GitGitGadget
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox