Git development
 help / color / mirror / Atom feed
* [RFC] commit-reach: terminate merge-base walk when one paint side is exhausted
@ 2026-06-12 11:15 Kristofer Karlsson
  2026-06-12 12:52 ` Derrick Stolee
  0 siblings, 1 reply; 6+ messages in thread
From: Kristofer Karlsson @ 2026-06-12 11:15 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

Hi! I previously sent a patch[1] to optimize paint_down_to_common
for the single merge-base case. I believe I have found a stronger
optimization, but before sending a patch I wanted to discuss the
correctness argument.

The main problem to solve is that computing merge-bases is slow today
in some scenarios, especially large monorepos with complex graphs.
This affects multiple operations, including merge-base and merge-tree.

The previous patch improved it for the special case of the
merge-base being part of the commit-graph and the caller only
needing to know about one merge-base.

I have an idea to make it faster for fetching all merge-bases for
common flows in large repos, as long as the commit graph is
reasonably up to date.

The key part is the exit condition in paint_down_to_common.
Instead of waiting for the queue to only contain stale entries,
it is enough to wait for one of the sides to be exhausted,
i.e. side 1 is exhausted if no more commits exist in the
traversal queue flagged with only PARENT1. For example, if
the two sides are origin/HEAD and a small PR branch, the PR
branch will quickly become exhausted at the merge-base, while
the main side will continue.

Now you may ask: why is that a safe condition?

The traversal in paint_down_to_common has two logical phases
due to the priority queue ordering:

  1. Process all commits with infinite generation numbers.
     This includes all commits when there is no commit-graph.
  2. Process all commits with finite generation numbers.

These happen in strict order -- all INFINITY commits are popped
before any finite-generation commit.

The optimization only applies after the walk enters the second phase.
In the first phase, the traversal behaves exactly as today
and uses the existing termination condition.

In the second phase, traversal follows strict topological
order -- descendants are processed before ancestors. Paint flags
propagate from each processed commit to its parents, which have
strictly lower generation and are therefore not yet examined.

A new merge-base candidate can only form when a PARENT1-only path
meets a PARENT2-only path. Once a commit acquires both paint flags
in this phase, any descendant carrying both paint flags would
already have been processed.

Once one side is exhausted from the queue, no new meeting between
pure sides can occur. Any commit that subsequently acquires both
paint flags must inherit them from a commit that already had both
flags -- it is deeper in the graph and cannot affect the final
merge-base set. We can stop.

On a large monorepo with previously expensive merge-base and
merge-tree queries, I observed speedups ranging from roughly 300x
to 1000x. The nice thing is that this works for merge-base --all
and every internal caller of paint_down_to_common -- we no longer
have to restrict the optimization to finding just the first merge-base.

Does the correctness argument above hold?

Happy to come back with a patch later if the logic holds and the
overall approach is wanted.

Thanks,
Kristofer

[1] https://lore.kernel.org/git/pull.2109.v4.git.1778504352.gitgitgadget@gmail.com/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-12 15:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 11:15 [RFC] commit-reach: terminate merge-base walk when one paint side is exhausted Kristofer Karlsson
2026-06-12 12:52 ` Derrick Stolee
2026-06-12 14:32   ` Kristofer Karlsson
2026-06-12 15:04     ` Derrick Stolee
2026-06-12 15:21       ` Kristofer Karlsson
2026-06-12 15:48         ` Derrick Stolee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox