From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
	Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH v5 01/10] Documentation/technical: add paint-down-to-common doc
Date: Wed, 01 Jul 2026 16:37:02 +0000
Message-ID: <be00f5aaa163d18a36bfa399346370e03322bbe2.1782923832.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2149.v5.git.1782923832.gitgitgadget@gmail.com>

From: Kristofer Karlsson <krka@spotify.com>

Add a technical document describing the paint_down_to_common()
algorithm used for merge-base computation, covering the paint
walk, generation number regions, and termination conditions.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
 Documentation/Makefile                        |   1 +
 Documentation/technical/meson.build           |   1 +
 .../technical/paint-down-to-common.adoc       | 177 ++++++++++++++++++
 commit-reach.c                                |   6 +-
 4 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/technical/paint-down-to-common.adoc
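
For illustration, the paint walk described in the "Algorithm" section
of the new document can be modelled with the toy C code below. This is
only a minimal sketch under stated assumptions, not the code in
commit-reach.c: `struct toy_commit`, `toy_dequeue()` and
`toy_paint_down_to_common()` are hypothetical names, commits live in a
small array and are referred to by index, the priority queue is
replaced by a linear scan for the highest generation, commit-date
tie-breaking and `min_generation` are left out, and STALE is set on a
candidate's parents rather than on the candidate itself.

```c
#include <assert.h>

enum { PARENT1 = 1, PARENT2 = 2, STALE = 4, RESULT = 8 };
#define TOY_MAX_PARENTS 2

/* Hypothetical toy model of a commit; not git's struct commit. */
struct toy_commit {
	unsigned generation;          /* finite generation number */
	int parents[TOY_MAX_PARENTS]; /* array indices, -1 = no parent */
	unsigned flags;               /* paint flags, start at 0 */
	int queued;                   /* 1 while waiting in the "queue" */
};

/*
 * Stand-in for the priority queue: return the queued commit with the
 * highest generation, or -1 when the queue is empty or holds only
 * STALE entries (the first two termination conditions).
 */
static int toy_dequeue(struct toy_commit *c, int n)
{
	int i, best = -1, nonstale = 0;

	for (i = 0; i < n; i++) {
		if (!c[i].queued)
			continue;
		if (!(c[i].flags & STALE))
			nonstale = 1;
		if (best < 0 || c[i].generation > c[best].generation)
			best = i;
	}
	if (!nonstale)
		return -1;
	c[best].queued = 0;
	return best;
}

/*
 * Paint from `one` (PARENT1) and from twos[] (PARENT2). Indices of
 * merge-base candidates are stored in result[], which must have room
 * for n entries. Returns the number of candidates found.
 */
static int toy_paint_down_to_common(struct toy_commit *c, int n, int one,
				    const int *twos, int ntwos, int *result)
{
	int i, cur, nr = 0;

	assert(one >= 0 && one < n);
	c[one].flags |= PARENT1;
	c[one].queued = 1;
	for (i = 0; i < ntwos; i++) {
		c[twos[i]].flags |= PARENT2;
		c[twos[i]].queued = 1;
	}

	while ((cur = toy_dequeue(c, n)) >= 0) {
		unsigned flags = c[cur].flags & (PARENT1 | PARENT2 | STALE);

		if (flags == (PARENT1 | PARENT2)) {
			/* Both colors, not stale: a merge-base candidate. */
			if (!(c[cur].flags & RESULT)) {
				c[cur].flags |= RESULT;
				result[nr++] = cur;
			}
			flags |= STALE; /* everything below is redundant */
		}
		for (i = 0; i < TOY_MAX_PARENTS; i++) {
			int p = c[cur].parents[i];

			if (p < 0 || (c[p].flags & flags) == flags)
				continue; /* no parent, or nothing new to add */
			c[p].flags |= flags;
			c[p].queued = 1;
		}
	}
	return nr;
}
```

With a four-commit history in which commits 2 and 3 both have parent
1, and commit 1 has parent 0, calling the sketch with `one = 2` and
`twos = {3}` reports commit 1 as the only candidate and leaves commit
0 marked STALE, at which point the walk stops because only stale
entries remain.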

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2699f0b24a..f8dea4b395 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/packfile-uri
 TECH_DOCS += technical/pack-heuristics
+TECH_DOCS += technical/paint-down-to-common
 TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/platform-support
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index ec07088c57..9ce11d5e48 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -18,6 +18,7 @@ articles = [
   'multi-pack-index.adoc',
   'packfile-uri.adoc',
   'pack-heuristics.adoc',
+  'paint-down-to-common.adoc',
   'parallel-checkout.adoc',
   'partial-clone.adoc',
   'platform-support.adoc',
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
new file mode 100644
index 0000000000..ff015c5c8f
--- /dev/null
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -0,0 +1,177 @@
+Merge-Base Computation and paint_down_to_common()
+==================================================
+
+The function `paint_down_to_common()` in `commit-reach.c` computes merge
+bases by walking the commit graph backwards from two sets of tips and
+finding where their ancestry meets.
+
+Use cases
+---------
+
+Merge-base computation is used in two different ways:
+
+ 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
+    `merge`, `rebase`). A merge base is a common ancestor that is
+    not itself an ancestor of another common ancestor.
+
+ 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
+    --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
+    an ancestor of commit B?" The answer is yes exactly when A is a
+    merge base of A and B: every other common ancestor is then an
+    ancestor of A, so A is necessarily the only merge base.
+
+Both use cases share the same algorithm and implementation.
+
+Algorithm
+---------
+
+Given a commit `one` and a set of commits `twos[]`, the walk paints
+commits with two colors:
+
+  - PARENT1: reachable from `one`
+  - PARENT2: reachable from any commit in `twos[]`
+
+The walk uses a priority queue ordered by generation number
+(highest first), breaking ties by commit date. Each step dequeues
+the highest-priority commit (this is when we say a commit is
+"visited") and propagates its paint flags to its parents, enqueuing
+them if they gained new flags. When a commit receives both PARENT1
+and PARENT2, it is a merge-base candidate. A candidate gains the
+STALE flag, which then propagates to its ancestors -- any deeper
+common ancestor is necessarily redundant.
+
+NOTE: When the commit-graph uses only topological levels (generation
+number v1) and the caller passes `min_generation = 0`, a legacy
+fallback replaces the generation-ordered comparator with a pure
+commit-date comparator. This breaks the ordering invariants
+described below -- see <<date-ordering-fallback>>.
+
+[[generation-regions]]
+INFINITY and finite generation regions
+--------------------------------------
+
+The properties in this section assume generation-number ordering (the
+default comparator). They do NOT hold when the date-ordering fallback
+is active -- see <<date-ordering-fallback>>.
+
+The commit-graph stores a generation number for each commit. Commits
+not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The
+graph is closed under reachability: if a commit is in the graph, all
+its ancestors are too. This partitions the commit graph into two regions:
+
+....
+    +---------------------------------------+
+    |          INFINITY region              |
+    |  generation = INFINITY                |
+    |  queue order: heuristic (commit date) |
+    +---------------------------------------+
+                    |
+                    v
+    +---------------------------------------+
+    |          Finite region                |
+    |  generation = finite                  |
+    |  queue order: topological             |
+    +---------------------------------------+
+....
+
+When the commit-graph is enabled, the INFINITY region is typically
+very small -- it only contains commits added since the last
+commit-graph refresh.
+
+All reachable INFINITY-generation commits are visited before any
+finite-generation commit, because INFINITY is larger than any finite
+value. Once the walk crosses into the finite region, it stays there.
+
+In the finite region, generation ordering guarantees topological
+traversal: children are always visited before their parents. This
+means that paint on already-visited commits is final -- no future
+traversal step can add paint to them.
+
+In the INFINITY region, commit-date ordering can violate this: a
+parent with a later date can be visited before a child with an earlier
+date. Paint flags are therefore NOT final at visit time, and a
+commit visited with only one side's paint may later gain the other.
+
+Paint flags are only added, never removed. Since each flag can be set
+at most once per commit, a commit can be enqueued at most once per
+flag (PARENT1, PARENT2, STALE), which bounds the total work.
+
+Termination
+-----------
+
+The walk uses a `nonstale_queue` wrapper around `prio_queue` that
+tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
+so far. Once that commit is dequeued, every remaining entry is known
+to be STALE and the loop terminates. Specifically, the main loop
+ends when one of the following conditions holds:
+
+  1. The queue is empty.
+  2. `max_nonstale` has been dequeued, meaning the queue only contains
+     STALE entries.
+  3. Generation cutoff: the dequeued commit's generation is below
+     a caller-supplied `min_generation` threshold.
+  4. Single result: the caller only needs one merge base, one has
+     been found, and the walk has entered the finite-generation
+     region.
+
+Stale entry condition
+~~~~~~~~~~~~~~~~~~~~~
+Once all queued entries are stale, no new merge-base candidates can
+be discovered -- that requires at least one non-stale commit from
+each side meeting. Continuing the walk could still invalidate
+existing candidates by proving one is an ancestor of another, but
+`remove_redundant()` handles that as a post-processing step, so it
+is safe to exit early.
+
+Generation cutoff
+~~~~~~~~~~~~~~~~~
+Some callers (notably `remove_redundant()`) supply a `min_generation`
+threshold -- the minimum generation of the input commits. They only
+ask whether one input is reachable from another, and no input lies
+below the threshold, so the walk stops once it dequeues a commit below it.
+
+Single result
+~~~~~~~~~~~~~
+When only one merge base is needed, the walk is in the
+finite-generation region, and the queue uses generation ordering,
+the first candidate found is necessarily the highest-generation
+common ancestor. No remaining commit in the queue can be a
+descendant of this candidate (generation ordering guarantees
+children are visited first), so it cannot be redundant and the walk
+can stop immediately.
+
+This optimization is NOT safe when the date-ordering fallback is
+active, because commit-date order can visit a deeper ancestor
+before a shallower one -- see <<date-ordering-fallback>>.
+
+[[date-ordering-fallback]]
+Date-ordering fallback
+----------------------
+
+When `min_generation` is zero and the commit-graph does not contain
+corrected commit dates (generation number v1, which stores only
+topological levels), `paint_down_to_common()` replaces the default
+generation-ordered comparator with `compare_commits_by_commit_date`.
+
+This was introduced as a performance heuristic: topological levels
+are coarser than commit dates, so date ordering can reach merge
+bases in fewer steps when timestamps are well-behaved. However,
+commit dates are not required to be monotonic -- a parent can have
+a later date than its child (clock skew, rebases, etc.) -- so the
+queue may visit commits out of topological order.
+
+This disables optimizations that depend on generation ordering:
+
+  1. *Single result*: the first merge-base candidate found may not
+     be the shallowest, because a deeper ancestor with a higher
+     commit date can be dequeued first.
+
+  2. *Side-exhaustion* (see subsequent commits): one paint side can
+     appear to drain from the queue while commits from that side are
+     still waiting with lower dates, causing premature termination.
+
+Related documentation
+---------------------
+
+  - `Documentation/technical/commit-graph.adoc` -- generation numbers
+    and the reachability closure property.
diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..a9483759e0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -96,7 +96,11 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
 	return commit;
 }
 
-/* all input commits in one and twos[] must have been parsed! */
+/*
+ * See Documentation/technical/paint-down-to-common.adoc
+ *
+ * All input commits in one and twos[] must have been parsed!
+ */
 static int paint_down_to_common(struct repository *r,
 				struct commit *one, int n,
 				struct commit **twos,
-- 
gitgitgadget

