Git development
 help / color / mirror / Atom feed
From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>, Jeff King <peff@peff.net>,
	Kristofer Karlsson <krka@spotify.com>,
	Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH v2 0/3] commit-reach: replace queue_has_nonstale() scan with O(1) tracking
Date: Mon, 25 May 2026 14:28:02 +0000	[thread overview]
Message-ID: <pull.2124.v2.git.1779719286.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2124.git.1779644541.gitgitgadget@gmail.com>

This is v2 of the series to replace the O(n) queue_has_nonstale() scan with
O(1) tracking.

Changes since v1:

 * Replaced the nonstale counter with a max_nonstale pointer approach, as
   suggested by Jeff King. Instead of counting non-stale entries, we track
   the lowest-priority non-stale commit; when it is popped, all remaining
   entries must be stale.

 * Restructured from 5 patches to 3:
   
   1. object.h: fix stale entries in object flag allocation table
   2. commit-reach: deduplicate queue entries in paint_down_to_common
   3. commit-reach: replace queue_has_nonstale() scan with O(1) tracking

 * Separated concerns: ENQUEUED dedup is a paint_down_to_common concern
   (commit 2), while nonstale_queue is a general wrapper usable by both
   paint_down_to_common and ahead_behind (commit 3). ahead_behind uses its
   own PARENT2-based dedup via insert_no_dup.

 * The nonstale_queue struct is intentionally kept thin (no ENQUEUED
   handling). Dedup variants (nonstale_queue_put_dedup /
   nonstale_queue_get_dedup) are layered on top for paint_down_to_common.

Performance on a large monorepo (3.7M commits), merge-base --all on deep
import branches:

                                      Baseline        Patched
component import, wide frontier (1):  8536ms           3956ms
component import, wide frontier (2):  5757ms           4383ms
component import, wide frontier (3):  4743ms           1927ms


Profiling shows paint_down_to_common() drops from 50% to 4% of total runtime
(~27x faster). The remaining time is in commit graph lookups, heap
operations, and object management — per-commit costs that are not addressed
by this series.

Simple/linear cases show no regression (sub-15ms regardless).

Kristofer Karlsson (3):
  object.h: fix stale entries in object flag allocation table
  commit-reach: deduplicate queue entries in paint_down_to_common
  commit-reach: replace queue_has_nonstale() scan with O(1) tracking

 commit-reach.c | 101 +++++++++++++++++++++++++++++++++++++------------
 object.h       |   7 ++--
 2 files changed, 80 insertions(+), 28 deletions(-)


base-commit: 56a4f3c3a221adf1df9b39da69b8a6890f803157
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2124%2Fspkrka%2Fqueue-has-nonstale-v3-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2124/spkrka/queue-has-nonstale-v3-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2124

Range-diff vs v1:

 -:  ---------- > 1:  105f4646c2 object.h: fix stale entries in object flag allocation table
 1:  1d3751569b ! 2:  fc38c0f856 commit-reach: deduplicate queue entries in paint_down_to_common
     @@ Commit message
          combinations. Add an ENQUEUED flag to track whether a commit is
          currently in the priority queue, and skip it if already present.
      
     +    Introduce prio_queue_put_dedup() and prio_queue_get_dedup()
     +    wrappers that manage the ENQUEUED flag on enqueue and dequeue.
     +
          This change is performance-neutral on its own: the O(n)
          queue_has_nonstale() scan still dominates the per-iteration cost.
          However, the deduplication guarantee (each commit appears in the
          queue at most once) is a prerequisite for the next commit, which
     -    replaces that scan with an O(1) nonstale counter.
     +    replaces that scan with O(1) tracking.
      
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
     @@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b
       	return 0;
       }
       
     -+static void maybe_enqueue(struct prio_queue *queue, struct commit *c)
     ++static void prio_queue_put_dedup(struct prio_queue *queue, struct commit *c)
      +{
      +	if (c->object.flags & ENQUEUED)
      +		return;
      +	c->object.flags |= ENQUEUED;
      +	prio_queue_put(queue, c);
      +}
     ++
     ++static struct commit *prio_queue_get_dedup(struct prio_queue *queue)
     ++{
     ++	struct commit *commit = prio_queue_get(queue);
     ++	if (commit)
     ++		commit->object.flags &= ~ENQUEUED;
     ++	return commit;
     ++}
      +
       static int queue_has_nonstale(struct prio_queue *queue)
       {
     @@ commit-reach.c: static int paint_down_to_common(struct repository *r,
       		return 0;
       	}
      -	prio_queue_put(&queue, one);
     -+	maybe_enqueue(&queue, one);
     ++	prio_queue_put_dedup(&queue, one);
       
       	for (i = 0; i < n; i++) {
       		twos[i]->object.flags |= PARENT2;
      -		prio_queue_put(&queue, twos[i]);
     -+		maybe_enqueue(&queue, twos[i]);
     ++		prio_queue_put_dedup(&queue, twos[i]);
       	}
       
       	while (queue_has_nonstale(&queue)) {
     -@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
     +-		struct commit *commit = prio_queue_get(&queue);
     ++		struct commit *commit = prio_queue_get_dedup(&queue);
     + 		struct commit_list *parents;
       		int flags;
       		timestamp_t generation = commit_graph_generation(commit);
     - 
     -+		commit->object.flags &= ~ENQUEUED;
     -+
     - 		if (min_generation && generation > last_gen)
     - 			BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
     - 			    generation, last_gen,
      @@ commit-reach.c: static int paint_down_to_common(struct repository *r,
       					     oid_to_hex(&p->object.oid));
       			}
       			p->object.flags |= flags;
      -			prio_queue_put(&queue, p);
     -+			maybe_enqueue(&queue, p);
     ++			prio_queue_put_dedup(&queue, p);
       		}
       	}
       
     @@ object.h: void object_array_init(struct object_array *array);
      - * commit-reach.c:                                  16-----19
      + * commit-reach.c:                                  16-------20
        * builtin/last-modified.c:                         1617
     -  * sha1-name.c:                                              20
     +  * object-name.c:                                            20
        * list-objects-filter.c:                                      21
 2:  4742f5e634 < -:  ---------- commit-reach: optimize queue scan in paint_down_to_common
 3:  711a0e2235 ! 3:  03771eb34c commit-reach: optimize queue scan in ahead_behind
     @@ Metadata
      Author: Kristofer Karlsson <krka@spotify.com>
      
       ## Commit message ##
     -    commit-reach: optimize queue scan in ahead_behind
     +    commit-reach: replace queue_has_nonstale() scan with O(1) tracking
      
     -    Apply the same nonstale_count optimization from the previous commit
     -    to ahead_behind(). This replaces the remaining caller of the O(n)
     -    queue_has_nonstale() scan with an O(1) counter check, allowing
     -    queue_has_nonstale() to be removed.
     +    paint_down_to_common() and ahead_behind() call queue_has_nonstale()
     +    on every iteration to decide whether to continue the walk.
     +    queue_has_nonstale() performs a linear scan of the priority queue,
     +    making the overall walk O(n*m) where n is the number of commits
     +    walked and m is the queue size.
      
     -    ahead_behind() already deduplicates queue entries using the PARENT2
     -    flag (via insert_no_dup), so the counter is maintained through
     -    insert_no_dup() and mark_stale() using PARENT2 as the queued_flag.
     +    Introduce 'struct nonstale_queue', a thin wrapper around prio_queue
     +    that maintains a 'max_nonstale' pointer — the lowest-priority
     +    (oldest) non-stale commit seen so far. When this commit is popped,
     +    every remaining queue entry is known to be stale, so the walk can
     +    stop. This reduces the per-iteration termination check from O(m)
     +    to O(1).
      
     +    Uses <= 0 (not < 0) when comparing priorities so that among distinct
     +    commits with equal priority (same generation and timestamp) the
     +    last-enqueued one is tracked. Since prio_queue breaks ties by
     +    insertion order, this ensures max_nonstale is always the last in its
     +    priority class to be popped, making pointer equality on pop
     +    sufficient for correctness.
     +
     +    The previous commit's ENQUEUED deduplication guarantees each commit
     +    appears at most once in the queue, which is required for the pointer
     +    equality check to be unambiguous.
     +
     +    On a large monorepo (3.7M commits), this yields ~2x end-to-end
     +    speedup for merge-base calculations on deep import branches.
     +    Profiling shows paint_down_to_common() drops from 50% to 4% of
     +    total runtime (~27x faster), with the remaining time in commit
     +    graph lookups and heap operations:
     +
     +      Before: 8536ms / 5757ms / 4743ms  (three test cases)
     +      After:  3956ms / 4383ms / 1927ms
     +
     +    Suggested-by: Jeff King <peff@peff.net>
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
       ## commit-reach.c ##
     -@@ commit-reach.c: static void mark_stale(struct commit *c, unsigned queued_flag,
     - 	}
     +@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b)
     + 	return 0;
     + }
     + 
     +-static void prio_queue_put_dedup(struct prio_queue *queue, struct commit *c)
     ++/*
     ++ * A prio_queue with O(1) termination check.  'max_nonstale' tracks
     ++ * the lowest-priority non-stale commit enqueued so far; once it is
     ++ * popped, every remaining entry is known to be STALE.
     ++ */
     ++struct nonstale_queue {
     ++	struct prio_queue pq;
     ++	struct commit *max_nonstale;
     ++};
     ++
     ++static void nonstale_queue_put(struct nonstale_queue *queue,
     ++			       struct commit *c)
     ++{
     ++	struct commit *old = queue->max_nonstale;
     ++
     ++	prio_queue_put(&queue->pq, c);
     ++	if (c->object.flags & STALE)
     ++		return;
     ++	if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0)
     ++		queue->max_nonstale = c;
     ++}
     ++
     ++static struct commit *nonstale_queue_get(struct nonstale_queue *queue)
     ++{
     ++	struct commit *commit = prio_queue_get(&queue->pq);
     ++
     ++	if (commit == queue->max_nonstale)
     ++		queue->max_nonstale = NULL;
     ++
     ++	return commit;
     ++}
     ++
     ++static void clear_nonstale_queue(struct nonstale_queue *queue)
     ++{
     ++	clear_prio_queue(&queue->pq);
     ++	queue->max_nonstale = NULL;
     ++}
     ++
     ++static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
     ++				     struct commit *c)
     + {
     + 	if (c->object.flags & ENQUEUED)
     + 		return;
     + 	c->object.flags |= ENQUEUED;
     +-	prio_queue_put(queue, c);
     ++	nonstale_queue_put(queue, c);
     + }
     + 
     +-static struct commit *prio_queue_get_dedup(struct prio_queue *queue)
     ++static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
     + {
     +-	struct commit *commit = prio_queue_get(queue);
     ++	struct commit *commit = nonstale_queue_get(queue);
     ++
     + 	if (commit)
     + 		commit->object.flags &= ~ENQUEUED;
     + 	return commit;
       }
       
      -static int queue_has_nonstale(struct prio_queue *queue)
     @@ commit-reach.c: static void mark_stale(struct commit *c, unsigned queued_flag,
       /* all input commits in one and twos[] must have been parsed! */
       static int paint_down_to_common(struct repository *r,
       				struct commit *one, int n,
     +@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
     + 				enum merge_base_flags mb_flags,
     + 				struct commit_list **result)
     + {
     +-	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
     ++	struct nonstale_queue queue = {
     ++		{ compare_commits_by_gen_then_commit_date }
     ++	};
     + 	int i;
     + 	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
     + 	struct commit_list **tail = result;
     + 
     + 	if (!min_generation && !corrected_commit_dates_enabled(r))
     +-		queue.compare = compare_commits_by_commit_date;
     ++		queue.pq.compare = compare_commits_by_commit_date;
     + 
     + 	one->object.flags |= PARENT1;
     + 	if (!n) {
     + 		commit_list_append(one, result);
     + 		return 0;
     + 	}
     +-	prio_queue_put_dedup(&queue, one);
     ++	nonstale_queue_put_dedup(&queue, one);
     + 
     + 	for (i = 0; i < n; i++) {
     + 		twos[i]->object.flags |= PARENT2;
     +-		prio_queue_put_dedup(&queue, twos[i]);
     ++		nonstale_queue_put_dedup(&queue, twos[i]);
     + 	}
     + 
     +-	while (queue_has_nonstale(&queue)) {
     +-		struct commit *commit = prio_queue_get_dedup(&queue);
     ++	while (queue.max_nonstale) {
     ++		struct commit *commit = nonstale_queue_get_dedup(&queue);
     + 		struct commit_list *parents;
     + 		int flags;
     + 		timestamp_t generation = commit_graph_generation(commit);
     +@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
     + 			if ((p->object.flags & flags) == flags)
     + 				continue;
     + 			if (repo_parse_commit(r, p)) {
     +-				clear_prio_queue(&queue);
     ++				clear_nonstale_queue(&queue);
     + 				commit_list_free(*result);
     + 				*result = NULL;
     + 				/*
     +@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
     + 					     oid_to_hex(&p->object.oid));
     + 			}
     + 			p->object.flags |= flags;
     +-			prio_queue_put_dedup(&queue, p);
     ++			nonstale_queue_put_dedup(&queue, p);
     + 		}
     + 	}
     + 
     +-	clear_prio_queue(&queue);
     ++	clear_nonstale_queue(&queue);
     + 	commit_list_sort_by_date(result);
     + 	return 0;
     + }
      @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, size_t nr_from,
       define_commit_slab(bit_arrays, struct bitmap *);
       static struct bit_arrays bit_arrays;
       
      -static void insert_no_dup(struct prio_queue *queue, struct commit *c)
     -+static void insert_no_dup(struct prio_queue *queue, struct commit *c,
     -+			  int *nonstale_count)
     ++static void insert_no_dup(struct nonstale_queue *queue, struct commit *c)
       {
       	if (c->object.flags & PARENT2)
       		return;
     - 	prio_queue_put(queue, c);
     +-	prio_queue_put(queue, c);
     ++	nonstale_queue_put(queue, c);
       	c->object.flags |= PARENT2;
     -+	if (!(c->object.flags & STALE))
     -+		(*nonstale_count)++;
       }
       
     - static struct bitmap *get_bit_array(struct commit *c, int width)
      @@ commit-reach.c: void ahead_behind(struct repository *r,
     + 		  struct commit **commits, size_t commits_nr,
     + 		  struct ahead_behind_count *counts, size_t counts_nr)
       {
     - 	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
     +-	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
     ++	struct nonstale_queue queue = {
     ++		{ .compare = compare_commits_by_gen_then_commit_date }
     ++	};
       	size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
     -+	int nonstale_count = 0;
       
       	if (!commits_nr || !counts_nr)
     - 		return;
      @@ commit-reach.c: void ahead_behind(struct repository *r,
     - 		struct bitmap *bitmap = get_bit_array(c, width);
     - 
     - 		bitmap_set(bitmap, i);
     --		insert_no_dup(&queue, c);
     -+		insert_no_dup(&queue, c, &nonstale_count);
     + 		insert_no_dup(&queue, c);
       	}
       
      -	while (queue_has_nonstale(&queue)) {
     -+	while (nonstale_count > 0) {
     - 		struct commit *c = prio_queue_get(&queue);
     +-		struct commit *c = prio_queue_get(&queue);
     ++	while (queue.max_nonstale) {
     ++		struct commit *c = nonstale_queue_get(&queue);
       		struct commit_list *p;
       		struct bitmap *bitmap_c = get_bit_array(c, width);
       
     -+		if (!(c->object.flags & STALE))
     -+			nonstale_count--;
     -+
     - 		for (size_t i = 0; i < counts_nr; i++) {
     - 			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
     - 			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
      @@ commit-reach.c: void ahead_behind(struct repository *r,
     - 			 * queue is STALE.
     - 			 */
     - 			if (bitmap_popcount(bitmap_p) == commits_nr)
     --				p->item->object.flags |= STALE;
     -+				mark_stale(p->item, PARENT2, &nonstale_count);
       
     --			insert_no_dup(&queue, p->item);
     -+			insert_no_dup(&queue, p->item, &nonstale_count);
     - 		}
     + 	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
     + 	repo_clear_commit_marks(r, PARENT2 | STALE);
     +-	for (size_t i = 0; i < queue.nr; i++)
     +-		free_bit_array(queue.array[i].data);
     ++	for (size_t i = 0; i < queue.pq.nr; i++)
     ++		free_bit_array(queue.pq.array[i].data);
     + 	clear_bit_arrays(&bit_arrays);
     +-	clear_prio_queue(&queue);
     ++	clear_nonstale_queue(&queue);
     + }
       
     - 		free_bit_array(c);
     + struct commit_and_index {

-- 
gitgitgadget

  parent reply	other threads:[~2026-05-25 14:28 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-24 17:42 [PATCH 0/3] commit-reach: replace queue_has_nonstale with a counter Kristofer Karlsson via GitGitGadget
2026-05-24 17:42 ` [PATCH 1/3] commit-reach: deduplicate queue entries in paint_down_to_common Kristofer Karlsson via GitGitGadget
2026-05-24 23:40   ` Junio C Hamano
2026-05-25  1:43     ` Derrick Stolee
2026-05-25  6:50       ` Kristofer Karlsson
2026-05-25  7:17         ` Junio C Hamano
2026-05-25  7:53           ` Kristofer Karlsson
2026-05-25 10:02             ` Jeff King
2026-05-25  7:01   ` Jeff King
2026-05-25  7:15     ` Jeff King
2026-05-24 17:42 ` [PATCH 2/3] commit-reach: optimize queue scan " Kristofer Karlsson via GitGitGadget
2026-05-25  1:59   ` Derrick Stolee
2026-05-25  8:54     ` Kristofer Karlsson
2026-05-24 17:42 ` [PATCH 3/3] commit-reach: optimize queue scan in ahead_behind Kristofer Karlsson via GitGitGadget
2026-05-25  7:11   ` Jeff King
2026-05-25  6:47 ` [PATCH 0/3] commit-reach: replace queue_has_nonstale with a counter Jeff King
2026-05-25  7:59   ` Kristofer Karlsson
2026-05-25  8:38     ` Junio C Hamano
2026-05-25  9:55     ` Jeff King
2026-05-25 10:47       ` Kristofer Karlsson
2026-05-25 14:28 ` Kristofer Karlsson via GitGitGadget [this message]
2026-05-25 14:28   ` [PATCH v2 1/3] object.h: fix stale entries in object flag allocation table Kristofer Karlsson via GitGitGadget
2026-05-25 14:28   ` [PATCH v2 2/3] commit-reach: deduplicate queue entries in paint_down_to_common Kristofer Karlsson via GitGitGadget
2026-05-25 14:28   ` [PATCH v2 3/3] commit-reach: replace queue_has_nonstale() scan with O(1) tracking Kristofer Karlsson via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2124.v2.git.1779719286.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=krka@spotify.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox