[PATCH] revision: use priority queue in limit_list() - Kristofer Karlsson via GitGitGadget

Git development
 help / color / mirror / Atom feed

From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
	Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH] revision: use priority queue in limit_list()
Date: Thu, 14 May 2026 16:51:31 +0000	[thread overview]
Message-ID: <pull.2114.git.1778777491939.gitgitgadget@gmail.com> (raw)

From: Kristofer Karlsson <krka@spotify.com>

limit_list() maintains a date-sorted work queue of commits using a
linked list with commit_list_insert_by_date() for insertion.  Each
insertion walks the list to find the right position — O(n) per insert.
In repositories with merge-heavy histories, the symmetric difference
can contain thousands of commits, making this O(n) insertion the
dominant cost.

Replace the sorted linked list with a prio_queue (binary heap).  This
gives O(log n) insertion and O(log n) extraction instead of O(n)
insertion and O(1) extraction, which is a net win when the queue is
large.

The still_interesting() and everybody_uninteresting() helpers are
updated to scan the prio_queue's contiguous array instead of walking a
linked list.  process_parents() already accepts both a commit_list and
a prio_queue parameter, so the change in limit_list() simply switches
which one is passed.

Benchmark: git rev-list --left-right --count HEAD~N...HEAD
Repository: 2.3M commits, merge-heavy DAG (monorepo)
Best of 5 runs, times in seconds:

  commits in
  symmetric diff   baseline   patched    speedup
  --------------   --------   -------    -------
            10       0.01      0.01       1.0x
            50       0.01      0.01       1.0x
          3751      21.23      8.49       2.5x
          4524      21.70      8.29       2.6x
         10130      20.10      6.65       3.0x

No change for small traversals; 2.5-3.0x faster when the queue grows
to thousands of commits.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    revision: use priority queue in limit_list()
    
    This patch speeds up limit_list() by 2.5–3x on large, merge-heavy
    repositories by replacing a sorted linked list with a priority queue.
    
    The sorted linked list used as a work queue in limit_list() has O(n)
    insertion cost per commit, where n is the current queue length (the
    "width" of the active walk frontier). In merge-heavy DAGs this frontier
    grows wide — profiling on a 2.3M-commit monorepo showed 59% of total CPU
    time in commit_list_insert_by_date(). Total cost is O(N·w) where N is
    commits walked and w is peak queue width; in merge-heavy histories w
    scales with N, approaching O(N²).
    
    Switching to a prio_queue (binary heap) reduces insertion cost to O(log
    w), bringing total cost to O(N·log w). The practical result on the same
    repository:
    
    commits in
    symmetric diff   before     after      speedup
    --------------   --------   -------    -------
            3751      21.2s      8.5s       2.5x
            4524      21.7s      8.3s       2.6x
           10130      20.1s      6.6s       3.0x
    
    
    This affects any command that triggers limit_list() — i.e., when
    revs->limited is set — including --left-right, --cherry-mark,
    --cherry-pick, --ancestry-path, bisect, and rebase's fork-point
    computation. The practical trigger is git status --ahead-behind on a
    branch that has diverged from upstream in a merge-heavy repository.
    
    The change is minimal (+21/−17 lines, single file) because
    process_parents() already accepts both a commit_list and a prio_queue
    parameter — limit_list() just switches which one it passes.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2114%2Fspkrka%2Flimit-list-prio-queue-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2114/spkrka/limit-list-prio-queue-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2114

 revision.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/revision.c b/revision.c
index 599b3a66c3..2b1b3bb10e 100644
--- a/revision.c
+++ b/revision.c
@@ -473,10 +473,10 @@ static struct commit *handle_commit(struct rev_info *revs,
 	die("%s is unknown object", name);
 }
 
-static int everybody_uninteresting(struct commit_list *orig,
+static int everybody_uninteresting(struct prio_queue *orig,
 				   struct commit **interesting_cache)
 {
-	struct commit_list *list = orig;
+	size_t i;
 
 	if (*interesting_cache) {
 		struct commit *commit = *interesting_cache;
@@ -484,9 +484,8 @@ static int everybody_uninteresting(struct commit_list *orig,
 			return 0;
 	}
 
-	while (list) {
-		struct commit *commit = list->item;
-		list = list->next;
+	for (i = 0; i < orig->nr; i++) {
+		struct commit *commit = orig->array[i].data;
 		if (commit->object.flags & UNINTERESTING)
 			continue;
 
@@ -1300,20 +1299,17 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 /* How many extra uninteresting commits we want to see.. */
 #define SLOP 5
 
-static int still_interesting(struct commit_list *src, timestamp_t date, int slop,
+static int still_interesting(struct prio_queue *src, timestamp_t date, int slop,
 			     struct commit **interesting_cache)
 {
 	/*
-	 * No source list at all? We're definitely done..
+	 * Since src is sorted by date, it is enough to peek at the
+	 * first entry to compare dates.  No entry at all means done.
 	 */
-	if (!src)
+	struct commit *commit = prio_queue_peek(src);
+	if (!commit)
 		return 0;
-
-	/*
-	 * Does the destination list contain entries with a date
-	 * before the source list? Definitely _not_ done.
-	 */
-	if (date <= src->item->date)
+	if (date <= commit->date)
 		return SLOP;
 
 	/*
@@ -1451,6 +1447,7 @@ static int limit_list(struct rev_info *revs)
 	struct commit_list *newlist = NULL;
 	struct commit_list **p = &newlist;
 	struct commit *interesting_cache = NULL;
+	struct prio_queue queue = { .compare = compare_commits_by_commit_date };
 
 	if (revs->ancestry_path_implicit_bottoms) {
 		collect_bottom_commits(original_list,
@@ -1461,6 +1458,11 @@ static int limit_list(struct rev_info *revs)
 
 	while (original_list) {
 		struct commit *commit = pop_commit(&original_list);
+		prio_queue_put(&queue, commit);
+	}
+
+	while (queue.nr) {
+		struct commit *commit = prio_queue_get(&queue);
 		struct object *obj = &commit->object;
 
 		if (commit == interesting_cache)
@@ -1468,11 +1470,13 @@ static int limit_list(struct rev_info *revs)
 
 		if (revs->max_age != -1 && (commit->date < revs->max_age))
 			obj->flags |= UNINTERESTING;
-		if (process_parents(revs, commit, &original_list, NULL) < 0)
+		if (process_parents(revs, commit, NULL, &queue) < 0) {
+			clear_prio_queue(&queue);
 			return -1;
+		}
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(revs, commit);
-			slop = still_interesting(original_list, date, slop, &interesting_cache);
+			slop = still_interesting(&queue, date, slop, &interesting_cache);
 			if (slop)
 				continue;
 			break;
@@ -1509,7 +1513,7 @@ static int limit_list(struct rev_info *revs)
 		}
 	}
 
-	commit_list_free(original_list);
+	clear_prio_queue(&queue);
 	revs->commits = newlist;
 	return 0;
 }

base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
-- 
gitgitgadget

next             reply	other threads:[~2026-05-14 16:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14 16:51 Kristofer Karlsson via GitGitGadget [this message]
2026-05-14 19:40 ` [PATCH] revision: use priority queue in limit_list() Junio C Hamano
2026-05-14 19:57 ` Derrick Stolee
2026-05-15  4:16 ` Jeff King
2026-05-15  7:47   ` Kristofer Karlsson
2026-05-15 13:10     ` Derrick Stolee

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:599b3a66c dfblob:2b1b3bb10 )
 OR (
bs:"[PATCH] revision: use priority queue in limit_list()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2114.git.1778777491939.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=krka@spotify.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox