From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH] revision: use priority queue in limit_list()
Date: Thu, 14 May 2026 16:51:31 +0000 [thread overview]
Message-ID: <pull.2114.git.1778777491939.gitgitgadget@gmail.com> (raw)
From: Kristofer Karlsson <krka@spotify.com>
limit_list() maintains a date-sorted work queue of commits using a
linked list with commit_list_insert_by_date() for insertion. Each
insertion walks the list to find the right position — O(n) per insert.
In repositories with merge-heavy histories, the symmetric difference
can contain thousands of commits, making this O(n) insertion the
dominant cost.
Replace the sorted linked list with a prio_queue (binary heap). This
gives O(log n) insertion and O(log n) extraction instead of O(n)
insertion and O(1) extraction, which is a net win when the queue is
large.
The still_interesting() and everybody_uninteresting() helpers are
updated to scan the prio_queue's contiguous array instead of walking a
linked list. process_parents() already accepts both a commit_list and
a prio_queue parameter, so the change in limit_list() simply switches
which one is passed.
Benchmark: git rev-list --left-right --count HEAD~N...HEAD
Repository: 2.3M commits, merge-heavy DAG (monorepo)
Best of 5 runs, times in seconds:
commits in
symmetric diff baseline patched speedup
-------------- -------- ------- -------
10 0.01 0.01 1.0x
50 0.01 0.01 1.0x
3751 21.23 8.49 2.5x
4524 21.70 8.29 2.6x
10130 20.10 6.65 3.0x
No change for small traversals; 2.5-3.0x faster when the queue grows
to thousands of commits.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
revision: use priority queue in limit_list()
This patch speeds up limit_list() by 2.5–3x on large, merge-heavy
repositories by replacing a sorted linked list with a priority queue.
The sorted linked list used as a work queue in limit_list() has O(n)
insertion cost per commit, where n is the current queue length (the
"width" of the active walk frontier). In merge-heavy DAGs this frontier
grows wide — profiling on a 2.3M-commit monorepo showed 59% of total CPU
time in commit_list_insert_by_date(). Total cost is O(N·w) where N is
commits walked and w is peak queue width; in merge-heavy histories w
scales with N, approaching O(N²).
Switching to a prio_queue (binary heap) reduces insertion cost to O(log
w), bringing total cost to O(N·log w). The practical result on the same
repository:
commits in
symmetric diff before after speedup
-------------- -------- ------- -------
3751 21.2s 8.5s 2.5x
4524 21.7s 8.3s 2.6x
10130 20.1s 6.6s 3.0x
This affects any command that triggers limit_list() — i.e., when
revs->limited is set — including --left-right, --cherry-mark,
--cherry-pick, --ancestry-path, bisect, and rebase's fork-point
computation. The practical trigger is git status --ahead-behind on a
branch that has diverged from upstream in a merge-heavy repository.
The change is minimal (+21/−17 lines, single file) because
process_parents() already accepts both a commit_list and a prio_queue
parameter — limit_list() just switches which one it passes.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2114%2Fspkrka%2Flimit-list-prio-queue-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2114/spkrka/limit-list-prio-queue-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2114
revision.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/revision.c b/revision.c
index 599b3a66c3..2b1b3bb10e 100644
--- a/revision.c
+++ b/revision.c
@@ -473,10 +473,10 @@ static struct commit *handle_commit(struct rev_info *revs,
die("%s is unknown object", name);
}
-static int everybody_uninteresting(struct commit_list *orig,
+static int everybody_uninteresting(struct prio_queue *orig,
struct commit **interesting_cache)
{
- struct commit_list *list = orig;
+ size_t i;
if (*interesting_cache) {
struct commit *commit = *interesting_cache;
@@ -484,9 +484,8 @@ static int everybody_uninteresting(struct commit_list *orig,
return 0;
}
- while (list) {
- struct commit *commit = list->item;
- list = list->next;
+ for (i = 0; i < orig->nr; i++) {
+ struct commit *commit = orig->array[i].data;
if (commit->object.flags & UNINTERESTING)
continue;
@@ -1300,20 +1299,17 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
/* How many extra uninteresting commits we want to see.. */
#define SLOP 5
-static int still_interesting(struct commit_list *src, timestamp_t date, int slop,
+static int still_interesting(struct prio_queue *src, timestamp_t date, int slop,
struct commit **interesting_cache)
{
/*
- * No source list at all? We're definitely done..
+ * Since src is sorted by date, it is enough to peek at the
+ * first entry to compare dates. No entry at all means done.
*/
- if (!src)
+ struct commit *commit = prio_queue_peek(src);
+ if (!commit)
return 0;
-
- /*
- * Does the destination list contain entries with a date
- * before the source list? Definitely _not_ done.
- */
- if (date <= src->item->date)
+ if (date <= commit->date)
return SLOP;
/*
@@ -1451,6 +1447,7 @@ static int limit_list(struct rev_info *revs)
struct commit_list *newlist = NULL;
struct commit_list **p = &newlist;
struct commit *interesting_cache = NULL;
+ struct prio_queue queue = { .compare = compare_commits_by_commit_date };
if (revs->ancestry_path_implicit_bottoms) {
collect_bottom_commits(original_list,
@@ -1461,6 +1458,11 @@ static int limit_list(struct rev_info *revs)
while (original_list) {
struct commit *commit = pop_commit(&original_list);
+ prio_queue_put(&queue, commit);
+ }
+
+ while (queue.nr) {
+ struct commit *commit = prio_queue_get(&queue);
struct object *obj = &commit->object;
if (commit == interesting_cache)
@@ -1468,11 +1470,13 @@ static int limit_list(struct rev_info *revs)
if (revs->max_age != -1 && (commit->date < revs->max_age))
obj->flags |= UNINTERESTING;
- if (process_parents(revs, commit, &original_list, NULL) < 0)
+ if (process_parents(revs, commit, NULL, &queue) < 0) {
+ clear_prio_queue(&queue);
return -1;
+ }
if (obj->flags & UNINTERESTING) {
mark_parents_uninteresting(revs, commit);
- slop = still_interesting(original_list, date, slop, &interesting_cache);
+ slop = still_interesting(&queue, date, slop, &interesting_cache);
if (slop)
continue;
break;
@@ -1509,7 +1513,7 @@ static int limit_list(struct rev_info *revs)
}
}
- commit_list_free(original_list);
+ clear_prio_queue(&queue);
revs->commits = newlist;
return 0;
}
base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
--
gitgitgadget
next reply other threads:[~2026-05-14 16:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 16:51 Kristofer Karlsson via GitGitGadget [this message]
2026-05-14 19:40 ` [PATCH] revision: use priority queue in limit_list() Junio C Hamano
2026-05-14 19:57 ` Derrick Stolee
2026-05-15 4:16 ` Jeff King
2026-05-15 7:47 ` Kristofer Karlsson
2026-05-15 13:10 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2114.git.1778777491939.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=krka@spotify.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.