From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH] revision: use priority queue in limit_list()
Date: Thu, 14 May 2026 16:51:31 +0000 [thread overview]
Message-ID: <pull.2114.git.1778777491939.gitgitgadget@gmail.com> (raw)
From: Kristofer Karlsson <krka@spotify.com>
limit_list() maintains a date-sorted work queue of commits using a
linked list with commit_list_insert_by_date() for insertion. Each
insertion walks the list to find the right position — O(n) per insert.
In repositories with merge-heavy histories, the symmetric difference
can contain thousands of commits, making this O(n) insertion the
dominant cost.
Replace the sorted linked list with a prio_queue (binary heap). This
gives O(log n) insertion and O(log n) extraction instead of O(n)
insertion and O(1) extraction, which is a net win when the queue is
large.
The still_interesting() and everybody_uninteresting() helpers are
updated to scan the prio_queue's contiguous array instead of walking a
linked list. process_parents() already accepts both a commit_list and
a prio_queue parameter, so the change in limit_list() simply switches
which one is passed.
Benchmark: git rev-list --left-right --count HEAD~N...HEAD
Repository: 2.3M commits, merge-heavy DAG (monorepo)
Best of 5 runs, times in seconds:
commits in
symmetric diff baseline patched speedup
-------------- -------- ------- -------
10 0.01 0.01 1.0x
50 0.01 0.01 1.0x
3751 21.23 8.49 2.5x
4524 21.70 8.29 2.6x
10130 20.10 6.65 3.0x
No change for small traversals; 2.5-3.0x faster when the queue grows
to thousands of commits.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
revision: use priority queue in limit_list()
This patch speeds up limit_list() by 2.5–3x on large, merge-heavy
repositories by replacing a sorted linked list with a priority queue.
The sorted linked list used as a work queue in limit_list() has O(n)
insertion cost per commit, where n is the current queue length (the
"width" of the active walk frontier). In merge-heavy DAGs this frontier
grows wide — profiling on a 2.3M-commit monorepo showed 59% of total CPU
time in commit_list_insert_by_date(). Total cost is O(N·w) where N is
commits walked and w is peak queue width; in merge-heavy histories w
scales with N, approaching O(N²).
Switching to a prio_queue (binary heap) reduces insertion cost to O(log
w), bringing total cost to O(N·log w). The practical result on the same
repository:
commits in
symmetric diff before after speedup
-------------- -------- ------- -------
3751 21.2s 8.5s 2.5x
4524 21.7s 8.3s 2.6x
10130 20.1s 6.6s 3.0x
This affects any command that triggers limit_list() — i.e., when
revs->limited is set — including --left-right, --cherry-mark,
--cherry-pick, --ancestry-path, bisect, and rebase's fork-point
computation. The practical trigger is git status --ahead-behind on a
branch that has diverged from upstream in a merge-heavy repository.
The change is minimal (+21/−17 lines, single file) because
process_parents() already accepts both a commit_list and a prio_queue
parameter — limit_list() just switches which one it passes.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2114%2Fspkrka%2Flimit-list-prio-queue-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2114/spkrka/limit-list-prio-queue-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2114
revision.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/revision.c b/revision.c
index 599b3a66c3..2b1b3bb10e 100644
--- a/revision.c
+++ b/revision.c
@@ -473,10 +473,10 @@ static struct commit *handle_commit(struct rev_info *revs,
die("%s is unknown object", name);
}
-static int everybody_uninteresting(struct commit_list *orig,
+static int everybody_uninteresting(struct prio_queue *orig,
struct commit **interesting_cache)
{
- struct commit_list *list = orig;
+ size_t i;
if (*interesting_cache) {
struct commit *commit = *interesting_cache;
@@ -484,9 +484,8 @@ static int everybody_uninteresting(struct commit_list *orig,
return 0;
}
- while (list) {
- struct commit *commit = list->item;
- list = list->next;
+ for (i = 0; i < orig->nr; i++) {
+ struct commit *commit = orig->array[i].data;
if (commit->object.flags & UNINTERESTING)
continue;
@@ -1300,20 +1299,17 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
/* How many extra uninteresting commits we want to see.. */
#define SLOP 5
-static int still_interesting(struct commit_list *src, timestamp_t date, int slop,
+static int still_interesting(struct prio_queue *src, timestamp_t date, int slop,
struct commit **interesting_cache)
{
/*
- * No source list at all? We're definitely done..
+ * Since src is sorted by date, it is enough to peek at the
+ * first entry to compare dates. No entry at all means done.
*/
- if (!src)
+ struct commit *commit = prio_queue_peek(src);
+ if (!commit)
return 0;
-
- /*
- * Does the destination list contain entries with a date
- * before the source list? Definitely _not_ done.
- */
- if (date <= src->item->date)
+ if (date <= commit->date)
return SLOP;
/*
@@ -1451,6 +1447,7 @@ static int limit_list(struct rev_info *revs)
struct commit_list *newlist = NULL;
struct commit_list **p = &newlist;
struct commit *interesting_cache = NULL;
+ struct prio_queue queue = { .compare = compare_commits_by_commit_date };
if (revs->ancestry_path_implicit_bottoms) {
collect_bottom_commits(original_list,
@@ -1461,6 +1458,11 @@ static int limit_list(struct rev_info *revs)
while (original_list) {
struct commit *commit = pop_commit(&original_list);
+ prio_queue_put(&queue, commit);
+ }
+
+ while (queue.nr) {
+ struct commit *commit = prio_queue_get(&queue);
struct object *obj = &commit->object;
if (commit == interesting_cache)
@@ -1468,11 +1470,13 @@ static int limit_list(struct rev_info *revs)
if (revs->max_age != -1 && (commit->date < revs->max_age))
obj->flags |= UNINTERESTING;
- if (process_parents(revs, commit, &original_list, NULL) < 0)
+ if (process_parents(revs, commit, NULL, &queue) < 0) {
+ clear_prio_queue(&queue);
return -1;
+ }
if (obj->flags & UNINTERESTING) {
mark_parents_uninteresting(revs, commit);
- slop = still_interesting(original_list, date, slop, &interesting_cache);
+ slop = still_interesting(&queue, date, slop, &interesting_cache);
if (slop)
continue;
break;
@@ -1509,7 +1513,7 @@ static int limit_list(struct rev_info *revs)
}
}
- commit_list_free(original_list);
+ clear_prio_queue(&queue);
revs->commits = newlist;
return 0;
}
base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
--
gitgitgadget
next reply other threads:[~2026-05-14 16:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 16:51 Kristofer Karlsson via GitGitGadget [this message]
2026-05-14 19:40 ` [PATCH] revision: use priority queue in limit_list() Junio C Hamano
2026-05-14 19:57 ` Derrick Stolee
2026-05-15 4:16 ` Jeff King
2026-05-15 7:47 ` Kristofer Karlsson
2026-05-15 13:10 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2114.git.1778777491939.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=krka@spotify.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox