[PATCH] prio-queue: use cascade-down sift for faster extract-min

Git development
 help / color / mirror / Atom feed

From: "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>,
	Kristofer Karlsson <krka@spotify.com>
Subject: [PATCH] prio-queue: use cascade-down sift for faster extract-min
Date: Sun, 31 May 2026 17:57:15 +0000	[thread overview]
Message-ID: <pull.2132.git.1780250236304.gitgitgadget@gmail.com> (raw)

From: Kristofer Karlsson <krka@spotify.com>

Replace the standard sift-down in prio_queue_get() with a
cascade-down approach.

The standard approach places the last array element at the root,
then sifts it down.  At each level this requires two comparisons
(left vs right child, then element vs winner) and, when the
element is larger, a swap (three 16-byte copies).

The cascade approach instead promotes the smaller child into the
vacant root slot at each level — one comparison and one copy.
The vacancy sinks to a leaf, where the last array element is
placed and sifted up if needed — typically zero levels since the
last array element tends to be large.

In the common case, work per extract drops from 2d comparisons
+ 3d copies to d comparisons + d copies: roughly half the
comparisons and a third of the data movement.  The sift-up phase
can add work when the last element is smaller than ancestors of
the leaf vacancy, but this is rare in practice.

Simplify prio_queue_replace() to a plain get+put sequence.  This
is semantically equivalent: the old implementation wrote to slot 0
and sifted down, which has the same observable effect as removing
the root and inserting a new element.  No caller observes queue
state between the two operations.  The previous implementation
shared sift_down_root() with get, but the cascade approach no
longer accommodates that cleanly since sift_down_root() now
expects the element to reinsert at queue->array[queue->nr], left
there by prio_queue_get() after decrementing nr.  This is fine in
practice: replace is only called from pop_most_recent_commit()
(fetch-pack, object-name, walker) and show-branch — none of
which appear in any hot path.

A synthetic benchmark (10 rounds of 10M put+get cycles, ascending
integer keys, CPU-pinned, median of 3 runs, same compiler and
Makefile flags) shows consistent improvement across all queue
sizes, with no regressions:

    queue width       baseline    cascade    speedup
    ------------------------------------------------
             10        4.32s      3.97s      1.09x
            100        7.95s      6.49s      1.23x
          1,000       11.30s      9.66s      1.17x
         10,000       16.34s     14.15s      1.16x
        100,000       21.43s     18.66s      1.15x

With descending keys (worst case — the last element always sinks
to a leaf in both approaches) the cascade still wins slightly
(1-4%) by replacing swaps with copies, and never regresses.

In end-to-end git commands the improvement is modest because
sift_down_root is only ~8% of total runtime.  Profiling
rev-list --count on a 2.5M-commit monorepo shows sift_down_root
dropping from 8.2% to 0.4% of total runtime.  The improvement
scales with DAG width: wider DAGs produce larger priority queues,
amplifying the per-level savings.  In small or narrow repos the
queues stay shallow and the effect is negligible.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    prio-queue: use cascade-down sift for faster extract-min
    
    Hi, I am not sure this is just noise or not but I thought it at least
    was interesting.
    
    I looked into the internals of prio_queue and found it was technically
    doing too much work and could be simplified/optimized. I found I could
    optimize it by ~20% for the common case (adding commits that would
    typically end up far back in the queue) but only ~1% for the reverse
    case (adding things to the front of the prio queue). The average speedup
    is somewhere in between I suppose. That said, this is not really the
    bottleneck so the overall boost seems to be around ~3-4% improvement for
    repos with wide DAGs.
    
    I would normally classify this as not urgent or important, but I think
    the advantage is that the change is very small and simple and it already
    has good unit tests (t/unit-tests/u-prio-queue.c).
    
    With that said, here are the details:
    
    The prio_queue_get impl is based on removing the root entry, then moving
    the very last element into the root slot, then sifting it down into the
    right place. This uses both comparisons between sibling elements in the
    heap as well as comparisons between the element to add and one of the
    siblings. Then it uses swap operations to move things correctly.
    
    This patch instead promotes the smaller child upward at each level,
    leaving a vacancy that sinks to a leaf, then places the removed element
    there with a short sift-up to keep the heap balanced.
    
    We can analytically compare this - for a sift-distance of d we can
    reason about the number of operations to execute.
    
    Before: 2d comparisons + 3d copies
    After:   d comparisons +  d copies
    
    
    After changing sift_down in this way, the replace operation can't simply
    depend on it anymore, so I reimplemented it as a sequence of get + put.
    This is technically correct but maybe not as efficient. However, I am
    not sure that it matters, since I couldn't see any usage of the replace
    operation in any hot path.
    
    Performance: Profiling git rev-list --count on a 2.5M-commit monorepo
    shows sift_down_root dropping from 8.2% to 0.4% of total runtime,
    effectively eliminated as significant overhead.
    
    Synthetic benchmark 10 rounds of 10M put+get cycles, CPU-pinned, median
    of 3 runs, same compiler and Makefile flags.
    
    Ascending keys (git's typical pattern -- parents have lower priority
    than children):
    
    queue width  baseline  patched  speedup
             10     4.32s    3.97s    1.09x
            100     7.95s    6.49s    1.23x
          1,000    11.30s    9.66s    1.17x
         10,000    16.34s   14.15s    1.16x
        100,000    21.43s   18.66s    1.15x
    
    
    Descending keys (worst case — last element always sinks to leaf in both
    approaches):
    
    queue width  baseline  patched  speedup
             10     4.84s    4.78s    1.01x
            100     9.43s    9.20s    1.03x
          1,000    15.28s   14.71s    1.04x
         10,000    23.61s   23.49s    1.01x
        100,000    29.16s   28.22s    1.03x
    
    
    No regressions in any scenario.
    
    End-to-end benchmarks
    
    All benchmarks use a benchmark setup of 1 warmup run followed by 10
    timed runs. Each configuration is built from the same source tree and
    tested on the same repo in alternating order.
    
    linux kernel (1.4M commits) — range v5.0..v6.0 (311K commits):
    
    Command                      baseline  patched  speedup
    rev-list --count v5.0..v6.0     455ms    440ms    1.04x
    
    
    I also ran it on git.git but did not see any performance diff at all,
    due to the size and narrow DAG.
    
    The improvement scales with DAG width: wider DAGs produce larger
    priority queues, amplifying the per-level savings. In small or narrow
    repositories the priority queues stay shallow and the sift-down cost is
    already negligible, so the change is not noticeable.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2132%2Fspkrka%2Fcascade-sift-down-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2132/spkrka/cascade-sift-down-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2132

 prio-queue.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index 9748528ce6..18005c43c4 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
 {
 	size_t ix, child;
 
-	/* Push down the one at the root */
-	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
-		child = ix * 2 + 1; /* left */
+	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
 		if (child + 1 < queue->nr &&
 		    compare(queue, child, child + 1) >= 0)
 			child++; /* use right child */
+		queue->array[ix] = queue->array[child];
+	}
 
-		if (compare(queue, ix, child) <= 0)
+	/* Place queue->array[queue->nr] (left by caller) and sift up. */
+	queue->array[ix] = queue->array[queue->nr];
+	while (ix) {
+		size_t parent = (ix - 1) / 2;
+		if (compare(queue, parent, ix) <= 0)
 			break;
-
-		swap(queue, child, ix);
+		swap(queue, parent, ix);
+		ix = parent;
 	}
 }
 
@@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
 	if (!--queue->nr)
 		return result;
 
-	queue->array[0] = queue->array[queue->nr];
 	sift_down_root(queue);
 	return result;
 }
@@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
 		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
 		queue->array[queue->nr - 1].data = thing;
 	} else {
-		queue->array[0].ctr = queue->insertion_ctr++;
-		queue->array[0].data = thing;
-		sift_down_root(queue);
+		prio_queue_get(queue);
+		prio_queue_put(queue, thing);
 	}
 }

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
-- 
gitgitgadget

next             reply	other threads:[~2026-05-31 17:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-31 17:57 Kristofer Karlsson via GitGitGadget [this message]
2026-06-01  0:09 ` [PATCH] prio-queue: use cascade-down sift for faster extract-min Junio C Hamano
2026-06-01  6:16 ` Junio C Hamano
2026-06-01  6:21   ` Kristofer Karlsson
2026-06-01  8:17 ` [PATCH v2] prio-queue: use cascade-down " Kristofer Karlsson via GitGitGadget
2026-06-02 16:36   ` René Scharfe
2026-06-02 22:40     ` Kristofer Karlsson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:9748528ce dfblob:18005c43c )
 OR (
bs:"[PATCH] prio-queue: use cascade-down sift for faster extract-min" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2132.git.1780250236304.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=krka@spotify.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox