[PATCH] prio-queue: use cascade-down sift for faster extract-min

Git development
 help / color / mirror / Atom feed

* [PATCH] prio-queue: use cascade-down sift for faster extract-min
@ 2026-05-31 17:57 Kristofer Karlsson via GitGitGadget
  2026-06-01  0:09 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-05-31 17:57 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson, Kristofer Karlsson

From: Kristofer Karlsson <krka@spotify.com>

Replace the standard sift-down in prio_queue_get() with a
cascade-down approach.

The standard approach places the last array element at the root,
then sifts it down.  At each level this requires two comparisons
(left vs right child, then element vs winner) and, when the
element is larger, a swap (three 16-byte copies).

The cascade approach instead promotes the smaller child into the
vacant root slot at each level — one comparison and one copy.
The vacancy sinks to a leaf, where the last array element is
placed and sifted up if needed — typically zero levels since the
last array element tends to be large.

In the common case, work per extract drops from 2d comparisons
+ 3d copies to d comparisons + d copies: roughly half the
comparisons and a third of the data movement.  The sift-up phase
can add work when the last element is smaller than ancestors of
the leaf vacancy, but this is rare in practice.

Simplify prio_queue_replace() to a plain get+put sequence.  This
is semantically equivalent: the old implementation wrote to slot 0
and sifted down, which has the same observable effect as removing
the root and inserting a new element.  No caller observes queue
state between the two operations.  The previous implementation
shared sift_down_root() with get, but the cascade approach no
longer accommodates that cleanly since sift_down_root() now
expects the element to reinsert at queue->array[queue->nr], left
there by prio_queue_get() after decrementing nr.  This is fine in
practice: replace is only called from pop_most_recent_commit()
(fetch-pack, object-name, walker) and show-branch — none of
which appear in any hot path.

A synthetic benchmark (10 rounds of 10M put+get cycles, ascending
integer keys, CPU-pinned, median of 3 runs, same compiler and
Makefile flags) shows consistent improvement across all queue
sizes, with no regressions:

    queue width       baseline    cascade    speedup
    ------------------------------------------------
             10        4.32s      3.97s      1.09x
            100        7.95s      6.49s      1.23x
          1,000       11.30s      9.66s      1.17x
         10,000       16.34s     14.15s      1.16x
        100,000       21.43s     18.66s      1.15x

With descending keys (worst case — the last element always sinks
to a leaf in both approaches) the cascade still wins slightly
(1-4%) by replacing swaps with copies, and never regresses.

In end-to-end git commands the improvement is modest because
sift_down_root is only ~8% of total runtime.  Profiling
rev-list --count on a 2.5M-commit monorepo shows sift_down_root
dropping from 8.2% to 0.4% of total runtime.  The improvement
scales with DAG width: wider DAGs produce larger priority queues,
amplifying the per-level savings.  In small or narrow repos the
queues stay shallow and the effect is negligible.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    prio-queue: use cascade-down sift for faster extract-min
    
    Hi, I am not sure this is just noise or not but I thought it at least
    was interesting.
    
    I looked into the internals of prio_queue and found it was technically
    doing too much work and could be simplified/optimized. I found I could
    optimize it by ~20% for the common case (adding commits that would
    typically end up far back in the queue) but only ~1% for the reverse
    case (adding things to the front of the prio queue). The average speedup
    is somewhere in between I suppose. That said, this is not really the
    bottleneck so the overall boost seems to be around ~3-4% improvement for
    repos with wide DAGs.
    
    I would normally classify this as not urgent or important, but I think
    the advantage is that the change is very small and simple and it already
    has good unit tests (t/unit-tests/u-prio-queue.c).
    
    With that said, here are the details:
    
    The prio_queue_get impl is based on removing the root entry, then moving
    the very last element into the root slot, then sifting it down into the
    right place. This uses both comparisons between sibling elements in the
    heap as well as comparisons between the element to add and one of the
    siblings. Then it uses swap operations to move things correctly.
    
    This patch instead promotes the smaller child upward at each level,
    leaving a vacancy that sinks to a leaf, then places the removed element
    there with a short sift-up to keep the heap balanced.
    
    We can analytically compare this - for a sift-distance of d we can
    reason about the number of operations to execute.
    
    Before: 2d comparisons + 3d copies
    After:   d comparisons +  d copies
    
    
    After changing sift_down in this way, the replace operation can't simply
    depend on it anymore, so I reimplemented it as a sequence of get + put.
    This is technically correct but maybe not as efficient. However, I am
    not sure that it matters, since I couldn't see any usage of the replace
    operation in any hot path.
    
    Performance: Profiling git rev-list --count on a 2.5M-commit monorepo
    shows sift_down_root dropping from 8.2% to 0.4% of total runtime,
    effectively eliminated as significant overhead.
    
    Synthetic benchmark 10 rounds of 10M put+get cycles, CPU-pinned, median
    of 3 runs, same compiler and Makefile flags.
    
    Ascending keys (git's typical pattern -- parents have lower priority
    than children):
    
    queue width  baseline  patched  speedup
             10     4.32s    3.97s    1.09x
            100     7.95s    6.49s    1.23x
          1,000    11.30s    9.66s    1.17x
         10,000    16.34s   14.15s    1.16x
        100,000    21.43s   18.66s    1.15x
    
    
    Descending keys (worst case — last element always sinks to leaf in both
    approaches):
    
    queue width  baseline  patched  speedup
             10     4.84s    4.78s    1.01x
            100     9.43s    9.20s    1.03x
          1,000    15.28s   14.71s    1.04x
         10,000    23.61s   23.49s    1.01x
        100,000    29.16s   28.22s    1.03x
    
    
    No regressions in any scenario.
    
    End-to-end benchmarks
    
    All benchmarks use a benchmark setup of 1 warmup run followed by 10
    timed runs. Each configuration is built from the same source tree and
    tested on the same repo in alternating order.
    
    linux kernel (1.4M commits) — range v5.0..v6.0 (311K commits):
    
    Command                      baseline  patched  speedup
    rev-list --count v5.0..v6.0     455ms    440ms    1.04x
    
    
    I also ran it on git.git but did not see any performance diff at all,
    due to the size and narrow DAG.
    
    The improvement scales with DAG width: wider DAGs produce larger
    priority queues, amplifying the per-level savings. In small or narrow
    repositories the priority queues stay shallow and the sift-down cost is
    already negligible, so the change is not noticeable.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2132%2Fspkrka%2Fcascade-sift-down-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2132/spkrka/cascade-sift-down-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2132

 prio-queue.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index 9748528ce6..18005c43c4 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
 {
 	size_t ix, child;
 
-	/* Push down the one at the root */
-	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
-		child = ix * 2 + 1; /* left */
+	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
 		if (child + 1 < queue->nr &&
 		    compare(queue, child, child + 1) >= 0)
 			child++; /* use right child */
+		queue->array[ix] = queue->array[child];
+	}
 
-		if (compare(queue, ix, child) <= 0)
+	/* Place queue->array[queue->nr] (left by caller) and sift up. */
+	queue->array[ix] = queue->array[queue->nr];
+	while (ix) {
+		size_t parent = (ix - 1) / 2;
+		if (compare(queue, parent, ix) <= 0)
 			break;
-
-		swap(queue, child, ix);
+		swap(queue, parent, ix);
+		ix = parent;
 	}
 }
 
@@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
 	if (!--queue->nr)
 		return result;
 
-	queue->array[0] = queue->array[queue->nr];
 	sift_down_root(queue);
 	return result;
 }
@@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
 		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
 		queue->array[queue->nr - 1].data = thing;
 	} else {
-		queue->array[0].ctr = queue->insertion_ctr++;
-		queue->array[0].data = thing;
-		sift_down_root(queue);
+		prio_queue_get(queue);
+		prio_queue_put(queue, thing);
 	}
 }

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] prio-queue: use cascade-down sift for faster extract-min
  2026-05-31 17:57 [PATCH] prio-queue: use cascade-down sift for faster extract-min Kristofer Karlsson via GitGitGadget
@ 2026-06-01  0:09 ` Junio C Hamano
  2026-06-01  6:16 ` Junio C Hamano
  2026-06-01  8:17 ` [PATCH v2] prio-queue: use cascade-down " Kristofer Karlsson via GitGitGadget
  2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2026-06-01  0:09 UTC (permalink / raw)
  To: René Scharfe, Kristofer Karlsson via GitGitGadget
  Cc: git, Kristofer Karlsson

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

I'll add René the recipients, as _replace() was added by him as
optimization, so "this new one is functionally equivalent to the
original" somewhat misses the point, even though we may all agree
that the change is a very good one overall in the end when we look
at the entire picture.

> From: Kristofer Karlsson <krka@spotify.com>
>
> Replace the standard sift-down in prio_queue_get() with a
> cascade-down approach.
>
> The standard approach places the last array element at the root,
> then sifts it down.  At each level this requires two comparisons
> (left vs right child, then element vs winner) and, when the
> element is larger, a swap (three 16-byte copies).
>
> The cascade approach instead promotes the smaller child into the
> vacant root slot at each level — one comparison and one copy.
> The vacancy sinks to a leaf, where the last array element is
> placed and sifted up if needed — typically zero levels since the
> last array element tends to be large.
>
> In the common case, work per extract drops from 2d comparisons
> + 3d copies to d comparisons + d copies: roughly half the
> comparisons and a third of the data movement.  The sift-up phase
> can add work when the last element is smaller than ancestors of
> the leaf vacancy, but this is rare in practice.
>
> Simplify prio_queue_replace() to a plain get+put sequence.  This
> is semantically equivalent: the old implementation wrote to slot 0
> and sifted down, which has the same observable effect as removing
> the root and inserting a new element.  No caller observes queue
> state between the two operations.  The previous implementation
> shared sift_down_root() with get, but the cascade approach no
> longer accommodates that cleanly since sift_down_root() now
> expects the element to reinsert at queue->array[queue->nr], left
> there by prio_queue_get() after decrementing nr.  This is fine in
> practice: replace is only called from pop_most_recent_commit()
> (fetch-pack, object-name, walker) and show-branch — none of
> which appear in any hot path.
>
> A synthetic benchmark (10 rounds of 10M put+get cycles, ascending
> integer keys, CPU-pinned, median of 3 runs, same compiler and
> Makefile flags) shows consistent improvement across all queue
> sizes, with no regressions:
>
>     queue width       baseline    cascade    speedup
>     ------------------------------------------------
>              10        4.32s      3.97s      1.09x
>             100        7.95s      6.49s      1.23x
>           1,000       11.30s      9.66s      1.17x
>          10,000       16.34s     14.15s      1.16x
>         100,000       21.43s     18.66s      1.15x
>
> With descending keys (worst case — the last element always sinks
> to a leaf in both approaches) the cascade still wins slightly
> (1-4%) by replacing swaps with copies, and never regresses.
>
> In end-to-end git commands the improvement is modest because
> sift_down_root is only ~8% of total runtime.  Profiling
> rev-list --count on a 2.5M-commit monorepo shows sift_down_root
> dropping from 8.2% to 0.4% of total runtime.  The improvement
> scales with DAG width: wider DAGs produce larger priority queues,
> amplifying the per-level savings.  In small or narrow repos the
> queues stay shallow and the effect is negligible.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>     prio-queue: use cascade-down sift for faster extract-min
>     
>     Hi, I am not sure this is just noise or not but I thought it at least
>     was interesting.
>     
>     I looked into the internals of prio_queue and found it was technically
>     doing too much work and could be simplified/optimized. I found I could
>     optimize it by ~20% for the common case (adding commits that would
>     typically end up far back in the queue) but only ~1% for the reverse
>     case (adding things to the front of the prio queue). The average speedup
>     is somewhere in between I suppose. That said, this is not really the
>     bottleneck so the overall boost seems to be around ~3-4% improvement for
>     repos with wide DAGs.
>     
>     I would normally classify this as not urgent or important, but I think
>     the advantage is that the change is very small and simple and it already
>     has good unit tests (t/unit-tests/u-prio-queue.c).
>     
>     With that said, here are the details:
>     
>     The prio_queue_get impl is based on removing the root entry, then moving
>     the very last element into the root slot, then sifting it down into the
>     right place. This uses both comparisons between sibling elements in the
>     heap as well as comparisons between the element to add and one of the
>     siblings. Then it uses swap operations to move things correctly.
>     
>     This patch instead promotes the smaller child upward at each level,
>     leaving a vacancy that sinks to a leaf, then places the removed element
>     there with a short sift-up to keep the heap balanced.
>     
>     We can analytically compare this - for a sift-distance of d we can
>     reason about the number of operations to execute.
>     
>     Before: 2d comparisons + 3d copies
>     After:   d comparisons +  d copies
>     
>     
>     After changing sift_down in this way, the replace operation can't simply
>     depend on it anymore, so I reimplemented it as a sequence of get + put.
>     This is technically correct but maybe not as efficient. However, I am
>     not sure that it matters, since I couldn't see any usage of the replace
>     operation in any hot path.
>     
>     Performance: Profiling git rev-list --count on a 2.5M-commit monorepo
>     shows sift_down_root dropping from 8.2% to 0.4% of total runtime,
>     effectively eliminated as significant overhead.
>     
>     Synthetic benchmark 10 rounds of 10M put+get cycles, CPU-pinned, median
>     of 3 runs, same compiler and Makefile flags.
>     
>     Ascending keys (git's typical pattern -- parents have lower priority
>     than children):
>     
>     queue width  baseline  patched  speedup
>              10     4.32s    3.97s    1.09x
>             100     7.95s    6.49s    1.23x
>           1,000    11.30s    9.66s    1.17x
>          10,000    16.34s   14.15s    1.16x
>         100,000    21.43s   18.66s    1.15x
>     
>     
>     Descending keys (worst case — last element always sinks to leaf in both
>     approaches):
>     
>     queue width  baseline  patched  speedup
>              10     4.84s    4.78s    1.01x
>             100     9.43s    9.20s    1.03x
>           1,000    15.28s   14.71s    1.04x
>          10,000    23.61s   23.49s    1.01x
>         100,000    29.16s   28.22s    1.03x
>     
>     
>     No regressions in any scenario.
>     
>     End-to-end benchmarks
>     
>     All benchmarks use a benchmark setup of 1 warmup run followed by 10
>     timed runs. Each configuration is built from the same source tree and
>     tested on the same repo in alternating order.
>     
>     linux kernel (1.4M commits) — range v5.0..v6.0 (311K commits):
>     
>     Command                      baseline  patched  speedup
>     rev-list --count v5.0..v6.0     455ms    440ms    1.04x
>     
>     
>     I also ran it on git.git but did not see any performance diff at all,
>     due to the size and narrow DAG.
>     
>     The improvement scales with DAG width: wider DAGs produce larger
>     priority queues, amplifying the per-level savings. In small or narrow
>     repositories the priority queues stay shallow and the sift-down cost is
>     already negligible, so the change is not noticeable.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2132%2Fspkrka%2Fcascade-sift-down-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2132/spkrka/cascade-sift-down-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2132
>
>  prio-queue.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/prio-queue.c b/prio-queue.c
> index 9748528ce6..18005c43c4 100644
> --- a/prio-queue.c
> +++ b/prio-queue.c
> @@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
>  {
>  	size_t ix, child;
>  
> -	/* Push down the one at the root */
> -	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
> -		child = ix * 2 + 1; /* left */
> +	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
>  		if (child + 1 < queue->nr &&
>  		    compare(queue, child, child + 1) >= 0)
>  			child++; /* use right child */
> +		queue->array[ix] = queue->array[child];
> +	}
>  
> -		if (compare(queue, ix, child) <= 0)
> +	/* Place queue->array[queue->nr] (left by caller) and sift up. */
> +	queue->array[ix] = queue->array[queue->nr];
> +	while (ix) {
> +		size_t parent = (ix - 1) / 2;
> +		if (compare(queue, parent, ix) <= 0)
>  			break;
> -
> -		swap(queue, child, ix);
> +		swap(queue, parent, ix);
> +		ix = parent;
>  	}
>  }
>  
> @@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
>  	if (!--queue->nr)
>  		return result;
>  
> -	queue->array[0] = queue->array[queue->nr];
>  	sift_down_root(queue);
>  	return result;
>  }
> @@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
>  		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
>  		queue->array[queue->nr - 1].data = thing;
>  	} else {
> -		queue->array[0].ctr = queue->insertion_ctr++;
> -		queue->array[0].data = thing;
> -		sift_down_root(queue);
> +		prio_queue_get(queue);
> +		prio_queue_put(queue, thing);
>  	}
>  }
>
> base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] prio-queue: use cascade-down sift for faster extract-min
  2026-05-31 17:57 [PATCH] prio-queue: use cascade-down sift for faster extract-min Kristofer Karlsson via GitGitGadget
  2026-06-01  0:09 ` Junio C Hamano
@ 2026-06-01  6:16 ` Junio C Hamano
  2026-06-01  6:21   ` Kristofer Karlsson
  2026-06-01  8:17 ` [PATCH v2] prio-queue: use cascade-down " Kristofer Karlsson via GitGitGadget
  2 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2026-06-01  6:16 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget; +Cc: git, Kristofer Karlsson

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/prio-queue.c b/prio-queue.c
> index 9748528ce6..18005c43c4 100644
> --- a/prio-queue.c
> +++ b/prio-queue.c
> @@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
>  {
>  	size_t ix, child;
>  
> -	/* Push down the one at the root */
> -	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
> -		child = ix * 2 + 1; /* left */
> +	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
>  		if (child + 1 < queue->nr &&
>  		    compare(queue, child, child + 1) >= 0)
>  			child++; /* use right child */
> +		queue->array[ix] = queue->array[child];
> +	}
>  
> -		if (compare(queue, ix, child) <= 0)
> +	/* Place queue->array[queue->nr] (left by caller) and sift up. */
> +	queue->array[ix] = queue->array[queue->nr];

Here we always sift/bubble up the last element.

I am wondering if it makes sense to teach sift_down_root to take an
extra argument, "struct prio_queue_entry entry" (passed by value)
and sift/bubble it up, not always queue->array[queue->nr], and ...

> +	while (ix) {
> +		size_t parent = (ix - 1) / 2;
> +		if (compare(queue, parent, ix) <= 0)
>  			break;
> -
> -		swap(queue, child, ix);
> +		swap(queue, parent, ix);
> +		ix = parent;
>  	}
>  }
>  
> @@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
>  	if (!--queue->nr)
>  		return result;
>  
> -	queue->array[0] = queue->array[queue->nr];
>  	sift_down_root(queue);
>  	return result;
>  }
> @@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
>  		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
>  		queue->array[queue->nr - 1].data = thing;
>  	} else {
> -		queue->array[0].ctr = queue->insertion_ctr++;
> -		queue->array[0].data = thing;
> -		sift_down_root(queue);
> +		prio_queue_get(queue);
> +		prio_queue_put(queue, thing);

... update this part in the else clause to do something like

		struct prio_queue_entry entry;
		entry.ctr = queue->insertion_ctr++;
		entry.data = thing;
		sift_down_root(queue, entry);

to retain the optimization?  It would perform a single cascade-down
sift, followed by a single sift-up, so it would save a comparison, a
copy, and a swap in the worset case compared to the get+put sequence?

Of course, the original sift_down_root() caller (i.e. prio_queue_get())
needs to pass queue->array[queue->nr] as the second parameter to match.

>  	}
>  }
>
> base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] prio-queue: use cascade-down sift for faster extract-min
  2026-06-01  6:16 ` Junio C Hamano
@ 2026-06-01  6:21   ` Kristofer Karlsson
  0 siblings, 0 replies; 7+ messages in thread
From: Kristofer Karlsson @ 2026-06-01  6:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristofer Karlsson via GitGitGadget, git

Thanks for the quick and very valid feedback! I already started
investigating - I think I was too quick (and wrong) when I reasoned
about the replace operation.I will rework it a bit and come back with
a patch version 2 soon that ensures that neither get and replace have
regressed in any way.

- Kristofer

On Mon, 1 Jun 2026 at 08:16, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > diff --git a/prio-queue.c b/prio-queue.c
> > index 9748528ce6..18005c43c4 100644
> > --- a/prio-queue.c
> > +++ b/prio-queue.c
> > @@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
> >  {
> >       size_t ix, child;
> >
> > -     /* Push down the one at the root */
> > -     for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
> > -             child = ix * 2 + 1; /* left */
> > +     for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
> >               if (child + 1 < queue->nr &&
> >                   compare(queue, child, child + 1) >= 0)
> >                       child++; /* use right child */
> > +             queue->array[ix] = queue->array[child];
> > +     }
> >
> > -             if (compare(queue, ix, child) <= 0)
> > +     /* Place queue->array[queue->nr] (left by caller) and sift up. */
> > +     queue->array[ix] = queue->array[queue->nr];
>
> Here we always sift/bubble up the last element.
>
> I am wondering if it makes sense to teach sift_down_root to take an
> extra argument, "struct prio_queue_entry entry" (passed by value)
> and sift/bubble it up, not always queue->array[queue->nr], and ...
>
> > +     while (ix) {
> > +             size_t parent = (ix - 1) / 2;
> > +             if (compare(queue, parent, ix) <= 0)
> >                       break;
> > -
> > -             swap(queue, child, ix);
> > +             swap(queue, parent, ix);
> > +             ix = parent;
> >       }
> >  }
> >
> > @@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
> >       if (!--queue->nr)
> >               return result;
> >
> > -     queue->array[0] = queue->array[queue->nr];
> >       sift_down_root(queue);
> >       return result;
> >  }
> > @@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
> >               queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
> >               queue->array[queue->nr - 1].data = thing;
> >       } else {
> > -             queue->array[0].ctr = queue->insertion_ctr++;
> > -             queue->array[0].data = thing;
> > -             sift_down_root(queue);
> > +             prio_queue_get(queue);
> > +             prio_queue_put(queue, thing);
>
> ... update this part in the else clause to do something like
>
>                 struct prio_queue_entry entry;
>                 entry.ctr = queue->insertion_ctr++;
>                 entry.data = thing;
>                 sift_down_root(queue, entry);
>
> to retain the optimization?  It would perform a single cascade-down
> sift, followed by a single sift-up, so it would save a comparison, a
> copy, and a swap in the worset case compared to the get+put sequence?
>
> Of course, the original sift_down_root() caller (i.e. prio_queue_get())
> needs to pass queue->array[queue->nr] as the second parameter to match.
>
> >       }
> >  }
> >
> > base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] prio-queue: use cascade-down for faster extract-min
  2026-05-31 17:57 [PATCH] prio-queue: use cascade-down sift for faster extract-min Kristofer Karlsson via GitGitGadget
  2026-06-01  0:09 ` Junio C Hamano
  2026-06-01  6:16 ` Junio C Hamano
@ 2026-06-01  8:17 ` Kristofer Karlsson via GitGitGadget
  2026-06-02 16:36   ` René Scharfe
  2 siblings, 1 reply; 7+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-06-01  8:17 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson, Kristofer Karlsson

From: Kristofer Karlsson <krka@spotify.com>

Add sift_up_rebalance(), an alternative to sift_down_root() that
halves the number of comparisons per extract-min.

The standard extract places the last array element at the root and
sifts it down.  At each level this requires two comparisons (left
vs right child, then element vs winner) and a swap.

sift_up_rebalance() instead promotes the smaller child into the
root slot at each level — one comparison and one copy — until the
vacancy reaches a leaf.  The last array element is placed at the
vacancy and sifted up to restore heap order.  In practice the
sift-up rarely moves more than a level or two because the last
array element tends to be large.

Work per extract drops from 2d comparisons + d swaps to
d comparisons + d copies + a short sift-up.

prio_queue_get() now calls sift_up_rebalance() instead of placing
the last element at root and calling sift_down_root().

sift_down_root() and prio_queue_replace() are left unchanged.

Synthetic benchmark (10 rounds of 10M put+get cycles, CPU-pinned,
same compiler and Makefile flags):

Ascending keys (git's typical pattern — parents have lower
priority than children):

  queue width  baseline  patched  speedup
           10     4.39s    3.91s    1.12x
          100     9.10s    6.61s    1.38x
        1,000    11.84s    9.25s    1.28x
       10,000    17.50s   13.92s    1.26x
      100,000    23.97s   20.19s    1.19x

Descending keys (worst case — last element always sinks to leaf):

  queue width  baseline  patched  speedup
           10     4.94s    4.95s    1.00x
          100     9.75s    9.42s    1.03x
        1,000    15.01s   15.29s    0.98x
       10,000    24.79s   23.88s    1.04x
      100,000    29.69s   28.24s    1.05x

Random keys:

  queue width  baseline  patched  speedup
           10     5.05s    4.99s    1.01x
          100     9.90s    9.50s    1.04x
        1,000    15.35s   14.77s    1.04x
       10,000    25.35s   24.21s    1.05x
      100,000    65.71s   63.38s    1.04x

No regressions in any scenario.

End-to-end benchmark on the linux kernel repo (1.4M commits,
range v5.0..v6.0, 311K commits, 20 interleaved runs, 1 warmup):

  Command                      baseline  patched  speedup
  rev-list --count v5.0..v6.0    484ms     474ms    1.02x

The improvement scales with DAG width: wider DAGs produce larger
priority queues, amplifying the per-level savings.  In small or
narrow repositories the queues stay shallow and the sift-down
cost is already negligible.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    prio-queue: use cascade-down sift for faster extract-min
    
    This is a small optimization to prio_queue_get() that reduces the number
    of comparisons per extract-min from 2d to d (where d is the sift
    distance).
    
    The standard extract places the last array element at the root and sifts
    it down, comparing against both children at each level. The new
    sift_up_rebalance() instead promotes the smaller child at each level
    (one comparison and one copy) leaving a vacancy that sinks to a leaf.
    The last element is placed there and sifted up, which in practice rarely
    moves more than a level or two.
    
    The improvement shows clearly in synthetic benchmarks (up to 1.38x for
    ascending keys at queue width 100) but is modest end-to-end since
    sift_down_root is only a fraction of total runtime. On the linux kernel
    repo, rev-list --count v5.0..v6.0 improves by ~2%. The effect scales
    with DAG width.
    
    Changes since v1:
    
     * Kept sift_down_root() and prio_queue_replace() completely unchanged,
       preserving René's optimization that avoids the get+put overhead for
       replace. The cascade approach now only applies to prio_queue_get().
    
     * Extracted the new logic into a separate sift_up_rebalance() function
       rather than inlining it in prio_queue_get().
    
     * Updated benchmark numbers for ascending, descending and random
       insertion ordering. No regressions in any scenario.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2132%2Fspkrka%2Fcascade-sift-down-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2132/spkrka/cascade-sift-down-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2132

Range-diff vs v1:

 1:  9ca2fab4dc ! 1:  6051d44e59 prio-queue: use cascade-down sift for faster extract-min
     @@ Metadata
      Author: Kristofer Karlsson <krka@spotify.com>
      
       ## Commit message ##
     -    prio-queue: use cascade-down sift for faster extract-min
     -
     -    Replace the standard sift-down in prio_queue_get() with a
     -    cascade-down approach.
     -
     -    The standard approach places the last array element at the root,
     -    then sifts it down.  At each level this requires two comparisons
     -    (left vs right child, then element vs winner) and, when the
     -    element is larger, a swap (three 16-byte copies).
     -
     -    The cascade approach instead promotes the smaller child into the
     -    vacant root slot at each level — one comparison and one copy.
     -    The vacancy sinks to a leaf, where the last array element is
     -    placed and sifted up if needed — typically zero levels since the
     -    last array element tends to be large.
     -
     -    In the common case, work per extract drops from 2d comparisons
     -    + 3d copies to d comparisons + d copies: roughly half the
     -    comparisons and a third of the data movement.  The sift-up phase
     -    can add work when the last element is smaller than ancestors of
     -    the leaf vacancy, but this is rare in practice.
     -
     -    Simplify prio_queue_replace() to a plain get+put sequence.  This
     -    is semantically equivalent: the old implementation wrote to slot 0
     -    and sifted down, which has the same observable effect as removing
     -    the root and inserting a new element.  No caller observes queue
     -    state between the two operations.  The previous implementation
     -    shared sift_down_root() with get, but the cascade approach no
     -    longer accommodates that cleanly since sift_down_root() now
     -    expects the element to reinsert at queue->array[queue->nr], left
     -    there by prio_queue_get() after decrementing nr.  This is fine in
     -    practice: replace is only called from pop_most_recent_commit()
     -    (fetch-pack, object-name, walker) and show-branch — none of
     -    which appear in any hot path.
     -
     -    A synthetic benchmark (10 rounds of 10M put+get cycles, ascending
     -    integer keys, CPU-pinned, median of 3 runs, same compiler and
     -    Makefile flags) shows consistent improvement across all queue
     -    sizes, with no regressions:
     -
     -        queue width       baseline    cascade    speedup
     -        ------------------------------------------------
     -                 10        4.32s      3.97s      1.09x
     -                100        7.95s      6.49s      1.23x
     -              1,000       11.30s      9.66s      1.17x
     -             10,000       16.34s     14.15s      1.16x
     -            100,000       21.43s     18.66s      1.15x
     -
     -    With descending keys (worst case — the last element always sinks
     -    to a leaf in both approaches) the cascade still wins slightly
     -    (1-4%) by replacing swaps with copies, and never regresses.
     -
     -    In end-to-end git commands the improvement is modest because
     -    sift_down_root is only ~8% of total runtime.  Profiling
     -    rev-list --count on a 2.5M-commit monorepo shows sift_down_root
     -    dropping from 8.2% to 0.4% of total runtime.  The improvement
     -    scales with DAG width: wider DAGs produce larger priority queues,
     -    amplifying the per-level savings.  In small or narrow repos the
     -    queues stay shallow and the effect is negligible.
     +    prio-queue: use cascade-down for faster extract-min
     +
     +    Add sift_up_rebalance(), an alternative to sift_down_root() that
     +    halves the number of comparisons per extract-min.
     +
     +    The standard extract places the last array element at the root and
     +    sifts it down.  At each level this requires two comparisons (left
     +    vs right child, then element vs winner) and a swap.
     +
     +    sift_up_rebalance() instead promotes the smaller child into the
     +    root slot at each level — one comparison and one copy — until the
     +    vacancy reaches a leaf.  The last array element is placed at the
     +    vacancy and sifted up to restore heap order.  In practice the
     +    sift-up rarely moves more than a level or two because the last
     +    array element tends to be large.
     +
     +    Work per extract drops from 2d comparisons + d swaps to
     +    d comparisons + d copies + a short sift-up.
     +
     +    prio_queue_get() now calls sift_up_rebalance() instead of placing
     +    the last element at root and calling sift_down_root().
     +
     +    sift_down_root() and prio_queue_replace() are left unchanged.
     +
     +    Synthetic benchmark (10 rounds of 10M put+get cycles, CPU-pinned,
     +    same compiler and Makefile flags):
     +
     +    Ascending keys (git's typical pattern — parents have lower
     +    priority than children):
     +
     +      queue width  baseline  patched  speedup
     +               10     4.39s    3.91s    1.12x
     +              100     9.10s    6.61s    1.38x
     +            1,000    11.84s    9.25s    1.28x
     +           10,000    17.50s   13.92s    1.26x
     +          100,000    23.97s   20.19s    1.19x
     +
     +    Descending keys (worst case — last element always sinks to leaf):
     +
     +      queue width  baseline  patched  speedup
     +               10     4.94s    4.95s    1.00x
     +              100     9.75s    9.42s    1.03x
     +            1,000    15.01s   15.29s    0.98x
     +           10,000    24.79s   23.88s    1.04x
     +          100,000    29.69s   28.24s    1.05x
     +
     +    Random keys:
     +
     +      queue width  baseline  patched  speedup
     +               10     5.05s    4.99s    1.01x
     +              100     9.90s    9.50s    1.04x
     +            1,000    15.35s   14.77s    1.04x
     +           10,000    25.35s   24.21s    1.05x
     +          100,000    65.71s   63.38s    1.04x
     +
     +    No regressions in any scenario.
     +
     +    End-to-end benchmark on the linux kernel repo (1.4M commits,
     +    range v5.0..v6.0, 311K commits, 20 interleaved runs, 1 warmup):
     +
     +      Command                      baseline  patched  speedup
     +      rev-list --count v5.0..v6.0    484ms     474ms    1.02x
     +
     +    The improvement scales with DAG width: wider DAGs produce larger
     +    priority queues, amplifying the per-level savings.  In small or
     +    narrow repositories the queues stay shallow and the sift-down
     +    cost is already negligible.
      
          Signed-off-by: Kristofer Karlsson <krka@spotify.com>
      
       ## prio-queue.c ##
      @@ prio-queue.c: static void sift_down_root(struct prio_queue *queue)
     - {
     - 	size_t ix, child;
     + 	}
     + }
       
     --	/* Push down the one at the root */
     --	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
     --		child = ix * 2 + 1; /* left */
     ++static void sift_up_rebalance(struct prio_queue *queue)
     ++{
     ++	size_t ix, child;
     ++
     ++	/* Cascade: promote smaller child at each level. */
      +	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
     - 		if (child + 1 < queue->nr &&
     - 		    compare(queue, child, child + 1) >= 0)
     - 			child++; /* use right child */
     ++		if (child + 1 < queue->nr &&
     ++		    compare(queue, child, child + 1) >= 0)
     ++			child++;
      +		queue->array[ix] = queue->array[child];
      +	}
     - 
     --		if (compare(queue, ix, child) <= 0)
     -+	/* Place queue->array[queue->nr] (left by caller) and sift up. */
     ++
     ++	/* Place the last element at the vacancy and sift up. */
      +	queue->array[ix] = queue->array[queue->nr];
      +	while (ix) {
      +		size_t parent = (ix - 1) / 2;
      +		if (compare(queue, parent, ix) <= 0)
     - 			break;
     --
     --		swap(queue, child, ix);
     ++			break;
      +		swap(queue, parent, ix);
      +		ix = parent;
     - 	}
     - }
     - 
     ++	}
     ++}
     ++
     + void *prio_queue_get(struct prio_queue *queue)
     + {
     + 	void *result;
      @@ prio-queue.c: void *prio_queue_get(struct prio_queue *queue)
       	if (!--queue->nr)
       		return result;
       
      -	queue->array[0] = queue->array[queue->nr];
     - 	sift_down_root(queue);
     +-	sift_down_root(queue);
     ++	sift_up_rebalance(queue);
       	return result;
       }
     -@@ prio-queue.c: void prio_queue_replace(struct prio_queue *queue, void *thing)
     - 		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
     - 		queue->array[queue->nr - 1].data = thing;
     - 	} else {
     --		queue->array[0].ctr = queue->insertion_ctr++;
     --		queue->array[0].data = thing;
     --		sift_down_root(queue);
     -+		prio_queue_get(queue);
     -+		prio_queue_put(queue, thing);
     - 	}
     - }
     + 


 prio-queue.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index 9748528ce6..66d445b078 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -76,6 +76,29 @@ static void sift_down_root(struct prio_queue *queue)
 	}
 }
 
+static void sift_up_rebalance(struct prio_queue *queue)
+{
+	size_t ix, child;
+
+	/* Cascade: promote smaller child at each level. */
+	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
+		if (child + 1 < queue->nr &&
+		    compare(queue, child, child + 1) >= 0)
+			child++;
+		queue->array[ix] = queue->array[child];
+	}
+
+	/* Place the last element at the vacancy and sift up. */
+	queue->array[ix] = queue->array[queue->nr];
+	while (ix) {
+		size_t parent = (ix - 1) / 2;
+		if (compare(queue, parent, ix) <= 0)
+			break;
+		swap(queue, parent, ix);
+		ix = parent;
+	}
+}
+
 void *prio_queue_get(struct prio_queue *queue)
 {
 	void *result;
@@ -89,8 +112,7 @@ void *prio_queue_get(struct prio_queue *queue)
 	if (!--queue->nr)
 		return result;
 
-	queue->array[0] = queue->array[queue->nr];
-	sift_down_root(queue);
+	sift_up_rebalance(queue);
 	return result;
 }
 

base-commit: 1666c1265231b0bc5f613fbbf3f0a9896cdef76e
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] prio-queue: use cascade-down for faster extract-min
  2026-06-01  8:17 ` [PATCH v2] prio-queue: use cascade-down " Kristofer Karlsson via GitGitGadget
@ 2026-06-02 16:36   ` René Scharfe
  2026-06-02 22:40     ` Kristofer Karlsson
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2026-06-02 16:36 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget, git; +Cc: Kristofer Karlsson

On 6/1/26 10:17 AM, Kristofer Karlsson via GitGitGadget wrote:
>     
>     Changes since v1:
>     
>      * Kept sift_down_root() and prio_queue_replace() completely unchanged,
>        preserving René's optimization that avoids the get+put overhead for
>        replace. The cascade approach now only applies to prio_queue_get().

The prospect of no longer needing prio_queue_replace() had me excited in
round 1.  The benchmarks from commits that added its callers [1][2][3]
did show performance regressions with your patch 1 plus changes to
revert prio_queue_peek()+prio_queue_replace() to prio_queue_get()+
prio_queue_put(), but for two of them low enough to be in the noise.
'git describe $(git rev-list v2.41.0..v2.47.0)' took a 50%+ hit, though.

[1] a79e3519d6 (commit: use prio_queue_replace() in pop_most_recent_commit(), 2025-07-18)
[2] 08bb69d70f (describe: use prio_queue_replace(), 2025-08-03)
[3] abf05d856f (show-branch: use prio_queue, 2025-12-26)

>      * Extracted the new logic into a separate sift_up_rebalance() function
>        rather than inlining it in prio_queue_get().
>     
>      * Updated benchmark numbers for ascending, descending and random
>        insertion ordering. No regressions in any scenario.

I don't see any regression for the benchmarks mentioned above with
patch 2 alone, unsurprisingly.  The describe command still takes that
50%+ performance hit after reverting [2] on top.

Would you be interested in benchmarking the following patch for making
prio_queue_replace() unnecessary by doing its optimization
automatically?  I get a 1% performance hit for the describe command
that I can't explain.  And it leaves the heap unbalanced after a
prio_queue_get(), which complicates things, so I found it lacking.
But I wonder how it stacks up against your cascade approach for your
use case and if there's anything to salvage.

René


---
 prio-queue.c | 60 +++++++++++++++++++++++++++++++++++++++++-------------------
 prio-queue.h |  1 +
 2 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index 9748528ce6..ba6b460a46 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -34,12 +34,46 @@ void clear_prio_queue(struct prio_queue *queue)
 	queue->nr = 0;
 	queue->alloc = 0;
 	queue->insertion_ctr = 0;
+	queue->sift_down_root_pending = false;
+}
+
+static void sift_down_root(struct prio_queue *queue)
+{
+	size_t ix, child;
+
+	/* Push down the one at the root */
+	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
+		child = ix * 2 + 1; /* left */
+		if (child + 1 < queue->nr &&
+		    compare(queue, child, child + 1) >= 0)
+			child++; /* use right child */
+
+		if (compare(queue, ix, child) <= 0)
+			break;
+
+		swap(queue, child, ix);
+	}
+	queue->sift_down_root_pending = false;
 }
 
 void prio_queue_put(struct prio_queue *queue, void *thing)
 {
 	size_t ix, parent;
 
+	if (queue->sift_down_root_pending) {
+		/*
+		 * Restore the original heap size.  The last item is
+		 * still in the right place.
+		 */
+		queue->nr++;
+
+		/* Now fill the hole at the root with the new item. */
+		queue->array[0].ctr = queue->insertion_ctr++;
+		queue->array[0].data = thing;
+		sift_down_root(queue);
+		return;
+	}
+
 	/* Append at the end */
 	ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
 	queue->array[queue->nr].ctr = queue->insertion_ctr++;
@@ -58,24 +92,6 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
 	}
 }
 
-static void sift_down_root(struct prio_queue *queue)
-{
-	size_t ix, child;
-
-	/* Push down the one at the root */
-	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
-		child = ix * 2 + 1; /* left */
-		if (child + 1 < queue->nr &&
-		    compare(queue, child, child + 1) >= 0)
-			child++; /* use right child */
-
-		if (compare(queue, ix, child) <= 0)
-			break;
-
-		swap(queue, child, ix);
-	}
-}
-
 void *prio_queue_get(struct prio_queue *queue)
 {
 	void *result;
@@ -85,12 +101,14 @@ void *prio_queue_get(struct prio_queue *queue)
 	if (!queue->compare)
 		return queue->array[--queue->nr].data; /* LIFO */
 
+	if (queue->sift_down_root_pending)
+		sift_down_root(queue);
 	result = queue->array[0].data;
 	if (!--queue->nr)
 		return result;
 
 	queue->array[0] = queue->array[queue->nr];
-	sift_down_root(queue);
+	queue->sift_down_root_pending = true;
 	return result;
 }
 
@@ -100,6 +118,8 @@ void *prio_queue_peek(struct prio_queue *queue)
 		return NULL;
 	if (!queue->compare)
 		return queue->array[queue->nr - 1].data;
+	if (queue->sift_down_root_pending)
+		sift_down_root(queue);
 	return queue->array[0].data;
 }
 
@@ -111,6 +131,8 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
 		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
 		queue->array[queue->nr - 1].data = thing;
 	} else {
+		if (queue->sift_down_root_pending)
+			sift_down_root(queue);
 		queue->array[0].ctr = queue->insertion_ctr++;
 		queue->array[0].data = thing;
 		sift_down_root(queue);
diff --git a/prio-queue.h b/prio-queue.h
index da7fad2f1f..5977fba438 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -32,6 +32,7 @@ struct prio_queue {
 	void *cb_data;
 	size_t alloc, nr;
 	struct prio_queue_entry *array;
+	bool sift_down_root_pending;
 };
 
 /*


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] prio-queue: use cascade-down for faster extract-min
  2026-06-02 16:36   ` René Scharfe
@ 2026-06-02 22:40     ` Kristofer Karlsson
  0 siblings, 0 replies; 7+ messages in thread
From: Kristofer Karlsson @ 2026-06-02 22:40 UTC (permalink / raw)
  To: René Scharfe; +Cc: Kristofer Karlsson via GitGitGadget, git

On Tue, 2 Jun 2026 at 18:37, René Scharfe <l.s.r@web.de> wrote:
>
> Would you be interested in benchmarking the following patch for making
> prio_queue_replace() unnecessary by doing its optimization
> automatically?  I get a 1% performance hit for the describe command
> that I can't explain.  And it leaves the heap unbalanced after a
> prio_queue_get(), which complicates things, so I found it lacking.
> But I wonder how it stacks up against your cascade approach for your
> use case and if there's anything to salvage.
>
> René

Thank you for the detailed feedback and the patch! It was very
helpful to have a concrete alternative to compare against.

I spent some time benchmarking the different approaches on a
large monorepo with a wide DAG.

All measurements include the nonstale O(1) tracking from my other
series as a common base, since that dominates the merge-base path.

The approaches I compared:

  1. cascade-only: the sift_up_rebalance from this patch (v2)
  2. rene-lazy: your deferred sift_down_root patch
  3. cascade+lazy: cascade for unfused gets, lazy fusion for
     get+put pairs

Results (10 runs, 1 warmup, CPU pinned to performance):

  merge-base --all master master~1000 (~4s workload):

    cascade-only   4.18s (median)
    rene-lazy      4.25s
    cascade+lazy   4.24s

  rev-list --count master~1000..master (~3.8s workload):

    cascade-only   3.86s
    rene-lazy      3.75s
    cascade+lazy   3.74s

The lazy approaches show a small win on rev-list (~3%) where get+put
pairs are common in limit_list. On merge-base --all, everything is
within noise, the prio_queue is a small fraction of total runtime
there. Combining cascade with lazy fusion didn't produce additional
gains beyond what each gives individually.

Looking at your patch, I think the deferred sift-down logic is
essentially the same optimization as the lazy_queue wrapper you
wrote for describe.c - both defer the work from get and fuse it
with a following put. So I'd be hesitant to add a second form of
that deferral directly into prio_queue when lazy_queue already
"owns" that responsibility as a wrapper.

That said, I think it would make sense to fold lazy_queue entirely
into prio_queue. It's an optimization that never hurts as far as I can
tell, and it would simplify several callers. pop_most_recent_commit
and show-branch both independently re-implement the same
peek+replace pattern that lazy_queue formalizes. Making it automatic
in prio_queue would clean up all of them.

I have a local branch exploring that direction. Maybe it makes more
sense to do the lazy_queue fold first, and then see if the cascade
change is still worth adding on top?

Either way, I think the two directions are complementary - cascade
reduces comparisons per sift, while lazy fusion can eliminate full
rebalance cycles.

I'm on a company offsite now so I may be slow to answer, but I will
definitely resume this when I get back home.

- Kristofer

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-02 22:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-31 17:57 [PATCH] prio-queue: use cascade-down sift for faster extract-min Kristofer Karlsson via GitGitGadget
2026-06-01  0:09 ` Junio C Hamano
2026-06-01  6:16 ` Junio C Hamano
2026-06-01  6:21   ` Kristofer Karlsson
2026-06-01  8:17 ` [PATCH v2] prio-queue: use cascade-down " Kristofer Karlsson via GitGitGadget
2026-06-02 16:36   ` René Scharfe
2026-06-02 22:40     ` Kristofer Karlsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox