From: Derrick Stolee <stolee@gmail.com>
To: Kristofer Karlsson via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Kristofer Karlsson <krka@spotify.com>
Subject: Re: [PATCH 2/3] commit-reach: optimize queue scan in paint_down_to_common
Date: Sun, 24 May 2026 21:59:53 -0400 [thread overview]
Message-ID: <42aef000-7952-482d-8532-2287cf32b275@gmail.com> (raw)
In-Reply-To: <4742f5e634b55820f3b5a626ec97e24617fdae3d.1779644541.git.gitgitgadget@gmail.com>
On 5/24/26 1:42 PM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> paint_down_to_common() terminates when every commit remaining in its
> priority queue is STALE. This was checked by queue_has_nonstale(),
> which performed an O(n) linear scan of the entire queue on every
> iteration, resulting in O(n*m) total overhead where n is the queue
> size and m is the number of commits processed.
>
> Replace this with an O(1) nonstale_count that tracks the number of
> non-stale commits currently in the queue. The counter is incremented
> by maybe_enqueue() and decremented on dequeue and by mark_stale()
> when a commit transitions to STALE while still in the queue. Since
> each commit appears at most once (guaranteed by the ENQUEUED flag
> from the previous commit), the counter is exact.
This idea has a lot of merit, but I'm a bit concerned about the
organization of data. My ideas of how to improve things may also
impact patch 1's use of ENQUEUED.
> -static void maybe_enqueue(struct prio_queue *queue, struct commit *c)
> +static void maybe_enqueue(struct prio_queue *queue, struct commit *c,
> + int *nonstale_count)
> {
> if (c->object.flags & ENQUEUED)
> return;
> c->object.flags |= ENQUEUED;
> prio_queue_put(queue, c);
> + if (!(c->object.flags & STALE))
> + (*nonstale_count)++;
> +}
> +
> +static void mark_stale(struct commit *c, unsigned queued_flag,
> + int *nonstale_count)
> +{
> + if (!(c->object.flags & STALE)) {
> + if (c->object.flags & queued_flag)
> + (*nonstale_count)--;
> + c->object.flags |= STALE;
> + }
> }
These two methods have some concerns on my end:
1. We need to store the nonstale count somewhere other than the
priority queue, even though it's necessarily representing a
subset of the commits within the queue.
2. mark_stale() needs a queued_flag. (I need to check to see if
this is indeed changing in multiple callers or should always
be ENQUEUED).
> static int queue_has_nonstale(struct prio_queue *queue)
> @@ -68,6 +81,7 @@ static int paint_down_to_common(struct repository *r,
> {
> struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> int i;
> + int nonstale_count = 0;
My preference would be to create a new struct that contains a
prio_queue as a member _and_ a nonstale_count. It could initialize
with compare_commits_by_gen_then_commit_date by default.
The important thing is that consumers of such a "stale-tracking"
queue would not be setting the STALE or ENQUEUED bits themselves,
but instead the queue would be responsible for that.
This could allow us to simplify callers by always assuming we can
"add" an element to the queue and the queue will use its ENQUEUED
bit to prevent duplicates from reaching its internal prio_queue.
Such a data structure could be private to commit-reach.c for now,
since all the methods that would use it seem to be colocated there.
This is a big ask, but I'm interested to see if such an approach
would simplify things here.
Here's a potential breakdown of how to build such a thing in
"small" patches:
1. Create the data structure and update paint_down_to_common and
ahead_behind to use that structure, but still use the existing
prio_queue methods on its internal member.
2. Add the ENQUEUED bit and methods on the new struct that add
that bit as it adds commits to the inner prio_queue. It would
also ignore commits that already have that bit. (Should it
also remove the bit as commits are removed from the queue?)
3. Now add the nonstale_count (or stale count?) to the struct and
have it control the STALE bit modifications, with increasing
the stale count when ENQUEUED is live, and decreasing the stale
count as such a STALE object is dequeued.
I like the idea of this being encapsulated within the struct and
its helper methods. But the proof will be in the implementation.
Thanks,
-Stolee
next prev parent reply other threads:[~2026-05-25 1:59 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-24 17:42 [PATCH 0/3] commit-reach: replace queue_has_nonstale with a counter Kristofer Karlsson via GitGitGadget
2026-05-24 17:42 ` [PATCH 1/3] commit-reach: deduplicate queue entries in paint_down_to_common Kristofer Karlsson via GitGitGadget
2026-05-24 23:40 ` Junio C Hamano
2026-05-25 1:43 ` Derrick Stolee
2026-05-25 6:50 ` Kristofer Karlsson
2026-05-25 7:17 ` Junio C Hamano
2026-05-25 7:53 ` Kristofer Karlsson
2026-05-25 10:02 ` Jeff King
2026-05-25 7:01 ` Jeff King
2026-05-25 7:15 ` Jeff King
2026-05-24 17:42 ` [PATCH 2/3] commit-reach: optimize queue scan " Kristofer Karlsson via GitGitGadget
2026-05-25 1:59 ` Derrick Stolee [this message]
2026-05-25 8:54 ` Kristofer Karlsson
2026-05-24 17:42 ` [PATCH 3/3] commit-reach: optimize queue scan in ahead_behind Kristofer Karlsson via GitGitGadget
2026-05-25 7:11 ` Jeff King
2026-05-25 6:47 ` [PATCH 0/3] commit-reach: replace queue_has_nonstale with a counter Jeff King
2026-05-25 7:59 ` Kristofer Karlsson
2026-05-25 8:38 ` Junio C Hamano
2026-05-25 9:55 ` Jeff King
2026-05-25 10:47 ` Kristofer Karlsson
2026-05-25 14:28 ` [PATCH v2 0/3] commit-reach: replace queue_has_nonstale() scan with O(1) tracking Kristofer Karlsson via GitGitGadget
2026-05-25 14:28 ` [PATCH v2 1/3] object.h: fix stale entries in object flag allocation table Kristofer Karlsson via GitGitGadget
2026-05-25 14:28 ` [PATCH v2 2/3] commit-reach: deduplicate queue entries in paint_down_to_common Kristofer Karlsson via GitGitGadget
2026-05-25 14:28 ` [PATCH v2 3/3] commit-reach: replace queue_has_nonstale() scan with O(1) tracking Kristofer Karlsson via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=42aef000-7952-482d-8532-2287cf32b275@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=krka@spotify.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox