Git development
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Kristofer Karlsson via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Kristofer Karlsson <krka@spotify.com>
Subject: Re: [PATCH] commit-reach: remove get_reachable_subset()
Date: Wed, 10 Jun 2026 15:29:01 -0400	[thread overview]
Message-ID: <eae94a67-9c66-4b1d-9f0b-45ee3e80ddf8@gmail.com> (raw)
In-Reply-To: <xmqqbjdixupc.fsf@gitster.g>

On 6/10/2026 11:48 AM, Junio C Hamano wrote:
> "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> get_reachable_subset() was introduced in fcb2c0769d (2018-11-02)
>> for add_missing_tags() in remote.c. tips_reachable_from_bases()
>> was added in cbfe360b14 (2023-03-20) as part of the ahead-behind
>> series. The two were never consolidated.
> 
> Good finding.  It is curious to see that these were from the same
> author.

I agree. In my defense, these commits are five years apart. I still
should have looked for similar code that could be reused instead of
rolling new code. (But the new code is better when a commit-graph
exists.)

The other thing that I should have done in the later commit was add
the method to the test-tool, which you do here.

>> ... Without generation numbers, some edge cases
>> may be slower with DFS instead of BFS since the date-ordered
>> prio_queue naturally stays near the top of the graph, but this
>> should not matter in practice
> 
> "should not matter in practice" because...?
> 
>> -- worst case both visit the full
>> graph down from the bases.
> 
> And of course the worst case scenario is by definition not a typical
> case that appear in practice, so it does not make a good explanation
> for "should not matter in practice".

It's important to recognize the use cases that call each method and
to understand if it is appropriate to take these performance changes.

Both methods terminate in the case that all potential targets are
found. And that's the only case that matters, as we will walk all
reachable commits in the case of any one commit not being reachable.

Both methods avoid walking below the "minimum generation" among the
target commits.

The key opportunity here is that tips_reachable_from_bases() will
"increase" the minimum generation when it finds the current-minimum
target commit. That's a big reason why the DFS approach wins: it
has the opportunity to find those lower commits without needing to
walk _every_ commit with higher generation.

The one downside to this approach is that the DFS approach does not
take into account the commit date as a fallback when there is no
commit-graph file with computed generation numbers. When there is
no commit-graph file, then the fallback to commit date to break ties
among "generation number infinity" commits can't be used to help the
BFS search in get_reachable_subset().

And perhaps that is the critical reason for the different algorithms:
in 2018 we didn't have the commit-graph for very long so it wasn't a
reasonable expectation that we'd have one, even for large repositories.

Now? The feature is quite stable and it's easy for users to create
and maintain one. All servers are expected to use it for performance
needs. It's probably reasonable to expect that any repos where this
would matter would have one.

Thanks,
-Stolee


      parent reply	other threads:[~2026-06-10 19:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-09 19:28 [PATCH] commit-reach: remove get_reachable_subset() Kristofer Karlsson via GitGitGadget
2026-06-10 15:48 ` Junio C Hamano
2026-06-10 18:25   ` Kristofer Karlsson
2026-06-10 19:29   ` Derrick Stolee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eae94a67-9c66-4b1d-9f0b-45ee3e80ddf8@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=krka@spotify.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox