Git development
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Kristofer Karlsson via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Kristofer Karlsson <krka@spotify.com>
Subject: Re: [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases()
Date: Fri, 15 May 2026 17:14:59 -0400	[thread overview]
Message-ID: <20260515211459.GA158762@coredump.intra.peff.net> (raw)
In-Reply-To: <pull.2116.git.1778868463992.gitgitgadget@gmail.com>

On Fri, May 15, 2026 at 06:07:43PM +0000, Kristofer Karlsson via GitGitGadget wrote:

> From: Kristofer Karlsson <krka@spotify.com>
> 
> tips_reachable_from_bases() walks the commit graph from a set of base
> commits to find which tip commits are reachable.  The inner loop does
> a linear scan over the tips array to check whether each visited commit
> is a tip, making the overall cost O(C * T) where C is commits walked
> and T is the number of tips.
> 
> Replace the linear scan with the decoration hash for lookups, reducing
> the per-commit tip check from O(T) to O(1) and the overall cost from
> O(C * T) to O(C + T).
> 
> This function is called by `git for-each-ref --merged` and
> `git branch/tag --contains/--no-contains` via reach_filter() in
> ref-filter.c.
> 
> Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
> 
>   Command                           Before    After   Speedup
>   for-each-ref --merged HEAD        6.64s     1.66s     4.0x
>   for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
>   branch --merged HEAD              0.68s     0.61s      10%
>   branch --no-merged HEAD           0.65s     0.61s       8%
>   tag --merged HEAD                 0.12s     0.12s       -
> 
> The large speedup for for-each-ref is because it checks all 10,000
> refs as tips, making the O(T) inner loop expensive.  The branch
> subcommand only checks local branches (fewer tips), so the improvement
> is smaller.

Hmm, I couldn't reproduce the speedup on something like linux.git (~1.4M
commits) with a lot of synthetic branches. I'd think that old branches
would be the most expensive, so I did:

  old=$(git rev-list --reverse HEAD | head -n1)
  seq --format="update refs/heads/branch%g $old" 10000 |
  git update-ref --stdin

Running "git for-each-ref --no-merged HEAD" takes ~650ms with stock Git.
But with your patch, it goes to ~830ms!

So what am I missing about your repo that it is so slow in the first
place?

>      * Hacking the array index into the decoration value as (void *)(i + 1)
>        instead of storing a proper pointer

The decoration API is not the most generic option here. There's an
oidmap type, but you have to embed the hashmap bits into your struct,
which is a lot of boilerplate if you're just storing an int. You can
define a khash with a custom value type, and I think the existing
oid_pos uses an int, which might be enough. All of those will store an
extra copy of the oid, though for the sizes we're talking about that's
not the end of the world.

Since we're always mapping commits, you could define a commit-slab (each
commit struct gets a unique id which we then index into a big array).
See commit-slab.h for an example.

I'm not very familiar with this code, but I wonder if we actually need
to map at all. It looks like we are mostly interested in set inclusion,
so perhaps an oidset() would work. Or even a bit in the object-flags.

-Peff

  reply	other threads:[~2026-05-15 21:15 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15 18:07 [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases() Kristofer Karlsson via GitGitGadget
2026-05-15 21:14 ` Jeff King [this message]
2026-05-16  8:23   ` Kristofer Karlsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260515211459.GA158762@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=krka@spotify.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox