Git development
 help / color / mirror / Atom feed
* [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases()
@ 2026-05-15 18:07 Kristofer Karlsson via GitGitGadget
  2026-05-15 21:14 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Kristofer Karlsson via GitGitGadget @ 2026-05-15 18:07 UTC (permalink / raw)
  To: git; +Cc: Kristofer Karlsson, Kristofer Karlsson

From: Kristofer Karlsson <krka@spotify.com>

tips_reachable_from_bases() walks the commit graph from a set of base
commits to find which tip commits are reachable.  The inner loop does
a linear scan over the tips array to check whether each visited commit
is a tip, making the overall cost O(C * T) where C is commits walked
and T is the number of tips.

Replace the linear scan with the decoration hash for lookups, reducing
the per-commit tip check from O(T) to O(1) and the overall cost from
O(C * T) to O(C + T).

This function is called by `git for-each-ref --merged` and
`git branch/tag --contains/--no-contains` via reach_filter() in
ref-filter.c.

Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):

  Command                           Before    After   Speedup
  for-each-ref --merged HEAD        6.64s     1.66s     4.0x
  for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
  branch --merged HEAD              0.68s     0.61s      10%
  branch --no-merged HEAD           0.65s     0.61s       8%
  tag --merged HEAD                 0.12s     0.12s       -

The large speedup for for-each-ref is because it checks all 10,000
refs as tips, making the O(T) inner loop expensive.  The branch
subcommand only checks local branches (fewer tips), so the improvement
is smaller.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
    commit-reach: use the decoration hash for tips_reachable_from_bases()
    
    This is a single small commit that replaces an O(C*T) linear scan in
    tips_reachable_from_bases() with an O(1) lookup using the decoration
    hash.
    
    The function is called by git for-each-ref --merged and git branch/tag
    --contains/--no-contains via reach_filter() in ref-filter.c. On a
    merge-heavy monorepo with 2.3M commits and 10,000 refs, git for-each-ref
    --merged HEAD goes from 6.6s to 1.7s (4x).
    
    The diff is intentionally minimal (+9/-6) to make the idea easy to
    discuss before polishing. Things I'm not fully happy about:
    
     * Extra block scope { } just to preserve indentation of the inner body
     * Hacking the array index into the decoration value as (void *)(i + 1)
       instead of storing a proper pointer
     * Relying on unsigned wraparound (- 1 on a size_t 0) to check for
       not-found via j < tips_nr
    
    Happy to clean all of these up in a follow-up commit if the approach
    makes sense.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2116%2Fspkrka%2Ftips-reachable-minimal-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2116/spkrka/tips-reachable-minimal-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2116

 commit-reach.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index d3a9b3ed6f..70b056eae0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1150,6 +1150,7 @@ void tips_reachable_from_bases(struct repository *r,
 	size_t min_generation_index = 0;
 	timestamp_t min_generation;
 	struct commit_list *stack = NULL;
+	struct decoration tip_index = { "tip_index" };
 
 	if (!bases || !tips || !tips_nr)
 		return;
@@ -1173,6 +1174,10 @@ void tips_reachable_from_bases(struct repository *r,
 	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
 	min_generation = commits[0].generation;
 
+	for (size_t i = 0; i < tips_nr; i++)
+		add_decoration(&tip_index, &commits[i].commit->object,
+			       (void *)(i + 1));
+
 	while (bases) {
 		repo_parse_commit(r, bases->item);
 		commit_list_insert(bases->item, &stack);
@@ -1183,14 +1188,11 @@ void tips_reachable_from_bases(struct repository *r,
 		int explored_all_parents = 1;
 		struct commit_list *p;
 		struct commit *c = stack->item;
-		timestamp_t c_gen = commit_graph_generation(c);
 
 		/* Does it match any of our tips? */
-		for (size_t j = min_generation_index; j < tips_nr; j++) {
-			if (c_gen < commits[j].generation)
-				break;
-
-			if (commits[j].commit == c) {
+		{
+			size_t j = (size_t)lookup_decoration(&tip_index, &c->object) - 1;
+			if (j < tips_nr) {
 				tips[commits[j].index]->object.flags |= mark;
 
 				if (j == min_generation_index) {
@@ -1232,6 +1234,7 @@ void tips_reachable_from_bases(struct repository *r,
 	}
 
 done:
+	clear_decoration(&tip_index, NULL);
 	free(commits);
 	repo_clear_commit_marks(r, SEEN);
 	commit_list_free(stack);

base-commit: 59ff4886a579f4bc91e976fe18590b9ae02c7a08
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases()
  2026-05-15 18:07 [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases() Kristofer Karlsson via GitGitGadget
@ 2026-05-15 21:14 ` Jeff King
  2026-05-16  8:23   ` Kristofer Karlsson
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2026-05-15 21:14 UTC (permalink / raw)
  To: Kristofer Karlsson via GitGitGadget; +Cc: git, Kristofer Karlsson

On Fri, May 15, 2026 at 06:07:43PM +0000, Kristofer Karlsson via GitGitGadget wrote:

> From: Kristofer Karlsson <krka@spotify.com>
> 
> tips_reachable_from_bases() walks the commit graph from a set of base
> commits to find which tip commits are reachable.  The inner loop does
> a linear scan over the tips array to check whether each visited commit
> is a tip, making the overall cost O(C * T) where C is commits walked
> and T is the number of tips.
> 
> Replace the linear scan with the decoration hash for lookups, reducing
> the per-commit tip check from O(T) to O(1) and the overall cost from
> O(C * T) to O(C + T).
> 
> This function is called by `git for-each-ref --merged` and
> `git branch/tag --contains/--no-contains` via reach_filter() in
> ref-filter.c.
> 
> Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
> 
>   Command                           Before    After   Speedup
>   for-each-ref --merged HEAD        6.64s     1.66s     4.0x
>   for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
>   branch --merged HEAD              0.68s     0.61s      10%
>   branch --no-merged HEAD           0.65s     0.61s       8%
>   tag --merged HEAD                 0.12s     0.12s       -
> 
> The large speedup for for-each-ref is because it checks all 10,000
> refs as tips, making the O(T) inner loop expensive.  The branch
> subcommand only checks local branches (fewer tips), so the improvement
> is smaller.

Hmm, I couldn't reproduce the speedup on something like linux.git (~1.4M
commits) with a lot of synthetic branches. I'd think that old branches
would be the most expensive, so I did:

  old=$(git rev-list --reverse HEAD | head -n1)
  seq --format="update refs/heads/branch%g $old" 10000 |
  git update-ref --stdin

Running "git for-each-ref --no-merged HEAD" takes ~650ms with stock Git.
But with your patch, it goes to ~830ms!

So what am I missing about your repo that it is so slow in the first
place?

>      * Hacking the array index into the decoration value as (void *)(i + 1)
>        instead of storing a proper pointer

The decoration API is not the most generic option here. There's an
oidmap type, but you have to embed the hashmap bits into your struct,
which is a lot of boilerplate if you're just storing an int. You can
define a khash with a custom value type, and I think the existing
oid_pos uses an int, which might be enough. All of those will store an
extra copy of the oid, though for the sizes we're talking about that's
not the end of the world.

Since we're always mapping commits, you could define a commit-slab (each
commit struct gets a unique id which we then index into a big array).
See commit-slab.h for an example.

I'm not very familiar with this code, but I wonder if we actually need
to map at all. It looks like we are mostly interested in set inclusion,
so perhaps an oidset() would work. Or even a bit in the object-flags.

-Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases()
  2026-05-15 21:14 ` Jeff King
@ 2026-05-16  8:23   ` Kristofer Karlsson
  0 siblings, 0 replies; 3+ messages in thread
From: Kristofer Karlsson @ 2026-05-16  8:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Kristofer Karlsson via GitGitGadget, git

Thanks for testing this, Jeff! You're right, the patch as posted
regresses on your synthetic test case.

The issue is that when multiple refs point to the same commit,
add_decoration overwrites earlier entries,
so only one index gets stored. The marking itself is correct (the flag
is on the shared commit object,
so all duplicates get marked), but the j == min_generation_index check
never fires for the minimum tip,
so early termination breaks. The DFS walks the entire graph instead of
stopping when all tips are found.

I have a fix for the early-termination bug (checking the flag at
min_generation_index instead of comparing indices),
but your suggestions about the API are well taken, I don't think the
decoration hash is the right tool here.
Since we only need set membership ("is this commit a tip?"), not a
mapping, an object-flags bit or commit-slab would
indeed be simpler and avoid the (void *)(i + 1) hack entirely.

I fixed it locally now for the linux test case and got a 4x speedup
there too - the problem was failing the early termination.
Some numbers when running against the linux repo on my machine:

Command          │ Baseline │     V1 (broken)     │     V2 (fixed)      │
--no-merged HEAD │ 1.33s    │ 2.01s (1.5x slower) │ 0.31s (4.3x faster) │
--merged HEAD    │ 1.35s    │ 1.96s (1.5x slower) │ 0.31s (4.3x faster) │

However, I'll still need to rethink the decoration map - I will come
back with a better patch shortly.

- Kristofer

On Fri, 15 May 2026 at 23:15, Jeff King <peff@peff.net> wrote:
>
> On Fri, May 15, 2026 at 06:07:43PM +0000, Kristofer Karlsson via GitGitGadget wrote:
>
> > From: Kristofer Karlsson <krka@spotify.com>
> >
> > tips_reachable_from_bases() walks the commit graph from a set of base
> > commits to find which tip commits are reachable.  The inner loop does
> > a linear scan over the tips array to check whether each visited commit
> > is a tip, making the overall cost O(C * T) where C is commits walked
> > and T is the number of tips.
> >
> > Replace the linear scan with the decoration hash for lookups, reducing
> > the per-commit tip check from O(T) to O(1) and the overall cost from
> > O(C * T) to O(C + T).
> >
> > This function is called by `git for-each-ref --merged` and
> > `git branch/tag --contains/--no-contains` via reach_filter() in
> > ref-filter.c.
> >
> > Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
> >
> >   Command                           Before    After   Speedup
> >   for-each-ref --merged HEAD        6.64s     1.66s     4.0x
> >   for-each-ref --no-merged HEAD     6.75s     1.74s     3.9x
> >   branch --merged HEAD              0.68s     0.61s      10%
> >   branch --no-merged HEAD           0.65s     0.61s       8%
> >   tag --merged HEAD                 0.12s     0.12s       -
> >
> > The large speedup for for-each-ref is because it checks all 10,000
> > refs as tips, making the O(T) inner loop expensive.  The branch
> > subcommand only checks local branches (fewer tips), so the improvement
> > is smaller.
>
> Hmm, I couldn't reproduce the speedup on something like linux.git (~1.4M
> commits) with a lot of synthetic branches. I'd think that old branches
> would be the most expensive, so I did:
>
>   old=$(git rev-list --reverse HEAD | head -n1)
>   seq --format="update refs/heads/branch%g $old" 10000 |
>   git update-ref --stdin
>
> Running "git for-each-ref --no-merged HEAD" takes ~650ms with stock Git.
> But with your patch, it goes to ~830ms!
>
> So what am I missing about your repo that it is so slow in the first
> place?
>
> >      * Hacking the array index into the decoration value as (void *)(i + 1)
> >        instead of storing a proper pointer
>
> The decoration API is not the most generic option here. There's an
> oidmap type, but you have to embed the hashmap bits into your struct,
> which is a lot of boilerplate if you're just storing an int. You can
> define a khash with a custom value type, and I think the existing
> oid_pos uses an int, which might be enough. All of those will store an
> extra copy of the oid, though for the sizes we're talking about that's
> not the end of the world.
>
> Since we're always mapping commits, you could define a commit-slab (each
> commit struct gets a unique id which we then index into a big array).
> See commit-slab.h for an example.
>
> I'm not very familiar with this code, but I wonder if we actually need
> to map at all. It looks like we are mostly interested in set inclusion,
> so perhaps an oidset() would work. Or even a bit in the object-flags.
>
> -Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-16  8:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 18:07 [PATCH] commit-reach: use the decoration hash for tips_reachable_from_bases() Kristofer Karlsson via GitGitGadget
2026-05-15 21:14 ` Jeff King
2026-05-16  8:23   ` Kristofer Karlsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox