[PATCH] describe: use khash in finish_depth

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] describe: use khash in finish_depth_computation()
@ 2025-08-24  8:37 René Scharfe
  2025-08-24 10:31 ` Jeff King
  2025-09-02 18:24 ` [PATCH v2] describe: use oidset " René Scharfe
  0 siblings, 2 replies; 21+ messages in thread
From: René Scharfe @ 2025-08-24  8:37 UTC (permalink / raw)
  To: Git List

Depth computation can end early if all remaining commits are flagged.
The current code determines if that's the case by checking all queue
items each time it dequeues a flagged commit.  This can cause
quadratic complexity.

We could simply count the flagged items in the queue and then update
that number as we add and remove items.  That would provide a general
speedup, but leave one case where we have to scan the whole queue: When
we flag a previously seen, but unflagged commit.  It could be on the
queue and then we'd have to decrease our count.

We could dedicate an object flag to track queue membership, but that
would leave less for candidate tags, affecting the results.  So use a
hash table, specifically a khash set of commit pointers, to track that.
This avoids quadratic behaviour in all cases and provides a nice
performance boost over the previous commit, 08bb69d70f (describe: use
prio_queue_replace(), 2025-08-03):

Benchmark 1: ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     851.7 ms ±   1.1 ms    [User: 788.7 ms, System: 49.2 ms]
  Range (min … max):   849.4 ms … 852.8 ms    10 runs

Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     607.1 ms ±   0.9 ms    [User: 544.6 ms, System: 48.6 ms]
  Range (min … max):   606.1 ms … 608.3 ms    10 runs

Summary
  ./git describe $(git rev-list v2.41.0..v2.47.0) ran
    1.40 ± 0.00 times faster than ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)

Use the commit index value as a hash because it is unique, has the
right size and needs no computation.  We could also derive the hash
directly from the pointer value, but that requires slightly more effort.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/describe.c | 57 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 10 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 0540976210..edb4dec79d 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -24,6 +24,7 @@
 #include "commit-slab.h"
 #include "wildmatch.h"
 #include "prio-queue.h"
+#include "khash.h"
 
 #define MAX_TAGS	(FLAG_BITS - 1)
 #define DEFAULT_CANDIDATES 10
@@ -286,38 +287,74 @@ static void lazy_queue_clear(struct lazy_queue *queue)
 	queue->get_pending = false;
 }
 
-static bool all_have_flag(const struct lazy_queue *queue, unsigned flag)
+static inline unsigned int commit_index(const struct commit *commit)
 {
-	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
-		struct commit *commit = queue->queue.array[i].data;
-		if (!(commit->object.flags & flag))
-			return false;
-	}
-	return true;
+	return commit->index;
+}
+
+static inline int ptr_eq(const void *a, const void *b)
+{
+	return a == b;
+}
+
+KHASH_INIT(commit_set, struct commit *, int, 0, commit_index, ptr_eq)
+
+static void commit_set_insert(kh_commit_set_t *set, struct commit *commit)
+{
+	int added;
+	kh_put_commit_set(set, commit, &added);
+}
+
+static void commit_set_remove(kh_commit_set_t *set, struct commit *commit)
+{
+	khiter_t pos = kh_get_commit_set(set, commit);
+	if (pos != kh_end(set))
+		kh_del_commit_set(set, pos);
 }
 
 static unsigned long finish_depth_computation(struct lazy_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
+	kh_commit_set_t unflagged = { 0 };
+
+	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
+		struct commit *commit = queue->queue.array[i].data;
+		if (!(commit->object.flags & best->flag_within))
+			commit_set_insert(&unflagged, commit);
+	}
+
 	while (!lazy_queue_empty(queue)) {
 		struct commit *c = lazy_queue_get(queue);
 		struct commit_list *parents = c->parents;
+
 		seen_commits++;
 		if (c->object.flags & best->flag_within) {
-			if (all_have_flag(queue, best->flag_within))
+			if (!kh_size(&unflagged))
 				break;
-		} else
+		} else {
+			commit_set_remove(&unflagged, c);
 			best->depth++;
+		}
 		while (parents) {
 			struct commit *p = parents->item;
+			unsigned seen, flag_before, flag_after;
+
 			repo_parse_commit(the_repository, p);
-			if (!(p->object.flags & SEEN))
+			seen = p->object.flags & SEEN;
+			if (!seen)
 				lazy_queue_put(queue, p);
+			flag_before = p->object.flags & best->flag_within;
 			p->object.flags |= c->object.flags;
+			flag_after = p->object.flags & best->flag_within;
+			if (!seen && !flag_after)
+				commit_set_insert(&unflagged, p);
+			if (seen && !flag_before && flag_after)
+				commit_set_remove(&unflagged, p);
 			parents = parents->next;
 		}
 	}
+	kh_release_commit_set(&unflagged);
 	return seen_commits;
 }
 
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-24  8:37 [PATCH] describe: use khash in finish_depth_computation() René Scharfe
@ 2025-08-24 10:31 ` Jeff King
  2025-08-24 16:32   ` René Scharfe
  2025-09-02 18:24 ` [PATCH v2] describe: use oidset " René Scharfe
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2025-08-24 10:31 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Sun, Aug 24, 2025 at 10:37:28AM +0200, René Scharfe wrote:

> We could dedicate an object flag to track queue membership, but that
> would leave less for candidate tags, affecting the results.  So use a
> hash table, specifically a khash set of commit pointers, to track that.
> This avoids quadratic behaviour in all cases and provides a nice
> performance boost over the previous commit, 08bb69d70f (describe: use
> prio_queue_replace(), 2025-08-03):
> 
> Benchmark 1: ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
>   Time (mean ± σ):     851.7 ms ±   1.1 ms    [User: 788.7 ms, System: 49.2 ms]
>   Range (min … max):   849.4 ms … 852.8 ms    10 runs
> 
> Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
>   Time (mean ± σ):     607.1 ms ±   0.9 ms    [User: 544.6 ms, System: 48.6 ms]
>   Range (min … max):   606.1 ms … 608.3 ms    10 runs
> 
> Summary
>   ./git describe $(git rev-list v2.41.0..v2.47.0) ran
>     1.40 ± 0.00 times faster than ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
> 
> Use the commit index value as a hash because it is unique, has the
> right size and needs no computation.  We could also derive the hash
> directly from the pointer value, but that requires slightly more effort.

Interesting. This is exactly what commit-slabs were created for (and are
why the convenient index value is there in the first place!).

The idea of commit-slab was to have a zero-cost lookup, which is done by
indexing into an array (well, really an array-of-arrays). The biggest
downside of commit-slabs is that they allocate one element per commit.
So for a sparse set you end up over-allocating and possibly suffering
cache woes due to requests being far apart.

Whereas in your technique we are trading a little bit of computation
(indexing a bucket and then probing for the match) to get a table that
scales with the number of elements actually added to it.

It should be easy to convert between the two and time it. On top of your
patch, I think this works:

diff --git a/builtin/describe.c b/builtin/describe.c
index edb4dec79d..f1d1ce8c8e 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -287,36 +287,38 @@ static void lazy_queue_clear(struct lazy_queue *queue)
 	queue->get_pending = false;
 }
 
-static inline unsigned int commit_index(const struct commit *commit)
-{
-	return commit->index;
-}
-
-static inline int ptr_eq(const void *a, const void *b)
-{
-	return a == b;
-}
+define_commit_slab(commit_counter, int);
 
-KHASH_INIT(commit_set, struct commit *, int, 0, commit_index, ptr_eq)
+struct commit_set {
+	int nr;
+	struct commit_counter present;
+};
 
-static void commit_set_insert(kh_commit_set_t *set, struct commit *commit)
+static void commit_set_insert(struct commit_set *set, struct commit *commit)
 {
-	int added;
-	kh_put_commit_set(set, commit, &added);
+	int *v = commit_counter_at(&set->present, commit);
+	if (!*v) {
+		set->nr++;
+		*v = 1;
+	}
 }
 
-static void commit_set_remove(kh_commit_set_t *set, struct commit *commit)
+static void commit_set_remove(struct commit_set *set, struct commit *commit)
 {
-	khiter_t pos = kh_get_commit_set(set, commit);
-	if (pos != kh_end(set))
-		kh_del_commit_set(set, pos);
+	int *v = commit_counter_peek(&set->present, commit);
+	if (*v) {
+		set->nr--;
+		*v = 0;
+	}
 }
 
 static unsigned long finish_depth_computation(struct lazy_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
-	kh_commit_set_t unflagged = { 0 };
+	struct commit_set unflagged = { 0 };
+
+	init_commit_counter(&unflagged.present);
 
 	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
 		struct commit *commit = queue->queue.array[i].data;
@@ -330,7 +332,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 
 		seen_commits++;
 		if (c->object.flags & best->flag_within) {
-			if (!kh_size(&unflagged))
+			if (!unflagged.nr)
 				break;
 		} else {
 			commit_set_remove(&unflagged, c);
@@ -354,7 +356,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 			parents = parents->next;
 		}
 	}
-	kh_release_commit_set(&unflagged);
+	clear_commit_counter(&unflagged.present);
 	return seen_commits;
 }
 

Here's what I get for timing:

  Benchmark 1: ./git.orig describe $(git rev-list v2.41.0..v2.47.0)
    Time (mean ± σ):      1.195 s ±  0.012 s    [User: 1.152 s, System: 0.045 s]
    Range (min … max):    1.175 s …  1.220 s    10 runs
  
  Benchmark 2: ./git.khash describe $(git rev-list v2.41.0..v2.47.0)
    Time (mean ± σ):     912.4 ms ±   5.7 ms    [User: 867.7 ms, System: 46.3 ms]
    Range (min … max):   901.1 ms … 921.2 ms    10 runs
  
  Benchmark 3: ./git.slab describe $(git rev-list v2.41.0..v2.47.0)
    Time (mean ± σ):     937.9 ms ±   7.6 ms    [User: 896.1 ms, System: 43.5 ms]
    Range (min … max):   924.8 ms … 947.9 ms    10 runs
  
  Summary
    ./git.khash describe $(git rev-list v2.41.0..v2.47.0) ran
      1.03 ± 0.01 times faster than ./git.slab describe $(git rev-list v2.41.0..v2.47.0)
      1.31 ± 0.02 times faster than ./git.orig describe $(git rev-list v2.41.0..v2.47.0)

So I see similar speedups vs stock Git using your patch, but the
commit-slab version is just slightly slower. That of course makes me
wonder if we could or should replace the guts of commit-slab with a hash
more like this. Some obvious questions:

  1. Does the hash always perform better? For a dense set, might the
     commit-slab do better (probably something like topo-sort would be a
     good test there).

  2. Can the hash version handle strides of different sizes? One of the
     points of commit-slab is that the fixed size of the value type can
     be set at runtime (so you could have a slab of 32 bits per commit,
     or 132, depending on your traversal needs).

  3. How does it perform if we swap the commit->index field for using
     the pointer? If it's similar or faster, we could get rid of the
     commit->index field entirely. Besides saving a few bytes and being
     simpler, that would also mean that we could start to use the same
     slab techniques for non-commit objects. There are several cases
     where we use a few custom bits in object.flags because we need to
     cover both commits and other objects. But those are error prone if
     two sub-systems of Git use the same bits and happen to run at the
     same time without clearing in between. It would be great if each
     algorithm could declare its own unique flag space (and discard them
     in one action without iterating yet again to clear the bits).

-Peff

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-24 10:31 ` Jeff King
@ 2025-08-24 16:32   ` René Scharfe
  2025-08-25  7:34     ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: René Scharfe @ 2025-08-24 16:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On 8/24/25 12:31 PM, Jeff King wrote:
> On Sun, Aug 24, 2025 at 10:37:28AM +0200, René Scharfe wrote:
> 
>> We could dedicate an object flag to track queue membership, but that
>> would leave less for candidate tags, affecting the results.  So use a
>> hash table, specifically a khash set of commit pointers, to track that.
>> This avoids quadratic behaviour in all cases and provides a nice
>> performance boost over the previous commit, 08bb69d70f (describe: use
>> prio_queue_replace(), 2025-08-03):
>>
>> Benchmark 1: ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
>>   Time (mean ± σ):     851.7 ms ±   1.1 ms    [User: 788.7 ms, System: 49.2 ms]
>>   Range (min … max):   849.4 ms … 852.8 ms    10 runs
>>
>> Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
>>   Time (mean ± σ):     607.1 ms ±   0.9 ms    [User: 544.6 ms, System: 48.6 ms]
>>   Range (min … max):   606.1 ms … 608.3 ms    10 runs
>>
>> Summary
>>   ./git describe $(git rev-list v2.41.0..v2.47.0) ran
>>     1.40 ± 0.00 times faster than ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
>>
>> Use the commit index value as a hash because it is unique, has the
>> right size and needs no computation.  We could also derive the hash
>> directly from the pointer value, but that requires slightly more effort.
> 
> Interesting. This is exactly what commit-slabs were created for (and are
> why the convenient index value is there in the first place!).

Kinda -- they have a payload value, while a khash set only stores keys.
A commit slab with an int payload would still be smaller than a khash
set with 64-bit pointers as keys -- IF we were to add all commits.  Here
we typically add just a few, but a pathological history could add a lot;
not sure if there's a boundary.  Hmm.  So you might be able to find
examples where commit-slabs win.

> The idea of commit-slab was to have a zero-cost lookup, which is done by
> indexing into an array (well, really an array-of-arrays). The biggest
> downside of commit-slabs is that they allocate one element per commit.
> So for a sparse set you end up over-allocating and possibly suffering
> cache woes due to requests being far apart.

Right, and it doesn't support removal.

> Whereas in your technique we are trading a little bit of computation
> (indexing a bucket and then probing for the match) to get a table that
> scales with the number of elements actually added to it.
> 
> It should be easy to convert between the two and time it. On top of your
> patch, I think this works:
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index edb4dec79d..f1d1ce8c8e 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -287,36 +287,38 @@ static void lazy_queue_clear(struct lazy_queue *queue)
>  	queue->get_pending = false;
>  }
>  
> -static inline unsigned int commit_index(const struct commit *commit)
> -{
> -	return commit->index;
> -}
> -
> -static inline int ptr_eq(const void *a, const void *b)
> -{
> -	return a == b;
> -}
> +define_commit_slab(commit_counter, int);

We only need one bit, so a uint8_t or char would suffice.

>  
> -KHASH_INIT(commit_set, struct commit *, int, 0, commit_index, ptr_eq)
> +struct commit_set {
> +	int nr;
> +	struct commit_counter present;
> +};
>  
> -static void commit_set_insert(kh_commit_set_t *set, struct commit *commit)
> +static void commit_set_insert(struct commit_set *set, struct commit *commit)
>  {
> -	int added;
> -	kh_put_commit_set(set, commit, &added);
> +	int *v = commit_counter_at(&set->present, commit);
> +	if (!*v) {
> +		set->nr++;
> +		*v = 1;
> +	}
>  }
>  
> -static void commit_set_remove(kh_commit_set_t *set, struct commit *commit)
> +static void commit_set_remove(struct commit_set *set, struct commit *commit)
>  {
> -	khiter_t pos = kh_get_commit_set(set, commit);
> -	if (pos != kh_end(set))
> -		kh_del_commit_set(set, pos);
> +	int *v = commit_counter_peek(&set->present, commit);
> +	if (*v) {
> +		set->nr--;
> +		*v = 0;
> +	}
>  }
>  
>  static unsigned long finish_depth_computation(struct lazy_queue *queue,
>  					      struct possible_tag *best)
>  {
>  	unsigned long seen_commits = 0;
> -	kh_commit_set_t unflagged = { 0 };
> +	struct commit_set unflagged = { 0 };
> +
> +	init_commit_counter(&unflagged.present);
>  
>  	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
>  		struct commit *commit = queue->queue.array[i].data;
> @@ -330,7 +332,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
>  
>  		seen_commits++;
>  		if (c->object.flags & best->flag_within) {
> -			if (!kh_size(&unflagged))
> +			if (!unflagged.nr)
>  				break;
>  		} else {
>  			commit_set_remove(&unflagged, c);
> @@ -354,7 +356,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
>  			parents = parents->next;
>  		}
>  	}
> -	kh_release_commit_set(&unflagged);
> +	clear_commit_counter(&unflagged.present);
>  	return seen_commits;
>  }
>  
> 
> Here's what I get for timing:
> 
>   Benchmark 1: ./git.orig describe $(git rev-list v2.41.0..v2.47.0)
>     Time (mean ± σ):      1.195 s ±  0.012 s    [User: 1.152 s, System: 0.045 s]
>     Range (min … max):    1.175 s …  1.220 s    10 runs
>   
>   Benchmark 2: ./git.khash describe $(git rev-list v2.41.0..v2.47.0)
>     Time (mean ± σ):     912.4 ms ±   5.7 ms    [User: 867.7 ms, System: 46.3 ms]
>     Range (min … max):   901.1 ms … 921.2 ms    10 runs
>   
>   Benchmark 3: ./git.slab describe $(git rev-list v2.41.0..v2.47.0)
>     Time (mean ± σ):     937.9 ms ±   7.6 ms    [User: 896.1 ms, System: 43.5 ms]
>     Range (min … max):   924.8 ms … 947.9 ms    10 runs
>   
>   Summary
>     ./git.khash describe $(git rev-list v2.41.0..v2.47.0) ran
>       1.03 ± 0.01 times faster than ./git.slab describe $(git rev-list v2.41.0..v2.47.0)
>       1.31 ± 0.02 times faster than ./git.orig describe $(git rev-list v2.41.0..v2.47.0)
> 

I didn't even consider them a contender due to the memory overhead.  The
difference is surprisingly small.  I also got 3%:

Benchmark 1: ./git_khash describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     608.4 ms ±   0.6 ms    [User: 544.6 ms, System: 49.2 ms]
  Range (min … max):   607.5 ms … 609.3 ms    10 runs

Benchmark 2: ./git_slab describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     624.4 ms ±   1.0 ms    [User: 560.6 ms, System: 49.3 ms]
  Range (min … max):   623.3 ms … 626.5 ms    10 runs

Summary
  ./git_khash describe $(git rev-list v2.41.0..v2.47.0) ran
    1.03 ± 0.00 times faster than ./git_slab describe $(git rev-list v2.41.0..v2.47.0)

I guess hash tables are just fast, even the slower ones.  Provided they
fit into memory.

> So I see similar speedups vs stock Git using your patch, but the
> commit-slab version is just slightly slower. That of course makes me
> wonder if we could or should replace the guts of commit-slab with a hash
> more like this. Some obvious questions:
> 
>   1. Does the hash always perform better? For a dense set, might the
>      commit-slab do better (probably something like topo-sort would be a
>      good test there).

With dense you mean that most commits get some data value assigned that
is kept for longer?  That's when commit-slabs should shine.

>   2. Can the hash version handle strides of different sizes? One of the
>      points of commit-slab is that the fixed size of the value type can
>      be set at runtime (so you could have a slab of 32 bits per commit,
>      or 132, depending on your traversal needs).

Dynamic value sizes require an indirection via a pointer, or at least I
don't see any other way.  What would be a possible use case?  (Don't see
any in-tree.)

>   3. How does it perform if we swap the commit->index field for using
>      the pointer? If it's similar or faster, we could get rid of the
>      commit->index field entirely. Besides saving a few bytes and being
>      simpler, that would also mean that we could start to use the same
>      slab techniques for non-commit objects. There are several cases
>      where we use a few custom bits in object.flags because we need to
>      cover both commits and other objects. But those are error prone if
>      two sub-systems of Git use the same bits and happen to run at the
>      same time without clearing in between. It would be great if each
>      algorithm could declare its own unique flag space (and discard them
>      in one action without iterating yet again to clear the bits).
With "commit" being the pointer, returning "(uintptr_t)commit / 8" in the
hash function for the khash set gets me the same performance.  This
assumes an eight-byte alignment and that pointer values are flat indexes
into memory, which might be too much.  A portable solution would probably
have to mix and spread out all bits evenly?

The nice thing about commit-slabs is the lack of required maintenance.
They just allocate as needed and never give anything back, never need to
rehash.  And they don't need to store the keys anywhere.  They should be
good alternatives to object flags used during full history traversal
without flagging non-commits.

Off-the-shelf hash tables like khash might be slower in these cases,
though not far off, I expect.

René


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-24 16:32   ` René Scharfe
@ 2025-08-25  7:34     ` Jeff King
  2025-08-25  8:13       ` Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jeff King @ 2025-08-25  7:34 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Sun, Aug 24, 2025 at 06:32:47PM +0200, René Scharfe wrote:

> >> Use the commit index value as a hash because it is unique, has the
> >> right size and needs no computation.  We could also derive the hash
> >> directly from the pointer value, but that requires slightly more effort.
> > 
> > Interesting. This is exactly what commit-slabs were created for (and are
> > why the convenient index value is there in the first place!).
> 
> Kinda -- they have a payload value, while a khash set only stores keys.
> A commit slab with an int payload would still be smaller than a khash
> set with 64-bit pointers as keys -- IF we were to add all commits.  Here
> we typically add just a few, but a pathological history could add a lot;
> not sure if there's a boundary.  Hmm.  So you might be able to find
> examples where commit-slabs win.

A khash set doesn't have a "vals" array, but I think it does always keep
the flags array, which is a uint32_t per bucket. That's how it knows
which buckets have a value in them (since it does not otherwise assume
there is a sentinel value like NULL). So even though its API is that of
a set, you can think of it as a mapping of keys to flags. So a mapping
to an int (or even a char) should cost the same.

> > The idea of commit-slab was to have a zero-cost lookup, which is done by
> > indexing into an array (well, really an array-of-arrays). The biggest
> > downside of commit-slabs is that they allocate one element per commit.
> > So for a sparse set you end up over-allocating and possibly suffering
> > cache woes due to requests being far apart.
> 
> Right, and it doesn't support removal.

True. And I think that is the big downfall for what I was suggesting
earlier (backing a slab with a hash table). If you have a slab that
holds only a few items at one time, but which cycles through a larger
set of items, it will grow to match that larger set. That is a waste of
memory with a traditional slab, but if you back the slab with a hash
table you also waste computation doing lookups on a hash table with lots
of elements, when most of them are not interesting (but the slab code
has no way to say "this is not interesting any more"; you can only
take the address and assign to it).

So probably the slab API is not so great for many cases. But what I was
mostly getting at is: should we be using hashes in more places if their
access time is cost-competitive with the slab arrays?

More on that in a moment...

> > +define_commit_slab(commit_counter, int);
> 
> We only need one bit, so a uint8_t or char would suffice.

True, though that doesn't seem to change timings much for me (I used
"int" because that's what your khash used, but I think the type is a
dummy value in "set" mode).

> I didn't even consider them a contender due to the memory overhead.  The
> difference is surprisingly small.  I also got 3%:
> 
> Benchmark 1: ./git_khash describe $(git rev-list v2.41.0..v2.47.0)
>   Time (mean ± σ):     608.4 ms ±   0.6 ms    [User: 544.6 ms, System: 49.2 ms]
>   Range (min … max):   607.5 ms … 609.3 ms    10 runs
> 
> Benchmark 2: ./git_slab describe $(git rev-list v2.41.0..v2.47.0)
>   Time (mean ± σ):     624.4 ms ±   1.0 ms    [User: 560.6 ms, System: 49.3 ms]
>   Range (min … max):   623.3 ms … 626.5 ms    10 runs
> 
> Summary
>   ./git_khash describe $(git rev-list v2.41.0..v2.47.0) ran
>     1.03 ± 0.00 times faster than ./git_slab describe $(git rev-list v2.41.0..v2.47.0)
> 
> I guess hash tables are just fast, even the slower ones.  Provided they
> fit into memory.

Yeah. Thinking on it more, how much benefit are we getting from the use
of commit->index in your patch? In other contexts where we store oids or
objects in hash tables, we use the low bits of the oid directly. So it's
similarly cheap. If I do this tweak on top of your patch:

diff --git a/builtin/describe.c b/builtin/describe.c
index edb4dec79d..e024feb080 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -289,7 +289,7 @@ static void lazy_queue_clear(struct lazy_queue *queue)

 static inline unsigned int commit_index(const struct commit *commit)
 {
-	return commit->index;
+	return oidhash(&commit->object.oid);
 }

 static inline int ptr_eq(const void *a, const void *b)

I get similar results (actually faster, but I think within the noise).

Which made me think more (always a danger). We already have an oidset
data structure, which is backed by a khash. It's not _quite_ the same
thing, because it's going to store an actual 36-byte oid struct as the
hash key, rather than an 8-byte pointer to a "struct object" (which
contains that same oid). And likewise when we do a lookup, the bucket
search requires a larger memcmp() than the pointer equality. But how
much difference does that make?

If I instead do this on top of your patch:

diff --git a/builtin/describe.c b/builtin/describe.c
index edb4dec79d..38ce0c1978 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -287,41 +287,16 @@ static void lazy_queue_clear(struct lazy_queue *queue)
 	queue->get_pending = false;
 }

-static inline unsigned int commit_index(const struct commit *commit)
-{
-	return commit->index;
-}
-
-static inline int ptr_eq(const void *a, const void *b)
-{
-	return a == b;
-}
-
-KHASH_INIT(commit_set, struct commit *, int, 0, commit_index, ptr_eq)
-
-static void commit_set_insert(kh_commit_set_t *set, struct commit *commit)
-{
-	int added;
-	kh_put_commit_set(set, commit, &added);
-}
-
-static void commit_set_remove(kh_commit_set_t *set, struct commit *commit)
-{
-	khiter_t pos = kh_get_commit_set(set, commit);
-	if (pos != kh_end(set))
-		kh_del_commit_set(set, pos);
-}
-
 static unsigned long finish_depth_computation(struct lazy_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
-	kh_commit_set_t unflagged = { 0 };
+	struct oidset unflagged = OIDSET_INIT;

 	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
 		struct commit *commit = queue->queue.array[i].data;
 		if (!(commit->object.flags & best->flag_within))
-			commit_set_insert(&unflagged, commit);
+			oidset_insert(&unflagged, &commit->object.oid);
 	}

 	while (!lazy_queue_empty(queue)) {
@@ -330,10 +305,10 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,

 		seen_commits++;
 		if (c->object.flags & best->flag_within) {
-			if (!kh_size(&unflagged))
+			if (!oidset_size(&unflagged))
 				break;
 		} else {
-			commit_set_remove(&unflagged, c);
+			oidset_remove(&unflagged, &c->object.oid);
 			best->depth++;
 		}
 		while (parents) {
@@ -348,13 +323,13 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 			p->object.flags |= c->object.flags;
 			flag_after = p->object.flags & best->flag_within;
 			if (!seen && !flag_after)
-				commit_set_insert(&unflagged, p);
+				oidset_insert(&unflagged, &p->object.oid);
 			if (seen && !flag_before && flag_after)
-				commit_set_remove(&unflagged, p);
+				oidset_remove(&unflagged, &p->object.oid);
 			parents = parents->next;
 		}
 	}
-	kh_release_commit_set(&unflagged);
+	oidset_clear(&unflagged);
 	return seen_commits;
 }

then I likewise get very similar timings. Your version seems to be
consistently a little bit faster, but within the run-to-run noise of
~1%. But the bonus here is that we didn't need to define a new hash
type, nor do any tricks with the commit->index field.

Now if we really are worried about those extra bytes in storing a fresh
oid, is there room for new data types? I.e., A "struct objset" that
stores object pointers, hashes based on the oid, and uses pointer
equality to find a match?  And likewise a "struct objmap" that could
perhaps compete with commit-slab (but be more pleasant to use).

> >   1. Does the hash always perform better? For a dense set, might the
> >      commit-slab do better (probably something like topo-sort would be a
> >      good test there).
> 
> With dense you mean that most commits get some data value assigned that
> is kept for longer?  That's when commit-slabs should shine.

Yes, that's what I meant. And yeah, I'd expect commit slabs to be more
competitive with hashes in those cases. But I wonder if a good hash
table could come close enough to obsolete slabs.

> >   2. Can the hash version handle strides of different sizes? One of the
> >      points of commit-slab is that the fixed size of the value type can
> >      be set at runtime (so you could have a slab of 32 bits per commit,
> >      or 132, depending on your traversal needs).
> 
> Dynamic value sizes require an indirection via a pointer, or at least I
> don't see any other way.  What would be a possible use case?  (Don't see
> any in-tree.)

The value isn't totally dynamic; it's static for a single instantiation
of a slab. I don't think khash supports that (it has a value type and
lets the compiler take care of indexing into an array of that value).
But you could imagine a world where khash (or some custom hash code)
only knows that it's storing N bytes per item, and does the indexing
multiplication itself.

But yes, I don't think we are even using that feature of commit-slab
currently. I used it in a series for fast --contains long ago:

  https://lore.kernel.org/git/20140625233429.GA20457@sigill.intra.peff.net/

but can't remember why I never got back to it. It would also be useful
for show-branch, which wants to paint down for each starting tip. But
given that we haven't used it, it may just not be that useful a feature.

> >   3. How does it perform if we swap the commit->index field for using
> >      the pointer? If it's similar or faster, we could get rid of the
> >      commit->index field entirely. Besides saving a few bytes and being
> >      simpler, that would also mean that we could start to use the same
> >      slab techniques for non-commit objects. There are several cases
> >      where we use a few custom bits in object.flags because we need to
> >      cover both commits and other objects. But those are error prone if
> >      two sub-systems of Git use the same bits and happen to run at the
> >      same time without clearing in between. It would be great if each
> >      algorithm could declare its own unique flag space (and discard them
> >      in one action without iterating yet again to clear the bits).
> With "commit" being the pointer, returning "(uintptr_t)commit / 8" in the
> hash function for the khash set gets me the same performance.  This
> assumes an eight-byte alignment and that pointer values are flat indexes
> into memory, which might be too much.  A portable solution would probably
> have to mix and spread out all bits evenly?

I'd think there would be more entropy in the low bits in general, so
just "(uintptr_t)commit & 0xffffffff". But probably just using oidhash()
is better still.

> The nice thing about commit-slabs is the lack of required maintenance.
> They just allocate as needed and never give anything back, never need to
> rehash.  And they don't need to store the keys anywhere.  They should be
> good alternatives to object flags used during full history traversal
> without flagging non-commits.
> 
> Off-the-shelf hash tables like khash might be slower in these cases,
> though not far off, I expect.

Yeah, maybe. I've never really loved the drawbacks of commit-slab, and
your numbers made me wonder if we could be getting away with hashes in
more places. Though one of the drawbacks I find with commit-slab is the
grossness of instantiating a giant mess of macros. That is not much
better with khash. ;) And certainly oidmap, though it does not use
macros, is no picnic to use either.

-Peff

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-25  7:34     ` Jeff King
@ 2025-08-25  8:13       ` Jeff King
  2025-08-25 18:48         ` Junio C Hamano
  2025-08-31 17:25       ` René Scharfe
  2025-09-03 16:30       ` René Scharfe
  2 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2025-08-25  8:13 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Mon, Aug 25, 2025 at 03:34:03AM -0400, Jeff King wrote:

> > The nice thing about commit-slabs is the lack of required maintenance.
> > They just allocate as needed and never give anything back, never need to
> > rehash.  And they don't need to store the keys anywhere.  They should be
> > good alternatives to object flags used during full history traversal
> > without flagging non-commits.
> > 
> > Off-the-shelf hash tables like khash might be slower in these cases,
> > though not far off, I expect.
> 
> Yeah, maybe. I've never really loved the drawbacks of commit-slab, and
> your numbers made me wonder if we could be getting away with hashes in
> more places. Though one of the drawbacks I find with commit-slab is the
> grossness of instantiating a giant mess of macros. That is not much
> better with khash. ;) And certainly oidmap, though it does not use
> macros, is no picnic to use either.

So out of curiosity I tried replacing a slab that should be pretty
densely filled, using a khash based on oidhash/ptr along with some
quality-of-life wrappers.  Patch is below.

It performs...very badly. Not sure if I've screwed something up, but
it's about 7x slower to run "git rev-list --author-date-order HEAD" in
the kernel. So maybe slabs really are worth it overall.

-Peff

---
diff --git a/commit.c b/commit.c
index 16d91b2bfc..0678932ab8 100644
--- a/commit.c
+++ b/commit.c
@@ -822,9 +822,7 @@ struct commit *pop_commit(struct commit_list **stack)
 /* count number of children that have not been emitted */
 define_commit_slab(indegree_slab, int);
 
-define_commit_slab(author_date_slab, timestamp_t);
-
-void record_author_date(struct author_date_slab *author_date,
+void record_author_date(kh_obj_timestamp_t *author_date,
 			struct commit *commit)
 {
 	const char *buffer = repo_get_commit_buffer(the_repository, commit,
@@ -845,7 +843,7 @@ void record_author_date(struct author_date_slab *author_date,
 	date = parse_timestamp(ident.date_begin, &date_end, 10);
 	if (date_end != ident.date_end)
 		goto fail_exit; /* malformed date */
-	*(author_date_slab_at(author_date, commit)) = date;
+	obj_timestamp_put(author_date, &commit->object, date);
 
 fail_exit:
 	repo_unuse_commit_buffer(the_repository, commit, buffer);
@@ -855,9 +853,9 @@ int compare_commits_by_author_date(const void *a_, const void *b_,
 				   void *cb_data)
 {
 	const struct commit *a = a_, *b = b_;
-	struct author_date_slab *author_date = cb_data;
-	timestamp_t a_date = *(author_date_slab_at(author_date, a));
-	timestamp_t b_date = *(author_date_slab_at(author_date, b));
+	kh_obj_timestamp_t *author_date = cb_data;
+	timestamp_t a_date = obj_timestamp_get(author_date, &a->object);
+	timestamp_t b_date = obj_timestamp_get(author_date, &b->object);
 
 	/* newer commits with larger date first */
 	if (a_date < b_date)
@@ -910,7 +908,7 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	struct indegree_slab indegree;
 	struct prio_queue queue;
 	struct commit *commit;
-	struct author_date_slab author_date;
+	kh_obj_timestamp_t author_date;
 
 	if (!orig)
 		return;
@@ -927,7 +925,7 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 		queue.compare = compare_commits_by_commit_date;
 		break;
 	case REV_SORT_BY_AUTHOR_DATE:
-		init_author_date_slab(&author_date);
+		memset(&author_date, 0, sizeof(author_date));
 		queue.compare = compare_commits_by_author_date;
 		queue.cb_data = &author_date;
 		break;
@@ -1011,7 +1009,7 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	clear_indegree_slab(&indegree);
 	clear_prio_queue(&queue);
 	if (sort_order == REV_SORT_BY_AUTHOR_DATE)
-		clear_author_date_slab(&author_date);
+		kh_release_obj_timestamp(&author_date);
 }
 
 struct rev_collect {
diff --git a/commit.h b/commit.h
index 1d6e0c7518..f49531ab8b 100644
--- a/commit.h
+++ b/commit.h
@@ -334,8 +334,8 @@ int remove_signature(struct strbuf *buf);
 int check_commit_signature(const struct commit *commit, struct signature_check *sigc);
 
 /* record author-date for each commit object */
-struct author_date_slab;
-void record_author_date(struct author_date_slab *author_date,
+struct kh_obj_timestamp_t;
+void record_author_date(kh_obj_timestamp_t *author_date,
 			struct commit *commit);
 
 int compare_commits_by_author_date(const void *a_, const void *b_, void *unused);
diff --git a/object.h b/object.h
index 8c3c1c46e1..14718051f2 100644
--- a/object.h
+++ b/object.h
@@ -2,6 +2,7 @@
 #define OBJECT_H
 
 #include "hash.h"
+#include "khash.h"
 
 struct buffer_slab;
 struct repository;
@@ -340,4 +341,33 @@ void clear_object_flags(struct repository *repo, unsigned flags);
  */
 void repo_clear_commit_marks(struct repository *r, unsigned int flags);
 
+/*
+ * This would all probably go into its own header file...
+ */
+static inline unsigned int obj_ptr_hash(const struct object *obj)
+{
+	return oidhash(&obj->oid);
+}
+
+static inline int obj_ptr_eq(const struct object *a, const struct object *b)
+{
+	return a == b;
+}
+
+#define OBJHASH_INIT(name, value_type, sentinel) \
+KHASH_INIT(name, const struct object *, value_type, 1, obj_ptr_hash, obj_ptr_eq); \
+static inline void name##_put(kh_##name##_t *map, const struct object *obj, value_type v) \
+{ \
+	int hash_ret; \
+	khiter_t pos = kh_put_##name(map, obj, &hash_ret); \
+	kh_value(map, pos) = v; \
+} \
+static inline value_type name##_get(kh_##name##_t *map, const struct object *obj) \
+{ \
+	khiter_t pos = kh_get_##name(map, obj); \
+	return pos == kh_end(map) ? sentinel : kh_value(map, pos); \
+}
+
+OBJHASH_INIT(obj_timestamp, timestamp_t, 0);
+
 #endif /* OBJECT_H */
diff --git a/revision.c b/revision.c
index 6ba8f67054..e67cf2249a 100644
--- a/revision.c
+++ b/revision.c
@@ -3622,15 +3622,14 @@ static int mark_uninteresting(const struct object_id *oid,
 }
 
 define_commit_slab(indegree_slab, int);
-define_commit_slab(author_date_slab, timestamp_t);
 
 struct topo_walk_info {
 	timestamp_t min_generation;
 	struct prio_queue explore_queue;
 	struct prio_queue indegree_queue;
 	struct prio_queue topo_queue;
 	struct indegree_slab indegree;
-	struct author_date_slab author_date;
+	kh_obj_timestamp_t author_date;
 };
 
 static int topo_walk_atexit_registered;
@@ -3755,7 +3754,7 @@ static void release_revisions_topo_walk_info(struct topo_walk_info *info)
 	clear_prio_queue(&info->indegree_queue);
 	clear_prio_queue(&info->topo_queue);
 	clear_indegree_slab(&info->indegree);
-	clear_author_date_slab(&info->author_date);
+	kh_release_obj_timestamp(&info->author_date);
 	free(info);
 }
 
@@ -3789,7 +3788,7 @@ static void init_topo_walk(struct rev_info *revs)
 		info->topo_queue.compare = compare_commits_by_commit_date;
 		break;
 	case REV_SORT_BY_AUTHOR_DATE:
-		init_author_date_slab(&info->author_date);
+		memset(&info->author_date, 0, sizeof(info->author_date));
 		info->topo_queue.compare = compare_commits_by_author_date;
 		info->topo_queue.cb_data = &info->author_date;
 		break;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-25  8:13       ` Jeff King
@ 2025-08-25 18:48         ` Junio C Hamano
  2025-08-26  3:39           ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2025-08-25 18:48 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Git List

Jeff King <peff@peff.net> writes:

> So out of curiosity I tried replacing a slab that should be pretty
> densely filled, using a khash based on oidhash/ptr along with some
> quality-of-life wrappers.  Patch is below.
>
> It performs...very badly. Not sure if I've screwed something up, but
> it's about 7x slower to run "git rev-list --author-date-order HEAD" in
> the kernel. So maybe slabs really are worth it overall.

Hmph.  It is the best case scenario for the slab code, as you'd need
author date for each and every commit object in this use case, and
the comparison function called by prio-queue would be called for the
same object many times.

But the hash function being oidhash(), I am a bit surprised.  It
shouldn't be so much more expensive to peek at the first 4 bytes and
then do the usual hashtable thing than looking at the in-object
commit->index.  Is it a sign that the range of oidhash() is a bit
too small for a real workload?

Nah, 4 byte unsigned integer should be sufficient for the number of
objects in the kernel.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-25 18:48         ` Junio C Hamano
@ 2025-08-26  3:39           ` Jeff King
  2025-08-26  4:26             ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2025-08-26  3:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List

On Mon, Aug 25, 2025 at 11:48:09AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So out of curiosity I tried replacing a slab that should be pretty
> > densely filled, using a khash based on oidhash/ptr along with some
> > quality-of-life wrappers.  Patch is below.
> >
> > It performs...very badly. Not sure if I've screwed something up, but
> > it's about 7x slower to run "git rev-list --author-date-order HEAD" in
> > the kernel. So maybe slabs really are worth it overall.
> 
> Hmph.  It is the best case scenario for the slab code, as you'd need
> author date for each and every commit object in this use case, and
> the comparison function called by prio-queue would be called for the
> same object many times.
> 
> But the hash function being oidhash(), I am a bit surprised.  It
> shouldn't be so much more expensive to peek at the first 4 bytes and
> then do the usual hashtable thing than looking at the in-object
> commit->index.  Is it a sign that the range of oidhash() is a bit
> too small for a real workload?
> 
> Nah, 4 byte unsigned integer should be sufficient for the number of
> objects in the kernel.

I was surprised, too. I expected it be maybe 20% slower or something.
Which really makes me think I've managed to screw up the patch, but if
so, I don't see it. I tried profiling the result, expecting to see a
bunch of extra time spent in obj_timestamp_put() or obj_timestamp_get().
But I don't. They account together for only a few percent of the
run-time, according to perf.

So I dunno. I am confused by the results, but I am not sure if I am
holding it wrong.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-26  3:39           ` Jeff King
@ 2025-08-26  4:26             ` Jeff King
  2025-08-26  5:52               ` Jeff King
  2025-08-26 15:34               ` Junio C Hamano
  0 siblings, 2 replies; 21+ messages in thread
From: Jeff King @ 2025-08-26  4:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List

On Mon, Aug 25, 2025 at 11:39:20PM -0400, Jeff King wrote:

> > But the hash function being oidhash(), I am a bit surprised.  It
> > shouldn't be so much more expensive to peek at the first 4 bytes and
> > then do the usual hashtable thing than looking at the in-object
> > commit->index.  Is it a sign that the range of oidhash() is a bit
> > too small for a real workload?
> > 
> > Nah, 4 byte unsigned integer should be sufficient for the number of
> > objects in the kernel.
> 
> I was surprised, too. I expected it be maybe 20% slower or something.
> Which really makes me think I've managed to screw up the patch, but if
> so, I don't see it. I tried profiling the result, expecting to see a
> bunch of extra time spent in obj_timestamp_put() or obj_timestamp_get().
> But I don't. They account together for only a few percent of the
> run-time, according to perf.
> 
> So I dunno. I am confused by the results, but I am not sure if I am
> holding it wrong.

OK, maybe I am just holding it wrong. I think I may have mistakenly been
using the wrong timing for my baseline (maybe --date-order instead of
--author-date-order; the latter is _way_ more expensive because we have
to open the commits to parse the author date).

Here's a more apples-to-apples comparison using hyperfine. On git.git:

  Benchmark 1: ./git.slab rev-list --author-date-order HEAD
    Time (mean ± σ):     547.3 ms ±  12.2 ms    [User: 535.8 ms, System: 11.3 ms]
    Range (min … max):   536.1 ms … 566.4 ms    10 runs
  
  Benchmark 2: ./git.hash rev-list --author-date-order HEAD
    Time (mean ± σ):     558.6 ms ±  11.2 ms    [User: 542.4 ms, System: 16.0 ms]
    Range (min … max):   544.4 ms … 572.6 ms    10 runs
  
  Summary
    ./git.slab rev-list --author-date-order HEAD ran
      1.02 ± 0.03 times faster than ./git.hash rev-list --author-date-order HEAD

So a little slowdown, but within the run-to-run noise. And on linux.git:

  Benchmark 1: ~/compile/git/git.slab rev-list --author-date-order HEAD
    Time (mean ± σ):     11.020 s ±  0.131 s    [User: 10.764 s, System: 0.254 s]
    Range (min … max):   10.886 s … 11.262 s    10 runs
  
  Benchmark 2: ~/compile/git/git.hash rev-list --author-date-order HEAD
    Time (mean ± σ):     11.682 s ±  0.204 s    [User: 11.398 s, System: 0.282 s]
    Range (min … max):   11.424 s … 12.139 s    10 runs
  
  Summary
    ~/compile/git/git.slab rev-list --author-date-order HEAD ran
      1.06 ± 0.02 times faster than ~/compile/git/git.hash rev-list --author-date-order HEAD


A little more measurable there. Those numbers are more in line with what
I was expecting. I'm not sure what it all means, though. 6% is enough
that it is probably worth keeping a custom data type like slab around.
Though it would be nice to have a data type that worked on all object
types and didn't necessarily use a ton of memory.

This particular case may not be representative, either. I picked it
because it was easy to convert. But I wonder how bad it would be to put
the object flags for a traversal into a hash. Right now those are in the
original struct, not even in a commit-slab. So I'd guess it's an even
bigger slowdown.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-26  4:26             ` Jeff King
@ 2025-08-26  5:52               ` Jeff King
  2025-08-26 15:34               ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff King @ 2025-08-26  5:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List

On Tue, Aug 26, 2025 at 12:26:07AM -0400, Jeff King wrote:

> This particular case may not be representative, either. I picked it
> because it was easy to convert. But I wonder how bad it would be to put
> the object flags for a traversal into a hash. Right now those are in the
> original struct, not even in a commit-slab. So I'd guess it's an even
> bigger slowdown.

Just for fun, and because I hadn't tortured myself with coccinelle
syntax in a while, I tried removing object.flags entirely. Apply the
patch below, then:

  make SPATCH_USE_O_DEPENDENCIES= contrib/coccinelle/flags.cocci.patch
  git apply contrib/coccinelle/flags.cocci.patch

That misses a few stragglers, which I'll include fixes for below.
Anyway, the result is a system that does hash lookups for every access
of the "flags" field. I added a few helpers to do |= and &= without
doing two lookups. Probably this could go a little further with some
kind of test-and-set (e.g., you want to know if SEEN is set and if not,
set it and do some other stuff). But it gives a general idea.

I tried timing merge-base which I know will need to paint flags, using
an old commit (so that we have to walk almost down to the root). Here
are the results:

  Benchmark 1: ./git.orig merge-base HEAD 19b2860cba574
    Time (mean ± σ):      31.7 ms ±   0.7 ms    [User: 26.1 ms, System: 5.5 ms]
    Range (min … max):    30.5 ms …  34.0 ms    89 runs
  
  Benchmark 2: ./git.hash merge-base HEAD 19b2860cba574
    Time (mean ± σ):      39.7 ms ±   0.7 ms    [User: 33.9 ms, System: 5.8 ms]
    Range (min … max):    38.5 ms …  42.6 ms    75 runs
  
  Summary
    ./git.orig merge-base HEAD 19b2860cba574 ran
      1.25 ± 0.03 times faster than ./git.hash merge-base HEAD 19b2860cba574

And here's a rev-list which needs to set SEEN on a lot of objects (but
also has to actually inflate a lot of trees, too):

  $ hyperfine -L v orig,hash './git.{v} rev-list --objects --all'
  Benchmark 1: ./git.orig rev-list --objects --all
    Time (mean ± σ):      2.875 s ±  0.050 s    [User: 2.799 s, System: 0.075 s]
    Range (min … max):    2.803 s …  2.973 s    10 runs
  
  Benchmark 2: ./git.hash rev-list --objects --all
    Time (mean ± σ):      5.547 s ±  0.069 s    [User: 5.477 s, System: 0.070 s]
    Range (min … max):    5.396 s …  5.671 s    10 runs
  
  Summary
    ./git.orig rev-list --objects --all ran
      1.93 ± 0.04 times faster than ./git.hash rev-list --objects --all


I expected the merge-base to fare much worse, since it's otherwise
mostly just pulling commit data from the commit-graph file and walking
the pointers. But I guess we check a lot of SEEN flags for rev-list.

Anyway, I think the numbers there demonstrate that it's probably not
that interesting a direction. I'm a little curious how a slab would
fare. It shouldn't be _too_ hard to slot it in with a similar API (the
real work was adjusting all the callers via cocci). But I'm not sure I
have the stomach to waste more time on this. ;)

Patches below for reference. This is the one that rips out the flags
field:

---
 contrib/coccinelle/flags.cocci | 59 ++++++++++++++++++++++++++++++++++
 object.c                       |  5 ++-
 object.h                       | 29 ++++++++++++++++-
 3 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 contrib/coccinelle/flags.cocci

diff --git a/contrib/coccinelle/flags.cocci b/contrib/coccinelle/flags.cocci
new file mode 100644
index 0000000000..5dae0054d6
--- /dev/null
+++ b/contrib/coccinelle/flags.cocci
@@ -0,0 +1,59 @@
+@@
+struct object *OBJPTR;
+expression FLAGS;
+@@
+- OBJPTR->flags |= FLAGS
++ objflags_set(OBJPTR, FLAGS)
+
+@@
+struct object *OBJPTR;
+expression FLAGS;
+@@
+- OBJPTR->flags &= FLAGS
++ objflags_clear(OBJPTR, FLAGS)
+
+@@
+struct object *OBJPTR;
+@@
+- OBJPTR->flags
++ objflags_get(OBJPTR)
+
+@@
+expression TYPEDOBJ;
+expression FLAGS;
+@@
+- TYPEDOBJ->object.flags |= FLAGS
++ objflags_set(&TYPEDOBJ->object, FLAGS)
+
+@@
+expression TYPEDOBJ;
+expression FLAGS;
+@@
+- TYPEDOBJ->object.flags &= FLAGS
++ objflags_clear(&TYPEDOBJ->object, FLAGS)
+
+@@
+expression TYPEDOBJ;
+@@
+- TYPEDOBJ->object.flags
++ objflags_get(&TYPEDOBJ->object)
+
+@@
+struct object_array_entry *ENTRY;
+expression FLAGS;
+@@
+- ENTRY->item->flags |= FLAGS
++ objflags_set(ENTRY->item, FLAGS)
+
+@@
+struct object_array_entry *ENTRY;
+expression FLAGS;
+@@
+- ENTRY->item->flags &= FLAGS
++ objflags_clear(ENTRY->item, FLAGS)
+
+@@
+struct object_array_entry *ENTRY;
+@@
+- ENTRY->item->flags
++ objflags_get(ENTRY->item)
diff --git a/object.c b/object.c
index c1553ee433..679aadbd7b 100644
--- a/object.c
+++ b/object.c
@@ -14,6 +14,9 @@
 #include "alloc.h"
 #include "commit-graph.h"
 
+static kh_obj_flags_t objflags_data;
+kh_obj_flags_t *objflags = &objflags_data;
+
 unsigned int get_max_object_index(const struct repository *repo)
 {
 	return repo->parsed_objects->obj_hash_size;
@@ -149,7 +152,7 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	struct object *obj = o;
 
 	obj->parsed = 0;
-	obj->flags = 0;
+	/* flags default to 0 via flags hash */
 	oidcpy(&obj->oid, oid);
 
 	if (r->parsed_objects->obj_hash_size - 1 <= r->parsed_objects->nr_objs * 2)
diff --git a/object.h b/object.h
index 14718051f2..50d1d0e65f 100644
--- a/object.h
+++ b/object.h
@@ -159,7 +159,6 @@ static inline unsigned int canon_mode(unsigned int mode)
 struct object {
 	unsigned parsed : 1;
 	unsigned type : TYPE_BITS;
-	unsigned flags : FLAG_BITS;
 	struct object_id oid;
 };
 
@@ -369,5 +368,33 @@ static inline value_type name##_get(kh_##name##_t *map, const struct object *obj
 }
 
 OBJHASH_INIT(obj_timestamp, timestamp_t, 0);
+OBJHASH_INIT(obj_flags, unsigned, 0);
+
+extern kh_obj_flags_t *objflags;
+
+static inline unsigned objflags_get(const struct object *obj)
+{
+	return obj_flags_get(objflags, obj);
+}
+
+static inline void objflags_set(const struct object *obj, unsigned flags)
+{
+	int hash_ret;
+	khiter_t pos = kh_put_obj_flags(objflags, obj, &hash_ret);
+	if (!hash_ret)
+		kh_value(objflags, pos) |= flags;
+	else
+		kh_value(objflags, pos) = flags;
+}
+
+static inline void objflags_clear(const struct object *obj, unsigned flags)
+{
+	int hash_ret;
+	khiter_t pos = kh_put_obj_flags(objflags, obj, &hash_ret);
+	if (!hash_ret)
+		kh_value(objflags, pos) &= flags;
+	else
+		kh_value(objflags, pos) = 0;
+}
 
 #endif /* OBJECT_H */

And then this one gives you the fixups on top of applying the cocci
patch:

---
 builtin/describe.c |  5 +++--
 builtin/fsck.c     |  2 +-
 builtin/log.c      | 10 +++++-----
 commit-graph.c     |  2 +-
 commit-reach.c     |  2 +-
 http-push.c        |  2 +-
 range-diff.c       |  2 +-
 reflog.c           |  6 +++---
 revision.c         |  4 ++--
 9 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index b9ebdb9a3d..aeb3b3344f 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -290,7 +290,7 @@ static bool all_have_flag(const struct lazy_queue *queue, unsigned flag)
 {
 	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
 		struct commit *commit = queue->queue.array[i].data;
-		if (!(commit->object.flags & flag))
+		if (!(objflags_get(&commit->object) & flag))
 			return false;
 	}
 	return true;
@@ -398,7 +398,8 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		have_util = 1;
 	}
 
-	objflags_get(&cmit->object) = SEEN;
+	/* weird, this assigns SEEN instead of OR-ing it? */
+	obj_flags_put(objflags, &cmit->object, SEEN);
 	lazy_queue_put(&queue, cmit);
 	while (!lazy_queue_empty(&queue)) {
 		struct commit *c = lazy_queue_get(&queue);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index c2a3df8c1d..c5d5a4ea75 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -214,7 +214,7 @@ static int mark_used(struct object *obj, enum object_type type UNUSED,
 {
 	if (!obj)
 		return 1;
-	obj->flags |= USED;
+	objflags_set(obj, USED);
 	return 0;
 }
 
diff --git a/builtin/log.c b/builtin/log.c
index 9ceeace9c3..34ba1ff782 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1164,8 +1164,8 @@ static void get_patch_ids(struct rev_info *rev, struct patch_ids *ids)
 	/* given a range a..b get all patch ids for b..a */
 	repo_init_revisions(the_repository, &check_rev, rev->prefix);
 	check_rev.max_parents = 1;
-	objflags_get(o1) ^= UNINTERESTING;
-	objflags_get(o2) ^= UNINTERESTING;
+	obj_flags_put(objflags, o1, objflags_get(o1) ^ UNINTERESTING);
+	obj_flags_put(objflags, o2, objflags_get(o2) ^ UNINTERESTING);
 	add_pending_object(&check_rev, o1, "o1");
 	add_pending_object(&check_rev, o2, "o2");
 	if (prepare_revision_walk(&check_rev))
@@ -1178,8 +1178,8 @@ static void get_patch_ids(struct rev_info *rev, struct patch_ids *ids)
 	/* reset for next revision walk */
 	clear_commit_marks(c1, SEEN | UNINTERESTING | SHOWN | ADDED);
 	clear_commit_marks(c2, SEEN | UNINTERESTING | SHOWN | ADDED);
-	objflags_get(o1) = flags1;
-	objflags_get(o2) = flags2;
+	obj_flags_put(objflags, o1, flags1);
+	obj_flags_put(objflags, o2, flags2);
 }
 
 static void gen_message_id(struct rev_info *info, const char *base)
@@ -2224,7 +2224,7 @@ int cmd_format_patch(int argc,
 			 * origin" that prepares what the origin side still
 			 * does not have.
 			 */
-			rev.pending.objects[0].item->flags |= UNINTERESTING;
+			objflags_set(rev.pending.objects[0].item, UNINTERESTING);
 			add_head_to_pending(&rev);
 			check_head = 1;
 		}
diff --git a/commit-graph.c b/commit-graph.c
index 89a48171e4..a29f716523 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1568,7 +1568,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		commit = lookup_commit(ctx->r, &ctx->oids.oid[i]);
 
 		if (commit)
-			commit->object.flags &= ~REACHABLE;
+			objflags_clear(&commit->object, ~REACHABLE);
 	}
 	stop_progress(&ctx->progress);
 }
diff --git a/commit-reach.c b/commit-reach.c
index 329c61d559..3189aee0db 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -812,7 +812,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 			 * leave a note to ourselves not to worry about
 			 * this object anymore.
 			 */
-			from->objects[i].item->flags |= assign_flag;
+			objflags_set(from->objects[i].item, assign_flag);
 			continue;
 		}
 
diff --git a/http-push.c b/http-push.c
index 0706929cac..5d7635988e 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1377,7 +1377,7 @@ static int get_delta(struct rev_info *revs, struct remote_lock *lock)
 	while (objects) {
 		struct object_list *next = objects->next;
 
-		if (!(objects->item->flags & UNINTERESTING))
+		if (!(objflags_get(objects->item) & UNINTERESTING))
 			count += add_send_request(objects->item, lock);
 
 		free(objects);
diff --git a/range-diff.c b/range-diff.c
index 8a2dcbee32..2774720615 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -611,7 +611,7 @@ int is_range_diff_range(const char *arg)
 	repo_init_revisions(the_repository, &revs, NULL);
 	if (setup_revisions(3, argv, &revs, NULL) == 1) {
 		for (i = 0; i < revs.pending.nr; i++)
-			if (revs.pending.objects[i].item->flags & UNINTERESTING)
+			if (objflags_get(revs.pending.objects[i].item) & UNINTERESTING)
 				negative++;
 			else
 				positive++;
diff --git a/reflog.c b/reflog.c
index 879e802228..ff99a8353c 100644
--- a/reflog.c
+++ b/reflog.c
@@ -244,12 +244,12 @@ static int commit_is_complete(struct commit *commit)
 		if (!is_incomplete) {
 			/* mark all found commits as complete, iow SEEN */
 			for (i = 0; i < found.nr; i++)
-				found.objects[i].item->flags |= SEEN;
+				objflags_set(found.objects[i].item, SEEN);
 		}
 	}
 	/* clear flags from the objects we traversed */
 	for (i = 0; i < found.nr; i++)
-		found.objects[i].item->flags &= ~STUDYING;
+		objflags_clear(found.objects[i].item, ~STUDYING);
 	if (is_incomplete)
 		objflags_set(&commit->object, INCOMPLETE);
 	else {
@@ -262,7 +262,7 @@ static int commit_is_complete(struct commit *commit)
 		 * we have seen during this process are complete.
 		 */
 		for (i = 0; i < found.nr; i++)
-			found.objects[i].item->flags |= SEEN;
+			objflags_set(found.objects[i].item, SEEN);
 	}
 	/* free object arrays */
 	object_array_clear(&study);
diff --git a/revision.c b/revision.c
index 90e3ee30d5..4fd4b96eb1 100644
--- a/revision.c
+++ b/revision.c
@@ -3656,10 +3656,10 @@ static void trace2_topo_walk_statistics_atexit(void)
 
 static inline void test_flag_and_insert(struct prio_queue *q, struct commit *c, int flag)
 {
-	if (c->object.flags & flag)
+	if (objflags_get(&c->object) & flag)
 		return;
 
-	c->object.flags |= flag;
+	objflags_set(&c->object, flag);
 	prio_queue_put(q, c);
 }
 
-- 
2.51.0.382.g68ff637df6



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-26  4:26             ` Jeff King
  2025-08-26  5:52               ` Jeff King
@ 2025-08-26 15:34               ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-08-26 15:34 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Git List

Jeff King <peff@peff.net> writes:

> OK, maybe I am just holding it wrong. I think I may have mistakenly been
> using the wrong timing for my baseline (maybe --date-order instead of
> --author-date-order; the latter is _way_ more expensive because we have
> to open the commits to parse the author date).

True.  And the numbers below are much more believable ;-)

> Here's a more apples-to-apples comparison using hyperfine. On git.git:
> ...
> So a little slowdown, but within the run-to-run noise. And on linux.git:
> ...
> A little more measurable there. Those numbers are more in line with what
> I was expecting.
> ...
> I'm not sure what it all means, though. 6% is enough that it is
> probably worth keeping a custom data type like slab around.
> Though it would be nice to have a data type that worked on all
> object types and didn't necessarily use a ton of memory.

Yup, a short version of it is "assume that we still live in the old
world where 'you want to compute X?  spawn a process, work in-core
and compute and spew out your result, then exit(3) will clean up
after you' was the norm.  And our codebase has an easy way to tell
the object API to allocate N extra bytes for app specific per object
bookkeeping purpose.  Now, would you use that facility to store your
data there because you'd need that data for all objects you touch in
your application?" and if the answer is yes, it belongs to slab,
otherwise you are better off with a separate hashtable indexed via
object names populated only for few selected objects you care about.

For example, the slab is something I would have used to rewrite
"show-branch", since it would have been really handy to have more
bits in per-object "flags", and the most natural way to write that
application is to keep one bit in all objects it slurps in core for
every starting ref.

> This particular case may not be representative, either. I picked it
> because it was easy to convert. But I wonder how bad it would be to put
> the object flags for a traversal into a hash. Right now those are in the
> original struct, not even in a commit-slab. So I'd guess it's an even
> bigger slowdown.

True, too.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-25  7:34     ` Jeff King
  2025-08-25  8:13       ` Jeff King
@ 2025-08-31 17:25       ` René Scharfe
  2025-09-01 19:06         ` René Scharfe
  2025-09-02 12:38         ` Jeff King
  2025-09-03 16:30       ` René Scharfe
  2 siblings, 2 replies; 21+ messages in thread
From: René Scharfe @ 2025-08-31 17:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On 8/25/25 9:34 AM, Jeff King wrote:
> On Sun, Aug 24, 2025 at 06:32:47PM +0200, René Scharfe wrote:
> 
>>>> Use the commit index value as a hash because it is unique, has the
>>>> right size and needs no computation.  We could also derive the hash
>>>> directly from the pointer value, but that requires slightly more effort.
>>>
>>> Interesting. This is exactly what commit-slabs were created for (and are
>>> why the convenient index value is there in the first place!).
>>
>> Kinda -- they have a payload value, while a khash set only stores keys.
>> A commit slab with an int payload would still be smaller than a khash
>> set with 64-bit pointers as keys -- IF we were to add all commits.  Here
>> we typically add just a few, but a pathological history could add a lot;
>> not sure if there's a boundary.  Hmm.  So you might be able to find
>> examples where commit-slabs win.
> 
> A khash set doesn't have a "vals" array, but I think it does always keep
> the flags array, which is a uint32_t per bucket. That's how it knows
> which buckets have a value in them (since it does not otherwise assume
> there is a sentinel value like NULL). So even though its API is that of
> a set, you can think of it as a mapping of keys to flags. So a mapping
> to an int (or even a char) should cost the same.

Admittedly I *did* ignore the flags and they *are* stored in an array of
uint32_t, but they occupy only two bits per bucket (see __ac_isempty,
__ac_isdel, __ac_fsize).

>>> +define_commit_slab(commit_counter, int);
>>
>> We only need one bit, so a uint8_t or char would suffice.
> 
> True, though that doesn't seem to change timings much for me (I used
> "int" because that's what your khash used, but I think the type is a
> dummy value in "set" mode).
I also saw no significant change.  Yes, the second type is not actually
used in a khash set.

> Thinking on it more, how much benefit are we getting from the use
> of commit->index in your patch? In other contexts where we store oids or
> objects in hash tables, we use the low bits of the oid directly. So it's
> similarly cheap. If I do this tweak on top of your patch:
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index edb4dec79d..e024feb080 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -289,7 +289,7 @@ static void lazy_queue_clear(struct lazy_queue *queue)
>  
>  static inline unsigned int commit_index(const struct commit *commit)
>  {
> -	return commit->index;
> +	return oidhash(&commit->object.oid);
>  }
>  
>  static inline int ptr_eq(const void *a, const void *b)
> 
> I get similar results (actually faster, but I think within the noise).

Sure.  I'm not comfortable with oidhash() though, because it allows
attackers to influence the hash value, cause collisions and thus
increase the cost of lookups and inserts to O(N), leading to quadratic
complexity overall.

They "just" need to construct commits with a common hash prefix.  I
guess that's easy for two bytes and hard for four bytes.  Not sure how
what an attacker would get out of planting such performance traps, but
I guess some people would do it just for the heck of it.

Letting oidhash() XOR in another word would close that line of attack
for quite a while, I assume.

> Which made me think more (always a danger). We already have an oidset
> data structure, which is backed by a khash. It's not _quite_ the same
> thing, because it's going to store an actual 36-byte oid struct as the
> hash key, rather than an 8-byte pointer to a "struct object" (which
> contains that same oid). And likewise when we do a lookup, the bucket
> search requires a larger memcmp() than the pointer equality. But how
> much difference does that make?
> 
> If I instead do this on top of your patch:
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index edb4dec79d..38ce0c1978 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -287,41 +287,16 @@ static void lazy_queue_clear(struct lazy_queue *queue)
>  	queue->get_pending = false;
>  }
>  
> -static inline unsigned int commit_index(const struct commit *commit)
> -{
> -	return commit->index;
> -}
> -
> -static inline int ptr_eq(const void *a, const void *b)
> -{
> -	return a == b;
> -}
> -
> -KHASH_INIT(commit_set, struct commit *, int, 0, commit_index, ptr_eq)
> -
> -static void commit_set_insert(kh_commit_set_t *set, struct commit *commit)
> -{
> -	int added;
> -	kh_put_commit_set(set, commit, &added);
> -}
> -
> -static void commit_set_remove(kh_commit_set_t *set, struct commit *commit)
> -{
> -	khiter_t pos = kh_get_commit_set(set, commit);
> -	if (pos != kh_end(set))
> -		kh_del_commit_set(set, pos);
> -}
> -
>  static unsigned long finish_depth_computation(struct lazy_queue *queue,
>  					      struct possible_tag *best)
>  {
>  	unsigned long seen_commits = 0;
> -	kh_commit_set_t unflagged = { 0 };
> +	struct oidset unflagged = OIDSET_INIT;
>  
>  	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
>  		struct commit *commit = queue->queue.array[i].data;
>  		if (!(commit->object.flags & best->flag_within))
> -			commit_set_insert(&unflagged, commit);
> +			oidset_insert(&unflagged, &commit->object.oid);
>  	}
>  
>  	while (!lazy_queue_empty(queue)) {
> @@ -330,10 +305,10 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
>  
>  		seen_commits++;
>  		if (c->object.flags & best->flag_within) {
> -			if (!kh_size(&unflagged))
> +			if (!oidset_size(&unflagged))
>  				break;
>  		} else {
> -			commit_set_remove(&unflagged, c);
> +			oidset_remove(&unflagged, &c->object.oid);
>  			best->depth++;
>  		}
>  		while (parents) {
> @@ -348,13 +323,13 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
>  			p->object.flags |= c->object.flags;
>  			flag_after = p->object.flags & best->flag_within;
>  			if (!seen && !flag_after)
> -				commit_set_insert(&unflagged, p);
> +				oidset_insert(&unflagged, &p->object.oid);
>  			if (seen && !flag_before && flag_after)
> -				commit_set_remove(&unflagged, p);
> +				oidset_remove(&unflagged, &p->object.oid);
>  			parents = parents->next;
>  		}
>  	}
> -	kh_release_commit_set(&unflagged);
> +	oidset_clear(&unflagged);
>  	return seen_commits;
>  }
>  
> 
> then I likewise get very similar timings. Your version seems to be
> consistently a little bit faster, but within the run-to-run noise of
> ~1%. But the bonus here is that we didn't need to define a new hash
> type, nor do any tricks with the commit->index field.

I don't see any performance difference at all.  Using the existing
oidset structure is clearly better under these circumstances.

> Now if we really are worried about those extra bytes in storing a fresh
> oid, is there room for new data types? I.e., A "struct objset" that
> stores object pointers, hashes based on the oid, and uses pointer
> equality to find a match?  And likewise a "struct objmap" that could
> perhaps compete with commit-slab (but be more pleasant to use).

Maybe.  It's a matter of measuring the performance difference, though,
and that's hard.

>>>   2. Can the hash version handle strides of different sizes? One of the
>>>      points of commit-slab is that the fixed size of the value type can
>>>      be set at runtime (so you could have a slab of 32 bits per commit,
>>>      or 132, depending on your traversal needs).
>>
>> Dynamic value sizes require an indirection via a pointer, or at least I
>> don't see any other way.  What would be a possible use case?  (Don't see
>> any in-tree.)
> 
> The value isn't totally dynamic; it's static for a single instantiation
> of a slab. I don't think khash supports that (it has a value type and
> lets the compiler take care of indexing into an array of that value).
> But you could imagine a world where khash (or some custom hash code)
> only knows that it's storing N bytes per item, and does the indexing
> multiplication itself.
> 
> But yes, I don't think we are even using that feature of commit-slab
> currently. I used it in a series for fast --contains long ago:
> 
>   https://lore.kernel.org/git/20140625233429.GA20457@sigill.intra.peff.net/
> 
> but can't remember why I never got back to it. It would also be useful
> for show-branch, which wants to paint down for each starting tip. But
> given that we haven't used it, it may just not be that useful a feature.

Or for describe, actually, to allow an arbitrarily large --max-candidates
value.

René


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-31 17:25       ` René Scharfe
@ 2025-09-01 19:06         ` René Scharfe
  2025-09-02 12:38         ` Jeff King
  1 sibling, 0 replies; 21+ messages in thread
From: René Scharfe @ 2025-09-01 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On 8/31/25 7:25 PM, René Scharfe wrote:
> Sure.  I'm not comfortable with oidhash() though, because it allows
> attackers to influence the hash value, cause collisions and thus
> increase the cost of lookups and inserts to O(N), leading to quadratic
> complexity overall.
> 
> They "just" need to construct commits with a common hash prefix.  I
> guess that's easy for two bytes and hard for four bytes.  Not sure how
> what an attacker would get out of planting such performance traps, but
> I guess some people would do it just for the heck of it.

There's https://github.com/not-an-aardvark/lucky-commit, which claims to
take a quarter of a second on a four year old laptop to give a commit a
chosen 28-bit hash prefix by adjusting whitespace in its message.
Constructing a history with a common 32-bit prefix that would
effectively turn any oidhash()-based hash table into an unordered list
seems within easy reach.

René


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-31 17:25       ` René Scharfe
  2025-09-01 19:06         ` René Scharfe
@ 2025-09-02 12:38         ` Jeff King
  2025-09-02 18:51           ` René Scharfe
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2025-09-02 12:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Sun, Aug 31, 2025 at 07:25:13PM +0200, René Scharfe wrote:

> > diff --git a/builtin/describe.c b/builtin/describe.c
> > index edb4dec79d..e024feb080 100644
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -289,7 +289,7 @@ static void lazy_queue_clear(struct lazy_queue *queue)
> >  
> >  static inline unsigned int commit_index(const struct commit *commit)
> >  {
> > -	return commit->index;
> > +	return oidhash(&commit->object.oid);
> >  }
> >  
> >  static inline int ptr_eq(const void *a, const void *b)
> > 
> > I get similar results (actually faster, but I think within the noise).
> 
> Sure.  I'm not comfortable with oidhash() though, because it allows
> attackers to influence the hash value, cause collisions and thus
> increase the cost of lookups and inserts to O(N), leading to quadratic
> complexity overall.
>
> They "just" need to construct commits with a common hash prefix.  I
> guess that's easy for two bytes and hard for four bytes.  Not sure how
> what an attacker would get out of planting such performance traps, but
> I guess some people would do it just for the heck of it.

I doubt that commit->index is any better in that regard. If I can
influence the order in which Git loads the commits (e.g., by creating a
bunch of refs which get loaded when we walk over for_each_ref), I can
choose the index for each. They'll be unique, but I can still cause
collisions modulo the number of buckets.

Likewise for oidhash(), I'd guess that colliding 4 bytes is not even
necessary to cause trouble, since probing starts by throwing away
everything mod n_buckets. So really you just need to collide however
many low bits you need to make your desired N, and then get O(N^2)
behavior.

I'm not super worried about it, because in my experience Git is already
a perfectly tuned device for convincing other people to waste CPU. ;)
But if we want to address it, I'd rather do so by improving oidhash()
than avoiding it. Specifically...

> Letting oidhash() XOR in another word would close that line of attack
> for quite a while, I assume.

Yeah. We have at least 160 bits in an object hash, and we only bother
with the low 32. We could XOR up to 5 times, but I agree that even a
single extra word would probably be plenty. Might be an interesting
experiment to time something like the patch below on a hash-heavy
workload.

diff --git a/hash.h b/hash.h
index fae966b23c..c9d21f589e 100644
--- a/hash.h
+++ b/hash.h
@@ -457,7 +457,10 @@ static inline unsigned int oidhash(const struct object_id *oid)
 	 * platforms that don't support unaligned reads.
 	 */
 	unsigned int hash;
+	unsigned int entropy;
 	memcpy(&hash, oid->hash, sizeof(hash));
+	memcpy(&entropy, oid->hash + sizeof(hash), sizeof(entropy));
+	hash ^= entropy;
 	return hash;
 }

I suspect it won't make a big time difference. The old code should have
been optimized down to a single word load, and now we have two word
loads and an xor. That probably isn't important compared to the actual
5-word memcmp() we have to do in order to verify that we found the right
bucket anyway.

-Peff

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2] describe: use oidset in finish_depth_computation()
  2025-08-24  8:37 [PATCH] describe: use khash in finish_depth_computation() René Scharfe
  2025-08-24 10:31 ` Jeff King
@ 2025-09-02 18:24 ` René Scharfe
  2025-09-03 14:36   ` Jeff King
  1 sibling, 1 reply; 21+ messages in thread
From: René Scharfe @ 2025-09-02 18:24 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Jeff King

Depth computation can end early if all remaining commits are flagged.
The current code determines if that's the case by checking all queue
items each time it dequeues a flagged commit.  This can cause
quadratic complexity.

We could simply count the flagged items in the queue and then update
that number as we add and remove items.  That would provide a general
speedup, but leave one case where we have to scan the whole queue: When
we flag a previously seen, but unflagged commit.  It could be on the
queue and then we'd have to decrease our count.

We could dedicate an object flag to track queue membership, but that
would leave less for candidate tags, affecting the results.  So use a
hash table, specifically an oidset of commit hashes, to track that.
This avoids quadratic behaviour in all cases and provides a nice
performance boost over the previous commit, 08bb69d70f (describe: use
prio_queue_replace(), 2025-08-03):

Benchmark 1: ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     855.3 ms ±   1.3 ms    [User: 790.8 ms, System: 49.9 ms]
  Range (min … max):   853.7 ms … 857.8 ms    10 runs

Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     610.8 ms ±   1.7 ms    [User: 546.9 ms, System: 49.3 ms]
  Range (min … max):   608.9 ms … 613.3 ms    10 runs

Summary
  ./git describe $(git rev-list v2.41.0..v2.47.0) ran
    1.40 ± 0.00 times faster than ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Changes:
- Use oidset instead of a custom khash set for simplicity.
- Removed spurious whitespace changes.
- Formatted with --patience to better see the function removal.

 builtin/describe.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index fbe78ace66..9f4e26d7ff 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -24,6 +24,7 @@
 #include "commit-slab.h"
 #include "wildmatch.h"
 #include "prio-queue.h"
+#include "oidset.h"
 
 #define MAX_TAGS	(FLAG_BITS - 1)
 #define DEFAULT_CANDIDATES 10
@@ -286,38 +287,47 @@ static void lazy_queue_clear(struct lazy_queue *queue)
 	queue->get_pending = false;
 }
 
-static bool all_have_flag(const struct lazy_queue *queue, unsigned flag)
-{
-	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
-		struct commit *commit = queue->queue.array[i].data;
-		if (!(commit->object.flags & flag))
-			return false;
-	}
-	return true;
-}
-
 static unsigned long finish_depth_computation(struct lazy_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
+	struct oidset unflagged = OIDSET_INIT;
+
+	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
+		struct commit *commit = queue->queue.array[i].data;
+		if (!(commit->object.flags & best->flag_within))
+			oidset_insert(&unflagged, &commit->object.oid);
+	}
+
 	while (!lazy_queue_empty(queue)) {
 		struct commit *c = lazy_queue_get(queue);
 		struct commit_list *parents = c->parents;
 		seen_commits++;
 		if (c->object.flags & best->flag_within) {
-			if (all_have_flag(queue, best->flag_within))
+			if (!oidset_size(&unflagged))
 				break;
-		} else
+		} else {
+			oidset_remove(&unflagged, &c->object.oid);
 			best->depth++;
+		}
 		while (parents) {
+			unsigned seen, flag_before, flag_after;
 			struct commit *p = parents->item;
 			repo_parse_commit(the_repository, p);
-			if (!(p->object.flags & SEEN))
+			seen = p->object.flags & SEEN;
+			if (!seen)
 				lazy_queue_put(queue, p);
+			flag_before = p->object.flags & best->flag_within;
 			p->object.flags |= c->object.flags;
+			flag_after = p->object.flags & best->flag_within;
+			if (!seen && !flag_after)
+				oidset_insert(&unflagged, &p->object.oid);
+			if (seen && !flag_before && flag_after)
+				oidset_remove(&unflagged, &p->object.oid);
 			parents = parents->next;
 		}
 	}
+	oidset_clear(&unflagged);
 	return seen_commits;
 }
 
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-09-02 12:38         ` Jeff King
@ 2025-09-02 18:51           ` René Scharfe
  2025-09-03 14:31             ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: René Scharfe @ 2025-09-02 18:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On 9/2/25 2:38 PM, Jeff King wrote:
> 
> I doubt that commit->index is any better in that regard. If I can
> influence the order in which Git loads the commits (e.g., by creating a
> bunch of refs which get loaded when we walk over for_each_ref), I can
> choose the index for each. They'll be unique, but I can still cause
> collisions modulo the number of buckets.

Hmm, sounds tricky, but feasible. :-O
> Likewise for oidhash(), I'd guess that colliding 4 bytes is not even
> necessary to cause trouble, since probing starts by throwing away
> everything mod n_buckets. So really you just need to collide however
> many low bits you need to make your desired N, and then get O(N^2)
> behavior.

Oh, right.

>> Letting oidhash() XOR in another word would close that line of attack
>> for quite a while, I assume.
> 
> Yeah. We have at least 160 bits in an object hash, and we only bother
> with the low 32. We could XOR up to 5 times, but I agree that even a
> single extra word would probably be plenty. Might be an interesting
> experiment to time something like the patch below on a hash-heavy
> workload.
> 
> diff --git a/hash.h b/hash.h
> index fae966b23c..c9d21f589e 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -457,7 +457,10 @@ static inline unsigned int oidhash(const struct object_id *oid)
>  	 * platforms that don't support unaligned reads.
>  	 */
>  	unsigned int hash;
> +	unsigned int entropy;
>  	memcpy(&hash, oid->hash, sizeof(hash));
> +	memcpy(&entropy, oid->hash + sizeof(hash), sizeof(entropy));
> +	hash ^= entropy;
>  	return hash;
>  }
>  
> 
> I suspect it won't make a big time difference. The old code should have
> been optimized down to a single word load, and now we have two word
> loads and an xor. That probably isn't important compared to the actual
> 5-word memcmp() we have to do in order to verify that we found the right
> bucket anyway.
I see slightly worse performance, but within the noise.

However, just stacking two words won't do if only a few bits of the
resulting hash will be used to find a bucket.  We could mix in more bits
and smear them all over, but if that's done by a deterministic function
then it could be applied during the construction of manipulated object
hash values as well, no?

Perhaps salting with a random value determined at runtime would help.
Not XORing it in (pointless if the other value is controlled by the
attacker, as the result would still collide), but using it as a mask to
choose the bits to take from the object hash?

René


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-09-02 18:51           ` René Scharfe
@ 2025-09-03 14:31             ` Jeff King
  2025-09-03 15:41               ` René Scharfe
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2025-09-03 14:31 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Tue, Sep 02, 2025 at 08:51:37PM +0200, René Scharfe wrote:

> > I suspect it won't make a big time difference. The old code should have
> > been optimized down to a single word load, and now we have two word
> > loads and an xor. That probably isn't important compared to the actual
> > 5-word memcmp() we have to do in order to verify that we found the right
> > bucket anyway.
>
> I see slightly worse performance, but within the noise.
> 
> However, just stacking two words won't do if only a few bits of the
> resulting hash will be used to find a bucket.  We could mix in more bits
> and smear them all over, but if that's done by a deterministic function
> then it could be applied during the construction of manipulated object
> hash values as well, no?

I think the difficulty in manipulating scales as the number of bits
increases. So yeah, if you are worried about the low 8 bits, then
XOR-ing in another 8 bits is not going to do much. But your table is
only 256 items long, so you don't care much either way.

At even 16 bits, it gets hard for the attacker to choose the low 16 bits
_and_ the low 16 bits of the next word (you mentioned a project earlier
which claims 28 bits). If you XOR in a third word, now your 16-bit hash
is using 48 bits that the attacker has to control. And so on.

> Perhaps salting with a random value determined at runtime would help.
> Not XORing it in (pointless if the other value is controlled by the
> attacker, as the result would still collide), but using it as a mask to
> choose the bits to take from the object hash?

I think that would work, but XOR-ing the higher order bits is easier to
do and I think produces a similar effect. Let's shrink the problem for a
second. Imagine sha1 was 16 bits, and we wanted to create an 8-bit hash
to use in our table.  The attacker creates two objects with binary
hashes:

  object a: 10111001 11110111
  object b: 01001000 11110111

They collide in the lower 8 bits, but we don't want them to. In your
scheme, as I understand it, we'd come up with a 16-bit mask that has
exactly 8 bits set, like:

  11010110 01011000

and then picking only the bits where the mask is "1", we get:

  object a: 10111001 11110111
      mask: 11010110 01011000
    hash a: 10100110

  object b: 01001000 11110111
      mask: 11010110 01011000
    hash b: 01000110

So I agree that is hard to foil without the attacker knowing which bits
you'll pick. You've made their job 8 bits harder, because they now have
to control all 16 bits to get their collision.

But if we instead XOR the words of the object hashes together, we get:

  object a[hi]: 10111001
  object a[lo]: 11110111
        hash a: 01001110

  object b[hi]: 01001000
  object b[lo]: 11110111
        hash b: 10111111

So you're flipping bits "randomly". It's not truly random, but is coming
from the rest of the hash the attacker provided. But for any bit they
want to control, they have to control that position in both words. So
they're back to needing to control all 16 bits to get their desired
hash.

And as somebody who just hand-computed those answers, I can tell you
that the XOR one is much simpler to do. ;)

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] describe: use oidset in finish_depth_computation()
  2025-09-02 18:24 ` [PATCH v2] describe: use oidset " René Scharfe
@ 2025-09-03 14:36   ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2025-09-03 14:36 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Tue, Sep 02, 2025 at 08:24:52PM +0200, René Scharfe wrote:

> Changes:
> - Use oidset instead of a custom khash set for simplicity.
> - Removed spurious whitespace changes.
> - Formatted with --patience to better see the function removal.

Thanks, this version looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-09-03 14:31             ` Jeff King
@ 2025-09-03 15:41               ` René Scharfe
  2025-09-04 11:16                 ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: René Scharfe @ 2025-09-03 15:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On 9/3/25 4:31 PM, Jeff King wrote:
> On Tue, Sep 02, 2025 at 08:51:37PM +0200, René Scharfe wrote:
> 
>>> I suspect it won't make a big time difference. The old code should have
>>> been optimized down to a single word load, and now we have two word
>>> loads and an xor. That probably isn't important compared to the actual
>>> 5-word memcmp() we have to do in order to verify that we found the right
>>> bucket anyway.
>>
>> I see slightly worse performance, but within the noise.
>>
>> However, just stacking two words won't do if only a few bits of the
>> resulting hash will be used to find a bucket.  We could mix in more bits
>> and smear them all over, but if that's done by a deterministic function
>> then it could be applied during the construction of manipulated object
>> hash values as well, no?
> 
> I think the difficulty in manipulating scales as the number of bits
> increases. So yeah, if you are worried about the low 8 bits, then
> XOR-ing in another 8 bits is not going to do much. But your table is
> only 256 items long, so you don't care much either way.
> 
> At even 16 bits, it gets hard for the attacker to choose the low 16 bits
> _and_ the low 16 bits of the next word (you mentioned a project earlier
> which claims 28 bits). If you XOR in a third word, now your 16-bit hash
> is using 48 bits that the attacker has to control. And so on.
> 
>> Perhaps salting with a random value determined at runtime would help.
>> Not XORing it in (pointless if the other value is controlled by the
>> attacker, as the result would still collide), but using it as a mask to
>> choose the bits to take from the object hash?
> 
> I think that would work, but XOR-ing the higher order bits is easier to
> do and I think produces a similar effect. Let's shrink the problem for a
> second. Imagine sha1 was 16 bits, and we wanted to create an 8-bit hash
> to use in our table.  The attacker creates two objects with binary
> hashes:
> 
>   object a: 10111001 11110111
>   object b: 01001000 11110111
> 
> They collide in the lower 8 bits, but we don't want them to. In your
> scheme, as I understand it, we'd come up with a 16-bit mask that has
> exactly 8 bits set, like:
> 
>   11010110 01011000
> 
> and then picking only the bits where the mask is "1", we get:
> 
>   object a: 10111001 11110111
>       mask: 11010110 01011000
>     hash a: 10100110
> 
>   object b: 01001000 11110111
>       mask: 11010110 01011000
>     hash b: 01000110
> 
> So I agree that is hard to foil without the attacker knowing which bits
> you'll pick. You've made their job 8 bits harder, because they now have
> to control all 16 bits to get their collision.
> 
> But if we instead XOR the words of the object hashes together, we get:
> 
>   object a[hi]: 10111001
>   object a[lo]: 11110111
>         hash a: 01001110
> 
>   object b[hi]: 01001000
>   object b[lo]: 11110111
>         hash b: 10111111
> 
> So you're flipping bits "randomly". It's not truly random, but is coming
> from the rest of the hash the attacker provided. But for any bit they
> want to control, they have to control that position in both words. So
> they're back to needing to control all 16 bits to get their desired
> hash.
> 
> And as somebody who just hand-computed those answers, I can tell you
> that the XOR one is much simpler to do. ;)
No doubt. :)  Using a deterministic function is easier, but also allows
an attacker to use it for finding object hash values that yield
colliding hash table hashes.

How does an attacker control object hashes?  Hash it, check if it fits
the criteria, if it doesn't then make some inconsequential changes like
adding whitespace to a commit message and repeat.  That criteria can be
"bits 1-16 are all zero", but it can just as well be "bits 1-8 XORed
with bits 9-16 are all zero".  For the former they'd have to roll the
dice in the order of 2^16 times, for the latter just 2^8 times.

The attacker in our scenario doesn't have to care about the individual
bits of object hashes, just the resulting hash table hashes, which
reduces their search space a lot.  Making the deterministic function
more complicated or using more attacker-supplied input bits doesn't
change that.

René


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-08-25  7:34     ` Jeff King
  2025-08-25  8:13       ` Jeff King
  2025-08-31 17:25       ` René Scharfe
@ 2025-09-03 16:30       ` René Scharfe
  2025-09-04 11:22         ` Jeff King
  2 siblings, 1 reply; 21+ messages in thread
From: René Scharfe @ 2025-09-03 16:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On 8/25/25 9:34 AM, Jeff King wrote:
> 
> [oidset instead of khash]
> But the bonus here is that we didn't need to define a new hash
> type, nor do any tricks with the commit->index field.

It took me a while to notice what's tricky, or rather inconsistent,
about the khash set of commit objects with commit index as hash: We
could just as well go all in and use an uint32_t khash set of commit
indexes.  That would reduce the memory footprint further.  No pointers
needed here.  Didn't measure a meaningful performance difference
though, so that's that..
René

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-09-03 15:41               ` René Scharfe
@ 2025-09-04 11:16                 ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2025-09-04 11:16 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Wed, Sep 03, 2025 at 05:41:49PM +0200, René Scharfe wrote:

> How does an attacker control object hashes?  Hash it, check if it fits
> the criteria, if it doesn't then make some inconsequential changes like
> adding whitespace to a commit message and repeat.  That criteria can be
> "bits 1-16 are all zero", but it can just as well be "bits 1-8 XORed
> with bits 9-16 are all zero".  For the former they'd have to roll the
> dice in the order of 2^16 times, for the latter just 2^8 times.

Hmm, yeah, you're right. I was counting the wrong thing. It is not one
expensive action to generate a byte (or word) of sha1 hash. It is one
action to generate the whole hash. And then cheap to XOR it and find out
what the result would be in our XOR-bucket scheme. So if you are just
brute-forcing anyway, it is the same number of hash attempts, which is
what the attacker cares about minimizing. My proposal adds nothing
there.

> The attacker in our scenario doesn't have to care about the individual
> bits of object hashes, just the resulting hash table hashes, which
> reduces their search space a lot.  Making the deterministic function
> more complicated or using more attacker-supplied input bits doesn't
> change that.

Yep. Thanks for a dose of sanity.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] describe: use khash in finish_depth_computation()
  2025-09-03 16:30       ` René Scharfe
@ 2025-09-04 11:22         ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2025-09-04 11:22 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

On Wed, Sep 03, 2025 at 06:30:57PM +0200, René Scharfe wrote:

> On 8/25/25 9:34 AM, Jeff King wrote:
> > 
> > [oidset instead of khash]
> > But the bonus here is that we didn't need to define a new hash
> > type, nor do any tricks with the commit->index field.
> 
> It took me a while to notice what's tricky, or rather inconsistent,
> about the khash set of commit objects with commit index as hash: We
> could just as well go all in and use an uint32_t khash set of commit
> indexes.  That would reduce the memory footprint further.  No pointers
> needed here.  Didn't measure a meaningful performance difference
> though, so that's that..

Heh, yes, that's true. I didn't even think about that. Conceptually we
could replace "struct object" with just an index value, and then all
code could just allocate arrays and index them. And the commit-slab code
is essentially a slightly more dynamic version of that. And a hash of
ints is a good way of handling a sparsely populated set of data.

It's less crazy than it sounds because we allocate objects sequentially
and never free them anyway. But it would still be a pretty big change
(and each data access would involve a few extra instructions to compute
the address).

Anyway, I'm happy enough with the oidset solution you landed on.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-09-04 11:22 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-24  8:37 [PATCH] describe: use khash in finish_depth_computation() René Scharfe
2025-08-24 10:31 ` Jeff King
2025-08-24 16:32   ` René Scharfe
2025-08-25  7:34     ` Jeff King
2025-08-25  8:13       ` Jeff King
2025-08-25 18:48         ` Junio C Hamano
2025-08-26  3:39           ` Jeff King
2025-08-26  4:26             ` Jeff King
2025-08-26  5:52               ` Jeff King
2025-08-26 15:34               ` Junio C Hamano
2025-08-31 17:25       ` René Scharfe
2025-09-01 19:06         ` René Scharfe
2025-09-02 12:38         ` Jeff King
2025-09-02 18:51           ` René Scharfe
2025-09-03 14:31             ` Jeff King
2025-09-03 15:41               ` René Scharfe
2025-09-04 11:16                 ` Jeff King
2025-09-03 16:30       ` René Scharfe
2025-09-04 11:22         ` Jeff King
2025-09-02 18:24 ` [PATCH v2] describe: use oidset " René Scharfe
2025-09-03 14:36   ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).