[PATCH 0/2] optimize string hashing in xdiff

* [PATCH 0/2] optimize string hashing in xdiff
@ 2025-07-28 19:05 Alexander Monakov
  2025-07-28 19:05 ` [PATCH 1/2] xdiff: refactor xdl_hash_record() Alexander Monakov
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Alexander Monakov @ 2025-07-28 19:05 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Alexander Monakov

Hello world,

I've noticed the work by Phillip Wood regarding hash optimization for xdiff.
I want to point out that it is possible to speed up the existing hash by 1.5x,
matching the performance of xxhash (but without introducing a dependency).

The additive variant of the djb2 hash is used in ELF symbol lookup, and
Noah Goldstein contributed a well-optimized implementation to Glibc.
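
For reference, the additive djb2 variant mentioned above can be sketched as
follows. This is a minimal illustration rather than Glibc's optimized code;
the function name djb2_add_hash is made up for the example, and the 32-bit
truncation follows the GNU-style ELF symbol hash.

```c
#include <stdint.h>

/*
 * Additive djb2 over a NUL-terminated name: h = h * 33 + c, starting
 * from 5381 and wrapping modulo 2^32. Hypothetical helper name.
 */
static uint32_t djb2_add_hash(const char *name)
{
	const unsigned char *p = (const unsigned char *) name;
	uint32_t h = 5381;

	for (; *p; p++)
		h = h * 33 + *p;
	return h;
}
```

With this definition the empty string hashes to 5381 and "printf" hashes to
0x156b2bb8.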

I'm taking the refactoring patch from Phillip and building on top of it.

Alexander Monakov (1):
  xdiff: optimize xdl_hash_record_verbatim

Phillip Wood (1):
  xdiff: refactor xdl_hash_record()

 xdiff/xutils.c | 66 +++++++++++++++++++++++++++++++++++++++++++-------
 xdiff/xutils.h | 10 +++++++-
 2 files changed, 66 insertions(+), 10 deletions(-)

-- 
2.44.2


* [PATCH 1/2] xdiff: refactor xdl_hash_record()
  2025-07-28 19:05 [PATCH 0/2] optimize string hashing in xdiff Alexander Monakov
@ 2025-07-28 19:05 ` Alexander Monakov
  2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
  2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
  2 siblings, 0 replies; 22+ messages in thread
From: Alexander Monakov @ 2025-07-28 19:05 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Alexander Monakov

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Inline the check for whitespace flags so that the compiler can hoist
it out of the loop in xdl_prepare_ctx(). This improves the performance
by 8%.

$ hyperfine --warmup=1 -L rev HEAD,HEAD^  --setup='git checkout {rev} -- :/ && make git' ': {rev}; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'
Benchmark 1: : HEAD; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
  Time (mean ± σ):      1.670 s ±  0.044 s    [User: 1.473 s, System: 0.196 s]
  Range (min … max):    1.619 s …  1.754 s    10 runs

Benchmark 2: : HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
  Time (mean ± σ):      1.801 s ±  0.021 s    [User: 1.605 s, System: 0.192 s]
  Range (min … max):    1.766 s …  1.831 s    10 runs

Summary
  ': HEAD; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0' ran
    1.08 ± 0.03 times faster than ': HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff/xutils.c |  7 ++-----
 xdiff/xutils.h | 10 +++++++++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 444a108f87..e070ed649f 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -249,7 +249,7 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 	return 1;
 }
 
-static unsigned long xdl_hash_record_with_whitespace(char const **data,
+unsigned long xdl_hash_record_with_whitespace(char const **data,
 		char const *top, long flags) {
 	unsigned long ha = 5381;
 	char const *ptr = *data;
@@ -294,13 +294,10 @@ static unsigned long xdl_hash_record_with_whitespace(char const **data,
 	return ha;
 }
 
-unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
+unsigned long xdl_hash_record_verbatim(char const **data, char const *top) {
 	unsigned long ha = 5381;
 	char const *ptr = *data;
 
-	if (flags & XDF_WHITESPACE_FLAGS)
-		return xdl_hash_record_with_whitespace(data, top, flags);
-
 	for (; ptr < top && *ptr != '\n'; ptr++) {
 		ha += (ha << 5);
 		ha ^= (unsigned long) *ptr;
diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index fd0bba94e8..13f6831047 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -34,7 +34,15 @@ void *xdl_cha_alloc(chastore_t *cha);
 long xdl_guess_lines(mmfile_t *mf, long sample);
 int xdl_blankline(const char *line, long size, long flags);
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags);
-unsigned long xdl_hash_record(char const **data, char const *top, long flags);
+unsigned long xdl_hash_record_verbatim(char const **data, char const *top);
+unsigned long xdl_hash_record_with_whitespace(char const **data, char const *top, long flags);
+static inline unsigned long xdl_hash_record(char const **data, char const *top, long flags)
+{
+	if (flags & XDF_WHITESPACE_FLAGS)
+		return xdl_hash_record_with_whitespace(data, top, flags);
+	else
+		return xdl_hash_record_verbatim(data, top);
+}
 unsigned int xdl_hashbits(unsigned int size);
 int xdl_num_out(char *out, long val);
 int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
-- 
2.44.2
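
The effect of the inline wrapper can be sketched in isolation. The snippet
below is a simplified model, not xdiff code: WS_FLAGS, hash_verbatim(),
hash_ignoring_ws(), hash_line() and hash_all() are hypothetical names, and the
whitespace handling is reduced to skipping spaces and tabs. Because
hash_line() is inline and `flags` does not change inside the loop in
hash_all(), a compiler is free to test the flag once outside the loop.

```c
#include <stddef.h>

#define WS_FLAGS 0x1L	/* hypothetical stand-in for XDF_WHITESPACE_FLAGS */

/* XOR variant of djb2 over n bytes, as in the loop shown in the diff. */
static unsigned long hash_verbatim(const char *s, size_t n)
{
	unsigned long ha = 5381;
	size_t i;

	for (i = 0; i < n; i++) {
		ha += (ha << 5);
		ha ^= (unsigned long) s[i];
	}
	return ha;
}

/* Simplified whitespace-insensitive variant: skips spaces and tabs. */
static unsigned long hash_ignoring_ws(const char *s, size_t n, long flags)
{
	unsigned long ha = 5381;
	size_t i;

	(void) flags;	/* the real function inspects individual flag bits */
	for (i = 0; i < n; i++) {
		if (s[i] == ' ' || s[i] == '\t')
			continue;
		ha += (ha << 5);
		ha ^= (unsigned long) s[i];
	}
	return ha;
}

/* Inline dispatcher: the flag test is visible at every call site. */
static inline unsigned long hash_line(const char *s, size_t n, long flags)
{
	if (flags & WS_FLAGS)
		return hash_ignoring_ws(s, n, flags);
	return hash_verbatim(s, n);
}

/* `flags` is loop-invariant, so the branch can be hoisted out of the loop. */
static unsigned long hash_all(const char *const *lines, const size_t *lens,
			      size_t count, long flags)
{
	unsigned long acc = 0;
	size_t i;

	for (i = 0; i < count; i++)
		acc ^= hash_line(lines[i], lens[i], flags);
	return acc;
}
```

Whether the hoisting actually happens depends on the compiler and the
optimization level; the sketch only shows why inlining makes it possible.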



* [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-07-28 19:05 [PATCH 0/2] optimize string hashing in xdiff Alexander Monakov
  2025-07-28 19:05 ` [PATCH 1/2] xdiff: refactor xdl_hash_record() Alexander Monakov
@ 2025-07-28 19:05 ` Alexander Monakov
  2025-07-28 20:50   ` Junio C Hamano
  2025-08-04 13:49   ` Phillip Wood
  2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
  2 siblings, 2 replies; 22+ messages in thread
From: Alexander Monakov @ 2025-07-28 19:05 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Alexander Monakov

xdl_hash_record_verbatim uses a modified djb2 hash with XOR instead of ADD
for combining. The ADD-based variant is used as the basis of the modern
("GNU") symbol lookup scheme in ELF. The Glibc dynamic loader received an
optimized version of this hash function thanks to Noah Goldstein [1].

Switch xdl_hash_record_verbatim to additive hashing and implement
an optimized loop following the scheme suggested by Noah.

Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got

version | cycles, bn | instructions, bn
---------------------------------------
A         6.38         11.3
B         6.21         10.89
C         5.80          9.95
D         5.83          8.74
---------------------------------------

A: baseline (git master at e4ef0485fd78)
B: plus 'xdiff: refactor xdl_hash_record()'
C: and plus this patch
D: with 'xdiff: use xxhash' by Phillip Wood

The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.

[1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
---
 xdiff/xutils.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index e070ed649f..b1f8273f0f 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -294,16 +294,67 @@ unsigned long xdl_hash_record_with_whitespace(char const **data,
 	return ha;
 }
 
+/*
+ * Compiler reassociation barrier: pretend to modify X and Y to disallow
+ * changing evaluation order with respect to following uses of X and Y.
+ */
+#ifdef __GNUC__
+#define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
+#else
+#define REASSOC_FENCE(x, y)
+#endif
+
 unsigned long xdl_hash_record_verbatim(char const **data, char const *top) {
-	unsigned long ha = 5381;
+	unsigned long ha = 5381, c0, c1;
 	char const *ptr = *data;
-
+#if 0
+	/*
+	 * The baseline form of the optimized loop below. This is the djb2
+	 * hash (the above function uses a variant with XOR instead of ADD).
+	 */
 	for (; ptr < top && *ptr != '\n'; ptr++) {
 		ha += (ha << 5);
-		ha ^= (unsigned long) *ptr;
+		ha += (unsigned long) *ptr;
 	}
 	*data = ptr < top ? ptr + 1: ptr;
-
+#else
+	/* Process two characters per iteration. */
+	if (top - ptr >= 2) do {
+		if ((c0 = ptr[0]) == '\n') {
+			*data = ptr + 1;
+			return ha;
+		}
+		if ((c1 = ptr[1]) == '\n') {
+			*data = ptr + 2;
+			c0 += ha;
+			REASSOC_FENCE(c0, ha);
+			ha = ha * 32 + c0;
+			return ha;
+		}
+		/*
+		 * Combine characters C0 and C1 into the hash HA. We have
+		 * HA = (HA * 33 + C0) * 33 + C1, and we want to ensure
+		 * that dependency chain over HA is just one multiplication
+		 * and one addition, i.e. we want to evaluate this as
+		 * HA = HA * 33 * 33 + (C0 * 33 + C1), and likewise prefer
+		 * (C0 * 32 + (C0 + C1)) for the expression in parenthesis.
+		 */
+		ha *= 33 * 33;
+		c1 += c0;
+		REASSOC_FENCE(c1, c0);
+		c1 += c0 * 32;
+		REASSOC_FENCE(c1, ha);
+		ha += c1;
+
+		ptr += 2;
+	} while (ptr < top - 1);
+	*data = top;
+	if (ptr < top && (c0 = ptr[0]) != '\n') {
+		c0 += ha;
+		REASSOC_FENCE(c0, ha);
+		ha = ha * 32 + c0;
+	}
+#endif
 	return ha;
 }
 
-- 
2.44.2
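
The regrouping described in the comment can be checked in isolation. The
sketch below is not the patch itself: it drops the newline handling and the
REASSOC_FENCE barrier, reads bytes as unsigned char, and uses hypothetical
function names. It only illustrates that the two-characters-per-step form
yields the same value as the bytewise loop, since unsigned arithmetic wraps
consistently in both.

```c
#include <stddef.h>

/* Baseline additive djb2: one multiply-add per character. */
static unsigned long djb2_bytewise(const unsigned char *s, size_t n)
{
	unsigned long ha = 5381;
	size_t i;

	for (i = 0; i < n; i++)
		ha = ha * 33 + s[i];
	return ha;
}

/*
 * Same hash, two characters per step:
 * (ha * 33 + c0) * 33 + c1 == ha * (33 * 33) + (c0 * 32 + (c0 + c1)).
 */
static unsigned long djb2_pairwise(const unsigned char *s, size_t n)
{
	unsigned long ha = 5381;
	size_t i = 0;

	for (; i + 1 < n; i += 2) {
		unsigned long c0 = s[i], c1 = s[i + 1];

		ha = ha * (33 * 33) + (c0 * 32 + (c0 + c1));
	}
	if (i < n)	/* odd length: one plain step, ha * 33 + c */
		ha = ha * 32 + (ha + s[i]);
	return ha;
}
```

Without a barrier the compiler may regroup either form as it sees fit, so this
demonstrates the arithmetic identity only, not the performance effect.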



* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
@ 2025-07-28 20:50   ` Junio C Hamano
  2025-07-28 20:57     ` Alexander Monakov
  2025-08-04 13:49   ` Phillip Wood
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2025-07-28 20:50 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: git, Phillip Wood

Alexander Monakov <amonakov@ispras.ru> writes:

> +/*
> + * Compiler reassociation barrier: pretend to modify X and Y to disallow
> + * changing evaluation order with respect to following uses of X and Y.
> + */
> +#ifdef __GNUC__
> +#define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
> +#else
> +#define REASSOC_FENCE(x, y)
> +#endif

With gcc we can build, but with clang, we unfortunately get this:

    $ make CC=clang DEVELOPER=YesPlease
    xdiff/xutils.c:330:4: error: extension used [-Werror,-Wlanguage-extension-token]
      330 |                         REASSOC_FENCE(c0, ha);
          |                         ^
    xdiff/xutils.c:302:29: note: expanded from macro 'REASSOC_FENCE'
      302 | #define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
          |                             ^



* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-07-28 20:50   ` Junio C Hamano
@ 2025-07-28 20:57     ` Alexander Monakov
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Monakov @ 2025-07-28 20:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Phillip Wood

On Mon, 28 Jul 2025, Junio C Hamano wrote:

> Alexander Monakov <amonakov@ispras.ru> writes:
> 
> > +/*
> > + * Compiler reassociation barrier: pretend to modify X and Y to disallow
> > + * changing evaluation order with respect to following uses of X and Y.
> > + */
> > +#ifdef __GNUC__
> > +#define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
> > +#else
> > +#define REASSOC_FENCE(x, y)
> > +#endif
> 
> With gcc we can build, but with clang, we unfortunately get this:
> 
>     $ make CC=clang DEVELOPER=YesPlease
>     xdiff/xutils.c:330:4: error: extension used [-Werror,-Wlanguage-extension-token]
>       330 |                         REASSOC_FENCE(c0, ha);
>           |                         ^
>     xdiff/xutils.c:302:29: note: expanded from macro 'REASSOC_FENCE'
>       302 | #define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
>           |                             ^

Sorry, wasn't aware that Clang would warn. The solution is to spell 'asm' with
double underscores, __asm__.  I'll make this change if I post a v2.

Thanks.
Alexander
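
A minimal sketch of that respelling, assuming the macro otherwise stays as
posted (the eventual v2 may differ). hash_step() is a hypothetical helper that
mirrors the single-character tail of the loop.

```c
/*
 * Same barrier as in the patch, spelled with the alternate keyword
 * __asm__ instead of asm.
 */
#ifdef __GNUC__
#define REASSOC_FENCE(x, y) __asm__("" : "+r"(x), "+r"(y))
#else
#define REASSOC_FENCE(x, y)
#endif

/* One hash step, ha * 33 + c, evaluated as ha * 32 + (ha + c). */
static unsigned long hash_step(unsigned long ha, unsigned long c)
{
	c += ha;
	REASSOC_FENCE(c, ha);
	return ha * 32 + c;
}
```

The empty asm statement emits no instructions; it only tells the compiler that
both operands may have changed, so the result is the same with or without it.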


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
  2025-07-28 20:50   ` Junio C Hamano
@ 2025-08-04 13:49   ` Phillip Wood
  2025-08-04 14:39     ` Alexander Monakov
  1 sibling, 1 reply; 22+ messages in thread
From: Phillip Wood @ 2025-08-04 13:49 UTC (permalink / raw)
  To: Alexander Monakov, git; +Cc: Phillip Wood

Hi Alexander

On 28/07/2025 20:05, Alexander Monakov wrote:
> xdl_hash_record_verbatim uses modified djb2 hash with XOR instead of ADD
> for combining. The ADD-based variant is used as the basis of the modern
> ("GNU") symbol lookup scheme in ELF. Glibc dynamic loader received an
> optimized version of this hash function thanks to Noah Goldstein [1].

Interesting

> Switch xdl_hash_record_verbatim to additive hashing and implement
> an optimized loop following the scheme suggested by Noah.
> 
> Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
> 
> version | cycles, bn | instructions, bn
> ---------------------------------------
> A         6.38         11.3
> B         6.21         10.89
> C         5.80          9.95
> D         5.83          8.74
> ---------------------------------------
> 
> A: baseline (git master at e4ef0485fd78)
> B: plus 'xdiff: refactor xdl_hash_record()'
> C: and plus this patch
> D: with 'xdiff: use xxhash' by Phillip Wood

I think it would be helpful to say that B is the previous patch and
provide a link for D.

> The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.

While that's interesting, it does not tell us how much this speeds up
diff generation. Running the command above under hyperfine it is 1.02 ±
0.01 times faster than the previous patch and 1.11 ± 0.01 times faster
than master. Using xxhash (D above) is 1.03 ± 0.01 times faster than
this patch. How do the changes below affect compilers other than gcc and
clang that do not see the re-association barrier? We'd want to make sure
that it does not result in slower diffs. Can we use
atomic_signal_fence() on compilers that support C11? (We don't require
C11, so we'd have to make it optional, but it is supported by things like
MSVC.)

Thanks

Phillip
> 
> [1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/
> 
> Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
> ---
>   xdiff/xutils.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index e070ed649f..b1f8273f0f 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -294,16 +294,67 @@ unsigned long xdl_hash_record_with_whitespace(char const **data,
>   	return ha;
>   }
>   
> +/*
> + * Compiler reassociation barrier: pretend to modify X and Y to disallow
> + * changing evaluation order with respect to following uses of X and Y.
> + */
> +#ifdef __GNUC__
> +#define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
> +#else
> +#define REASSOC_FENCE(x, y)
> +#endif
> +
>   unsigned long xdl_hash_record_verbatim(char const **data, char const *top) {
> -	unsigned long ha = 5381;
> +	unsigned long ha = 5381, c0, c1;
>   	char const *ptr = *data;
> -
> +#if 0
> +	/*
> +	 * The baseline form of the optimized loop below. This is the djb2
> +	 * hash (the above function uses a variant with XOR instead of ADD).
> +	 */
>   	for (; ptr < top && *ptr != '\n'; ptr++) {
>   		ha += (ha << 5);
> -		ha ^= (unsigned long) *ptr;
> +		ha += (unsigned long) *ptr;
>   	}
>   	*data = ptr < top ? ptr + 1: ptr;
> -
> +#else
> +	/* Process two characters per iteration. */
> +	if (top - ptr >= 2) do {
> +		if ((c0 = ptr[0]) == '\n') {
> +			*data = ptr + 1;
> +			return ha;
> +		}
> +		if ((c1 = ptr[1]) == '\n') {
> +			*data = ptr + 2;
> +			c0 += ha;
> +			REASSOC_FENCE(c0, ha);
> +			ha = ha * 32 + c0;
> +			return ha;
> +		}
> +		/*
> +		 * Combine characters C0 and C1 into the hash HA. We have
> +		 * HA = (HA * 33 + C0) * 33 + C1, and we want to ensure
> +		 * that dependency chain over HA is just one multiplication
> +		 * and one addition, i.e. we want to evaluate this as
> +		 * HA = HA * 33 * 33 + (C0 * 33 + C1), and likewise prefer
> +		 * (C0 * 32 + (C0 + C1)) for the expression in parenthesis.
> +		 */
> +		ha *= 33 * 33;
> +		c1 += c0;
> +		REASSOC_FENCE(c1, c0);
> +		c1 += c0 * 32;
> +		REASSOC_FENCE(c1, ha);
> +		ha += c1;
> +
> +		ptr += 2;
> +	} while (ptr < top - 1);
> +	*data = top;
> +	if (ptr < top && (c0 = ptr[0]) != '\n') {
> +		c0 += ha;
> +		REASSOC_FENCE(c0, ha);
> +		ha = ha * 32 + c0;
> +	}
> +#endif
>   	return ha;
>   }
>   



* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-04 13:49   ` Phillip Wood
@ 2025-08-04 14:39     ` Alexander Monakov
  2025-08-11 13:13       ` Phillip Wood
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Monakov @ 2025-08-04 14:39 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Mon, 4 Aug 2025, Phillip Wood wrote:

> > Switch xdl_hash_record_verbatim to additive hashing and implement
> > an optimized loop following the scheme suggested by Noah.
> > 
> > Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
> > 
> > version | cycles, bn | instructions, bn
> > ---------------------------------------
> > A         6.38         11.3
> > B         6.21         10.89
> > C         5.80          9.95
> > D         5.83          8.74
> > ---------------------------------------
> > 
> > A: baseline (git master at e4ef0485fd78)
> > B: plus 'xdiff: refactor xdl_hash_record()'
> > C: and plus this patch
> > D: with 'xdiff: use xxhash' by Phillip Wood
> 
> I think it would be helpful to say that B is the previous patch and provide a
> link for D.

Ok, reworded locally, will appear in v2.

> > The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
> 
> While that's interesting it does not tell us how much this speeds up diff
> generation.

That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)

> Running the command above under hyperfine it is 1.02 ± 0.01 times
> faster than the previous patch and 1.11 ± 0.01 times faster than master.

Then you get 9% from the inlining patch and only 2% from the faster hash
function? That's a bit surprising. Which compiler and CPU did you use? Is it
with default optimization (-O2)?

> Using
> xxhash (D above) is 1.03 ± 0.01 times faster than this patch. How do the
> changes below affect compilers other than gcc and clang than do not see the
> re-association barrier?

I'd say under reasonable assumptions (e.g. a not too ancient CPU with 3-cycle
integer multiplication) the new scheme is generally faster even without asm.

But Git can certainly follow Glibc's choice and employ this only on x86_64
(and only with GCC or Clang).

> We'd want to make sure that it does not result in
> slower diffs. Can we use atomic_signal_fence() on compilers that support C11?

No, what we need to do here is outside of the abstract machine's view, standard
functions are not going to help.

Alexander

> (we don't require C11 so we'd have to make it optional but it is supported by
> things like MSVC)
> 
> Thanks
> 
> Phillip


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-04 14:39     ` Alexander Monakov
@ 2025-08-11 13:13       ` Phillip Wood
  2025-08-11 14:14         ` Alexander Monakov
  0 siblings, 1 reply; 22+ messages in thread
From: Phillip Wood @ 2025-08-11 13:13 UTC (permalink / raw)
  To: Alexander Monakov, Phillip Wood; +Cc: git

On 04/08/2025 15:39, Alexander Monakov wrote:
> On Mon, 4 Aug 2025, Phillip Wood wrote:
> 
>>> Switch xdl_hash_record_verbatim to additive hashing and implement
>>> an optimized loop following the scheme suggested by Noah.
>>>
>>> Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
>>>
>>> version | cycles, bn | instructions, bn
>>> ---------------------------------------
>>> A         6.38         11.3
>>> B         6.21         10.89
>>> C         5.80          9.95
>>> D         5.83          8.74
>>> ---------------------------------------
>>>
>>> A: baseline (git master at e4ef0485fd78)
>>> B: plus 'xdiff: refactor xdl_hash_record()'
>>> C: and plus this patch
>>> D: with 'xdiff: use xxhash' by Phillip Wood
>>
>> I think it would be helpful to say that B is the previous patch and provide a
>> link for D.
> 
> Ok, reworded locally, will appear in v2.

Thanks

>>> The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
>>
>> While that's interesting it does not tell us how much this speeds up diff
>> generation.
> 
> That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)

It would be helpful to add a column with those calculations in it rather 
than forcing the reader to calculate the speed up for themselves. Also 
what is the cycles column measuring? What is it that takes 6.21 cycles 
for B and only 5.8 cycles for C?

>> Running the command above under hyperfine it is 1.02 ± 0.01 times
>> faster than the previous patch and 1.11 ± 0.01 times faster than master.
> 
> Then you get 9% from the inlining patch and only 2% from the faster hash
> function? That's a bit surprising, which compiler and CPU you used? Is it
> with default optimization (-O2)?

I used gcc with -O2 -march=native on an i5-8500. I saw a similar 
improvement from the inlining when I was playing with xxhash.

>> Using
>> xxhash (D above) is 1.03 ± 0.01 times faster than this patch. How do the
>> changes below affect compilers other than gcc and clang than do not see the
>> re-association barrier?
> 
> I'd say under reasonable assumptions (e.g. a not too ancient CPU with 3-cycle
> integer multiplication) the new scheme is generally faster even without asm.

Thanks, fwiw I don't see a measurable difference in the timings with and 
without the asm on my machine - sometimes one is faster, sometimes the 
other, any difference is within the noise.

> But Git can certainly follow Glibc's choice and employ this only on x86_64
> (and only with GCC or Clang).
> 
>> We'd want to make sure that it does not result in
>> slower diffs. Can we use atomic_signal_fence() on compilers that support C11?
> 
> No, what we need to do here is outside of the abstract machine's view, standard
> functions are not going to help.

That's a shame. I'd hoped that stopping the compiler from reordering the
code would do the same thing - what is the asm doing that's different?

Thanks

Phillip


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-11 13:13       ` Phillip Wood
@ 2025-08-11 14:14         ` Alexander Monakov
  2025-08-12 17:56           ` Alexander Monakov
  2025-08-13 13:10           ` Phillip Wood
  0 siblings, 2 replies; 22+ messages in thread
From: Alexander Monakov @ 2025-08-11 14:14 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Mon, 11 Aug 2025, Phillip Wood wrote:

> > That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)
> 
> It would be helpful to add a column with those calculations in it rather than
> forcing the reader to calculate the speed up for themselves.

Ok, will change it to

version | speedup over (A) | cycles, bn | instructions, bn
----------------------------------------------------------
A                            6.38         11.3
B         1.027              6.21         10.89
C         1.1                5.80          9.95
D         1.094              5.83          8.74
----------------------------------------------------------

> Also what is the cycles column measuring? What is it that takes 6.21 cycles
> for B and only 5.8 cycles for C?

Billions of cycles, e.g. in C the entire command completes in 5.8e9 CPU cycles.
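
The speedup column is simply the baseline cycle count divided by each
version's cycle count. A small sketch of that arithmetic, with the counts
copied from the table above and a hypothetical helper name:

```c
/* Cycle counts in billions for versions A, B, C and D, from the table. */
static const double cycles_bn[4] = { 6.38, 6.21, 5.80, 5.83 };

/* Speedup of version v (0 = A, 1 = B, 2 = C, 3 = D) relative to A. */
static double speedup_over_a(int v)
{
	return cycles_bn[0] / cycles_bn[v];
}
```

For B to C alone the same division gives 6.21 / 5.80, about 1.07, which is the
figure quoted earlier in the thread.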

> > Then you get 9% from the inlining patch and only 2% from the faster hash
> > function? That's a bit surprising, which compiler and CPU you used? Is it
> > with default optimization (-O2)?
> 
> I used gcc with -O2 -march=native on an i5-8500. I saw a similar improvement
> from the inlining when I was playing with xxhash.

Thanks, I'll see if I can benchmark it on a Skylake in the coming days. That
said, I think most users will get Git from their distro, without -march=native,
right? So I'd suggest looking at plain -O2, especially for xxhash, which
selects hashing primitives based on CPU-indicating predefined macros.

> > I'd say under reasonable assumptions (e.g. a not too ancient CPU with
> > 3-cycle integer multiplication) the new scheme is generally faster even
> > without asm.
> 
> Thanks, fwiw I don't see a measurable difference in the timings with and
> without the asm on my machine -

To be clear, by "without asm" you mean forcing the !__GNUC__ branch where
REASSOC_FENCE macro is empty?

> sometimes one is faster, sometimes the other, any difference is within the
> noise.

Would you mind showing your 'gcc --version'? Also, I prefer 'perf stat' for
such measurements, because its measurements are not so sensitive to frequency
scaling (plus, you can compare my cycles/instructions counts with yours if you
run 'perf stat', but I cannot compare your seconds from hyperfine with mine
because of course my CPU runs at a different frequency than yours).

'perf stat -r 5' runs the workload 5 times and prints averages and deviation.

> > No, what we need to do here is outside of the abstract machine's view,
> > standard functions are not going to help.
> 
> That's a shame. I'd hoped that stopping the compiler reorder the code would do
> the same thing - what is the asm doing that's different?

atomic_signal_fence only blocks reordering of references to memory that can be
observed from a signal handler interrupting the current thread. It has no effect
on variables whose addresses do not escape (let alone variables whose address is
never taken in the first place). Here we want to force a particular evaluation
order for variables that end up in registers and are not supposed to appear in
memory at all.

Alexander
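
To make the distinction concrete, here is a hedged sketch of what the C11
alternative would look like; hash_step_c11() is a hypothetical name. It
compiles with a C11 compiler, but `ha` and `c` are locals whose addresses are
never taken, so the fence places no constraint on how the surrounding
arithmetic is grouped.

```c
#include <stdatomic.h>

/* One hash step, ha * 33 + c, written as ha * 32 + (ha + c). */
static unsigned long hash_step_c11(unsigned long ha, unsigned long c)
{
	c += ha;
	/*
	 * Orders memory accesses that a signal handler could observe.
	 * It does not pin the evaluation order of register-only values,
	 * so the compiler may still fold the two additions together.
	 */
	atomic_signal_fence(memory_order_seq_cst);
	return ha * 32 + c;
}
```

The computed value is the same as with the asm-based macro; the difference is
only in which regroupings the optimizer remains free to apply.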


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-11 14:14         ` Alexander Monakov
@ 2025-08-12 17:56           ` Alexander Monakov
  2025-08-20 21:34             ` Junio C Hamano
  2025-08-13 13:10           ` Phillip Wood
  1 sibling, 1 reply; 22+ messages in thread
From: Alexander Monakov @ 2025-08-12 17:56 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

> On Mon, 11 Aug 2025, Phillip Wood wrote:
> 
> > > That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)
> > 
> > It would be helpful to add a column with those calculations in it rather than
> > forcing the reader to calculate the speed up for themselves.
> 
> Ok, will change it to
> 
> version | speedup over (A) | cycles, bn | instructions, bn
> ----------------------------------------------------------
> A                            6.38         11.3
> B         1.027              6.21         10.89
> C         1.1                5.80          9.95
> D         1.094              5.83          8.74
> ----------------------------------------------------------

On my Skylake:

version | speedup over (A) | cycles, bn | instructions, bn
----------------------------------------------------------
A                            5.77         10.96
B         1.076              5.36         10.60
C         1.12               5.16          9.66
----------------------------------------------------------

A is today's master, B and C are patch 1 and 1+2 like before.

Alexander


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-12 17:56           ` Alexander Monakov
@ 2025-08-20 21:34             ` Junio C Hamano
  2025-09-08 19:06               ` Alexander Monakov
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2025-08-20 21:34 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Phillip Wood, git

Alexander Monakov <amonakov@ispras.ru> writes:

>> On Mon, 11 Aug 2025, Phillip Wood wrote:
>> 
>> > > That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)
>> > 
>> > It would be helpful to add a column with those calculations in it rather than
>> > forcing the reader to calculate the speed up for themselves.
>> 
>> Ok, will change it to
>> 
>> version | speedup over (A) | cycles, bn | instructions, bn
>> ----------------------------------------------------------
>> A                            6.38         11.3
>> B         1.027              6.21         10.89
>> C         1.1                5.80          9.95
>> D         1.094              5.83          8.74
>> ----------------------------------------------------------
>
> On my Skylake:
>
> version | speedup over (A) | cycles, bn | instructions, bn
> ----------------------------------------------------------
> A                            5.77         10.96
> B         1.076              5.36         10.60
> C         1.12               5.16          9.66
> ----------------------------------------------------------
>
> A is today's master, B and C are patch 1 and 1+2 like before.

The thread has gone quiet.  I assume everybody is happy with the
result?  Can we have a hopefully final v2 iteration of these
patches, to address the update to the table (this thread), to
squelch the __asm__() issue [*asm*], and the rewording you mentioned
[*reword*] in response to Phillip's review?

Thanks.


*asm*
https://lore.kernel.org/git/3405f274-cef1-b361-7424-840dc55b48a1@ispras.ru/

*reword*
https://lore.kernel.org/git/353c7865-d9b5-2a1c-4d71-cd1136581f01@ispras.ru/


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-20 21:34             ` Junio C Hamano
@ 2025-09-08 19:06               ` Alexander Monakov
  2025-09-08 21:04                 ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Monakov @ 2025-09-08 19:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, git

On Wed, 20 Aug 2025, Junio C Hamano wrote:

> The thread has gone quiet.  I assume everybody is happy with the
> result?  Can we have a hopefully final v2 iteration of these
> patches, to address the updated to the table (this thread), to
> squelch the __asm__() issue [*asm*], and a reword you mentioned
> [*reword*] against Phillip's review?

I was expecting that Phillip would come back to the question of underwhelming
performance improvement he was seeing on his CPU. I was working on an
alternative approach to speed up that function, which I just sent in the v2
thread: https://lore.kernel.org/git/20250908184939.16338-4-amonakov@ispras.ru/
It does not depend on the performance of integer multiplication anymore,
so it should work better from an architecture-neutrality point of view.

I'm not sure what the current status is, though; it seems nobody gave the
original two patches a Reviewed-by?

If the proposed changes in v2 are too sudden, what happens now?

Thanks.
Alexander


* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-09-08 19:06               ` Alexander Monakov
@ 2025-09-08 21:04                 ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-09-08 21:04 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Phillip Wood, git

Alexander Monakov <amonakov@ispras.ru> writes:

> On Wed, 20 Aug 2025, Junio C Hamano wrote:
>
>> The thread has gone quiet.  I assume everybody is happy with the
>> result?  Can we have a hopefully final v2 iteration of these
>> patches, to address the updated to the table (this thread), to
>> squelch the __asm__() issue [*asm*], and a reword you mentioned
>> [*reword*] against Phillip's review?
>
> I was expecting that Phillip would come back to the question of underwhelming
> performance improvement he was seeing on his CPU. I was working on an
> alternative approach to speed up that function, which I just sent in the v2
> thread: https://lore.kernel.org/git/20250908184939.16338-4-amonakov@ispras.ru/
> It does not depend on the performance of integer multiplication anymore,
> so it should work better from an architecture-neutrality point of view.
>
> I'm not sure what's the current status though, it seems nobody gave the original
> two patches a Reviewed-by?
>
> If the proposed changes in v2 are too sudden, what happens now?

Well, it has been quite a while since I asked, and the last round,
which looked reasonably well done, is now in 'next' and is about to
graduate to 'master'.  So, if there are further good changes on top,
can you make them incremental on top of 'master' after I push out
today's integration result?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
  2025-08-11 14:14         ` Alexander Monakov
  2025-08-12 17:56           ` Alexander Monakov
@ 2025-08-13 13:10           ` Phillip Wood
  1 sibling, 0 replies; 22+ messages in thread
From: Phillip Wood @ 2025-08-13 13:10 UTC (permalink / raw)
  To: Alexander Monakov, Phillip Wood; +Cc: git

Hi Alexander

On 11/08/2025 15:14, Alexander Monakov wrote:
> 
> On Mon, 11 Aug 2025, Phillip Wood wrote:
> 
>>> That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)
>>
>> It would be helpful to add a column with those calculations in it rather than
>> forcing the reader to calculate the speed up for themselves.
> 
> Ok, will change it to
> 
> version | speedup over (A) | cycles, bn | instructions, bn
> ----------------------------------------------------------
> A                            6.38         11.3
> B         1.027              6.21         10.89
> C         1.1                5.80          9.95
> D         1.094              5.83          8.74
> ----------------------------------------------------------

That looks good, thanks
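
For reference, the "speedup over (A)" column is just the baseline cycle count divided by each variant's cycle count. A minimal sketch that reproduces the column from the figures quoted in the table (the function name `speedup` is made up for this illustration, and the cycle counts are copied from the table, not re-measured):

```c
#include <stdio.h>

/* Hypothetical helper: speedup of a variant relative to the baseline,
 * computed from cycle counts (both in billions of cycles). */
static double speedup(double baseline_cycles, double variant_cycles)
{
	return baseline_cycles / variant_cycles;
}

/* Prints the speedup column for versions B, C and D using the cycle
 * counts quoted in the table above; A (6.38) is the baseline. */
static void print_speedups(void)
{
	const char *names[] = { "B", "C", "D" };
	const double cycles[] = { 6.21, 5.80, 5.83 };
	const double baseline = 6.38;

	for (int i = 0; i < 3; i++)
		printf("%s  %.3f\n", names[i], speedup(baseline, cycles[i]));
}
```

The same helper applied to B and C, `speedup(6.21, 5.80)`, gives the 1.07 ratio mentioned earlier in the thread.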

>> Also what is the cycles column measuring? What is it that takes 6.21 cycles
>> for B and only 5.8 cycles for C?
> 
> Billions of cycles, e.g. in C the entire command completes in 5.8e9 CPU cycles.

Ah, for some reason I'd not realized that bn was short for billion.

>>> Then you get 9% from the inlining patch and only 2% from the faster hash
>>> function? That's a bit surprising, which compiler and CPU did you use? Is it
>>> with default optimization (-O2)?
>>
>> I used gcc with -O2 -march=native on an i5-8500. I saw a similar improvement
>> from the inlining when I was playing with xxhash.
> 
> Thanks, I'll see if I can benchmark it on a Skylake in the coming days. That
> said, I think most users will get Git from their distro, without -march=native,
> right? So I'd suggest looking at plain -O2, especially for xxhash, which
> selects hashing primitives based on CPU-indicating predefined macros.

For xxhash I was using the system library rather than compiling it myself.

>>> I'd say under reasonable assumptions (e.g. a not too ancient CPU with
>>> 3-cycle integer multiplication) the new scheme is generally faster even
>>> without asm.
>>
>> Thanks, fwiw I don't see a measurable difference in the timings with and
>> without the asm on my machine -
> 
> To be clear, by "without asm" you mean forcing the !__GNUC__ branch where
> REASSOC_FENCE macro is empty?

Exactly

>> sometimes one is faster, sometimes the other, any difference is within the
>> noise.
> 
> Would you mind showing your 'gcc --version'?

gcc (Debian 12.2.0-14+deb12u1) 12.2.0

> Also, I prefer 'perf stat' for
> such measurements, because its measurements are not so sensitive to frequency
> scaling (plus, you can compare my cycles/instructions counts with yours if you
> run 'perf stat', but I cannot compare your seconds from hyperfine with mine
> because of course my CPU runs at a different frequency than yours).
> 
> 'perf stat -r 5' runs the workload 5 times and prints averages and deviation.

I'll try and take a look at that, though I'm offline next week and I'm
not sure I'll have time before then.

>>> No, what we need to do here is outside of the abstract machine's view,
>>> standard functions are not going to help.
>>
>> That's a shame. I'd hoped that stopping the compiler from reordering the code
>> would do the same thing - what is the asm doing that's different?
> 
> atomic_signal_fence only blocks reordering of references to memory that can be
> observed from a signal handler interrupting the current thread. It has no effect
> on variables whose addresses do not escape (let alone variables whose addresses
> are never taken in the first place). Here we want to force a particular
> evaluation order for variables that end up in registers and are not supposed to
> appear in memory at all.

Ah, that makes sense
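
To illustrate the distinction, here is a minimal sketch of a register-level fence, assuming GCC-style extended asm. The macro name `VALUE_FENCE` and the function `hash_two_bytes` are invented for this example; they are not the macro or code from the patch, which may be structured differently.

```c
#include <stdint.h>

/* Hypothetical macro (not the one from the patch): an empty asm statement
 * that claims to read and modify x in a register. The compiler must
 * compute x before this point and cannot reassociate arithmetic across
 * it. No memory access is involved. */
#if defined(__GNUC__)
#define VALUE_FENCE(x) __asm__("" : "+r"(x))
#else
#define VALUE_FENCE(x) ((void)0)
#endif

/* Two steps of h = h * 33 + c, regrouped as h*33*33 + (c0*33 + c1) so
 * that the multiplication of h does not wait on the byte arithmetic. */
static uint32_t hash_two_bytes(uint32_t h, unsigned char c0, unsigned char c1)
{
	uint32_t scaled = h * (33u * 33u);
	uint32_t bytes = c0 * 33u + c1;

	VALUE_FENCE(scaled); /* keep this grouping; do not re-fold the sum */
	return scaled + bytes;
}
```

With or without the fence the result is the same value; only the evaluation order the compiler is allowed to pick changes, which is why a standard fence that talks about memory cannot express it.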

Thanks

Phillip

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-07-28 19:05 [PATCH 0/2] optimize string hashing in xdiff Alexander Monakov
  2025-07-28 19:05 ` [PATCH 1/2] xdiff: refactor xdl_hash_record() Alexander Monakov
  2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
@ 2025-07-28 19:32 ` Junio C Hamano
  2025-07-28 19:56   ` Eli Schwartz
  2025-07-28 20:25   ` Alexander Monakov
  2 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-07-28 19:32 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: git, Phillip Wood

Alexander Monakov <amonakov@ispras.ru> writes:

> I've noticed the work by Phillip Wood regarding hash optimization for xdiff.
> I want to point out that it is possible to speed up the existing hash by 1.5x
> matching the performance of xxhash (but without introducing a dependency).

Using xxhash() was merely a sample code path for technology
demonstration, so the Rust adoption topic may want to pick a
different code path to do its thing.

> The additive variant of the djb2 hash is used in ELF symbol lookup, and
> Noah Goldstein contributed a well-optimized implementation to Glibc.

What are the licensing terms for that code you are proposing us to
borrow?  If it is anything recent in GNU, I'd expect that it would
be GPLv3, which would be incompatible with our code base?

> I'm taking the refactoring patch from Phillip and building on top of it.

It is an obviously good approach to do this.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
@ 2025-07-28 19:56   ` Eli Schwartz
  2025-07-28 20:54     ` Junio C Hamano
  2025-07-28 20:25   ` Alexander Monakov
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Schwartz @ 2025-07-28 19:56 UTC (permalink / raw)
  To: Junio C Hamano, Alexander Monakov; +Cc: git, Phillip Wood


On 7/28/25 3:32 PM, Junio C Hamano wrote:

>> The additive variant of the djb2 hash is used in ELF symbol lookup, and
>> Noah Goldstein contributed a well-optimized implementation to Glibc.
> 
> What are the licensing terms for that code you are proposing us to
> borrow?  If it is anything recent in GNU, I'd expect that it would
> be GPLv3, which would be incompatible with our code base?


That feels like a quite surprising assessment. Many GNU projects make
specific calculations here. See:

https://www.gnu.org/licenses/gpl-faq.html#DoesAllGNUSoftwareUseTheGNUGPLAsItsLicense

https://www.gnu.org/licenses/why-not-lgpl.html


At any rate, quite untrue. Glibc's Wikipedia page -- and also its source
code, luckily -- documents "LGPL-2.1-or-later", which is more permissive
than git (and equally as permissive as xdiff).

Reason is documented in the second link. :)


-- 
Eli Schwartz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-07-28 19:56   ` Eli Schwartz
@ 2025-07-28 20:54     ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-07-28 20:54 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: Alexander Monakov, git, Phillip Wood

Eli Schwartz <eschwartz@gentoo.org> writes:

> At any rate, quite untrue. Glibc's Wikipedia page -- and also its source
> code, luckily -- documents "LGPL-2.1-or-later", which is more permissive
> than git (and equally as permissive as xdiff).

Ah, OK.  That sounds very nice.  Thanks for correcting me.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
  2025-07-28 19:56   ` Eli Schwartz
@ 2025-07-28 20:25   ` Alexander Monakov
  2025-08-14 15:01     ` Junio C Hamano
  2025-08-28 23:40     ` Junio C Hamano
  1 sibling, 2 replies; 22+ messages in thread
From: Alexander Monakov @ 2025-07-28 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Phillip Wood

On Mon, 28 Jul 2025, Junio C Hamano wrote:

> Alexander Monakov <amonakov@ispras.ru> writes:
> 
> > I've noticed the work by Phillip Wood regarding hash optimization for xdiff.
> > I want to point out that it is possible to speed up the existing hash by 1.5x
> > matching the performance of xxhash (but without introducing a dependency).
> 
> Using xxhash() was merely a sample code path for technology
> demonstration, so the Rust adoption topic may want to pick a
> different code path to do its thing.

My interest here is just speeding up xdiff in C, is that a welcome topic?

> > The additive variant of the djb2 hash is used in ELF symbol lookup, and
> > Noah Goldstein contributed a well-optimized implementation to Glibc.
> 
> What are the licensing terms for that code you are proposing us to
> borrow?  If it is anything recent in GNU, I'd expect that it would
> be GPLv3, which would be incompatible with our code base?

Noah's code is not usable in xdiff due to different context (mainly the need
to limit iteration by length — ELF hashing iterates until the NUL character).
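
The difference in context can be sketched as follows, assuming the textbook additive djb2 recurrence (h = h * 33 + c, seeded with 5381). Both function names are invented for this illustration; neither is Glibc's implementation nor the code in patch 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: additive djb2 over a NUL-terminated string, the
 * shape that fits ELF symbol names. The loop ends at the first NUL. */
static uint32_t djb2_add_cstr(const char *s)
{
	uint32_t h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Hypothetical helper: the same recurrence bounded by an explicit
 * length, the shape needed when the input is a (pointer, length) pair.
 * Embedded NUL bytes are hashed like any other byte. */
static uint32_t djb2_add_len(const char *p, size_t len)
{
	uint32_t h = 5381;

	for (size_t i = 0; i < len; i++)
		h = h * 33 + (unsigned char)p[i];
	return h;
}
```

The two agree on inputs without embedded NUL bytes, but an optimized loop written around the NUL test cannot simply be reused when the end of input is given by a length.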

I have participated in review of Noah's patches and he kindly listed me as
a co-author in the final revision of his patchset. So while I'm aware of how
his code is structured, I had to write a new implementation in order to meet
the contract of xdl_hash_record_verbatim. Therefore I think I can contribute
this code on GPLv2 terms with my sign-off.

Maybe someone would be willing to look at patch 2 and compare against Noah's
patch (linked in the commit message)?

Thank you.
Alexander

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-07-28 20:25   ` Alexander Monakov
@ 2025-08-14 15:01     ` Junio C Hamano
  2025-08-28 23:40     ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-08-14 15:01 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: git, Phillip Wood

Alexander Monakov <amonakov@ispras.ru> writes:

>> Using xxhash() was merely a sample code path for technology
>> demonstration, so the Rust adoption topic may want to pick a
>> different code path to do its thing.
>
> My interest here is just speeding up xdiff in C, is that a welcome topic?

I missed this question.  It is very much welcome.

It is not like Rust-minded folks licked this corner of the system
and others cannot touch it ;-)

>> What are the licensing terms for that code you are proposing us to
>> borrow?  If it is anything recent in GNU, I'd expect that it would
>> be GPLv3, which would be incompatible with our code base?
> ...
> I have participated in review of Noah's patches and he kindly listed me as
> a co-author in the final revision of his patchset. So while I'm aware of how
> his code is structured, I had to write a new implementation in order to meet
> the contract of xdl_hash_record_verbatim. Therefore I think I can contribute
> this code on GPLv2 terms with my sign-off.

Thanks for a wonderfully clear description.

I obviously misread the log message of [2/2] and misunderstood that
this was borrowed code.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-07-28 20:25   ` Alexander Monakov
  2025-08-14 15:01     ` Junio C Hamano
@ 2025-08-28 23:40     ` Junio C Hamano
  2025-08-29  1:13       ` Jacob Keller
  2025-08-29  3:09       ` Elijah Newren
  1 sibling, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-08-28 23:40 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: git, Phillip Wood, Ezekiel Newren

Alexander Monakov <amonakov@ispras.ru> writes:

>> Using xxhash() was merely a sample code path for technology
>> demonstration, so the Rust adoption topic may want to pick a
>> different code path to do its thing.
>
> My interest here is just speeding up xdiff in C, is that a welcome topic?

It seems that the (side) discussion on the performance has
concluded, and Ezekiel's new iteration of the Rust thing moved to a
non-overlapping part of the system, so I do not see any reason to
keep this topic out of 'next'.

Is everybody OK for me to mark the topic for 'next' soonish?  Any
objections I overlooked?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-08-28 23:40     ` Junio C Hamano
@ 2025-08-29  1:13       ` Jacob Keller
  2025-08-29  3:09       ` Elijah Newren
  1 sibling, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2025-08-29  1:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexander Monakov, git, Phillip Wood, Ezekiel Newren

On Thu, Aug 28, 2025 at 4:52 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Alexander Monakov <amonakov@ispras.ru> writes:
>
> >> Using xxhash() was merely a sample code path for technology
> >> demonstration, so the Rust adoption topic may want to pick a
> >> different code path to do its thing.
> >
> > My interest here is just speeding up xdiff in C, is that a welcome topic?
>
> It seems that the (side) discussion on the performance has
> concluded, and Ezekiel's new iteration of the Rust thing moved to a
> non-overlapping part of the system, so I do not see any reason to
> keep this topic out of 'next'.
>
> Is everybody OK for me to mark the topic for 'next' soonish?  Any
> objections I overlooked?
>
> Thanks.
>

That seems reasonable to me.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] optimize string hashing in xdiff
  2025-08-28 23:40     ` Junio C Hamano
  2025-08-29  1:13       ` Jacob Keller
@ 2025-08-29  3:09       ` Elijah Newren
  1 sibling, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2025-08-29  3:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexander Monakov, git, Phillip Wood, Ezekiel Newren

On Thu, Aug 28, 2025 at 4:41 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Alexander Monakov <amonakov@ispras.ru> writes:
>
> >> Using xxhash() was merely a sample code path for technology
> >> demonstration, so the Rust adoption topic may want to pick a
> >> different code path to do its thing.
> >
> > My interest here is just speeding up xdiff in C, is that a welcome topic?
>
> It seems that the (side) discussion on the performance has
> concluded, and Ezekiel's new iteration of the Rust thing moved to a
> non-overlapping part of the system, so I do not see any reason to
> keep this topic out of 'next'.
>
> Is everybody OK for me to mark the topic for 'next' soonish?  Any
> objections I overlooked?

Sounds good to me.

^ permalink raw reply	[flat|nested] 22+ messages in thread
