All of lore.kernel.org
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: Alexander Monakov <amonakov@ispras.ru>, git@vger.kernel.org
Cc: Phillip Wood <phillip.wood@dunelm.org.uk>
Subject: Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
Date: Mon, 4 Aug 2025 14:49:41 +0100	[thread overview]
Message-ID: <aedb1be1-3151-421e-94ce-27bc77d80b83@gmail.com> (raw)
In-Reply-To: <20250728190520.10962-3-amonakov@ispras.ru>

Hi Alexander

On 28/07/2025 20:05, Alexander Monakov wrote:
> xdl_hash_record_verbatim uses modified djb2 hash with XOR instead of ADD
> for combining. The ADD-based variant is used as the basis of the modern
> ("GNU") symbol lookup scheme in ELF. Glibc dynamic loader received an
> optimized version of this hash function thanks to Noah Goldstein [1].

Interesting

> Switch xdl_hash_record_verbatim to additive hashing and implement
> an optimized loop following the scheme suggested by Noah.
> 
> Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
> 
> version | cycles, bn | instructions, bn
> ---------------------------------------
> A         6.38         11.3
> B         6.21         10.89
> C         5.80          9.95
> D         5.83          8.74
> ---------------------------------------
> 
> A: baseline (git master at e4ef0485fd78)
> B: plus 'xdiff: refactor xdl_hash_record()'
> C: and plus this patch
> D: with 'xdiff: use xxhash' by Phillip Wood

I think it would be helpful to say that B is the previous patch and 
provide a link for D.
> The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.

While that's interesting it does not tell us how much this speeds up 
diff generation. Running the command above under hyperfine it is 1.02 ± 
0.01 times faster than the previous patch and 1.11 ± 0.01 times faster 
than master. Using xxhash (D above) is 1.03 ± 0.01 times faster than 
this patch. How do the changes below affect compilers other than gcc and 
clang than do not see the re-association barrier? We'd want to make sure 
that it does not result in slower diffs. Can we use 
atomic_signal_fence() on compilers that support C11? (we don't require 
C11 so we'd have to make it optional but it is supported by things like 
MSVC)

Thanks

Phillip
> 
> [1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/
> 
> Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
> ---
>   xdiff/xutils.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index e070ed649f..b1f8273f0f 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -294,16 +294,67 @@ unsigned long xdl_hash_record_with_whitespace(char const **data,
>   	return ha;
>   }
>   
> +/*
> + * Compiler reassociation barrier: pretend to modify X and Y to disallow
> + * changing evaluation order with respect to following uses of X and Y.
> + */
> +#ifdef __GNUC__
> +#define REASSOC_FENCE(x, y) asm("" : "+r"(x), "+r"(y))
> +#else
> +#define REASSOC_FENCE(x, y)
> +#endif
> +
>   unsigned long xdl_hash_record_verbatim(char const **data, char const *top) {
> -	unsigned long ha = 5381;
> +	unsigned long ha = 5381, c0, c1;
>   	char const *ptr = *data;
> -
> +#if 0
> +	/*
> +	 * The baseline form of the optimized loop below. This is the djb2
> +	 * hash (the above function uses a variant with XOR instead of ADD).
> +	 */
>   	for (; ptr < top && *ptr != '\n'; ptr++) {
>   		ha += (ha << 5);
> -		ha ^= (unsigned long) *ptr;
> +		ha += (unsigned long) *ptr;
>   	}
>   	*data = ptr < top ? ptr + 1: ptr;
> -
> +#else
> +	/* Process two characters per iteration. */
> +	if (top - ptr >= 2) do {
> +		if ((c0 = ptr[0]) == '\n') {
> +			*data = ptr + 1;
> +			return ha;
> +		}
> +		if ((c1 = ptr[1]) == '\n') {
> +			*data = ptr + 2;
> +			c0 += ha;
> +			REASSOC_FENCE(c0, ha);
> +			ha = ha * 32 + c0;
> +			return ha;
> +		}
> +		/*
> +		 * Combine characters C0 and C1 into the hash HA. We have
> +		 * HA = (HA * 33 + C0) * 33 + C1, and we want to ensure
> +		 * that dependency chain over HA is just one multiplication
> +		 * and one addition, i.e. we want to evaluate this as
> +		 * HA = HA * 33 * 33 + (C0 * 33 + C1), and likewise prefer
> +		 * (C0 * 32 + (C0 + C1)) for the expression in parenthesis.
> +		 */
> +		ha *= 33 * 33;
> +		c1 += c0;
> +		REASSOC_FENCE(c1, c0);
> +		c1 += c0 * 32;
> +		REASSOC_FENCE(c1, ha);
> +		ha += c1;
> +
> +		ptr += 2;
> +	} while (ptr < top - 1);
> +	*data = top;
> +	if (ptr < top && (c0 = ptr[0]) != '\n') {
> +		c0 += ha;
> +		REASSOC_FENCE(c0, ha);
> +		ha = ha * 32 + c0;
> +	}
> +#endif
>   	return ha;
>   }
>   


  parent reply	other threads:[~2025-08-04 13:49 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-28 19:05 [PATCH 0/2] optimize string hashing in xdiff Alexander Monakov
2025-07-28 19:05 ` [PATCH 1/2] xdiff: refactor xdl_hash_record() Alexander Monakov
2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
2025-07-28 20:50   ` Junio C Hamano
2025-07-28 20:57     ` Alexander Monakov
2025-08-04 13:49   ` Phillip Wood [this message]
2025-08-04 14:39     ` Alexander Monakov
2025-08-11 13:13       ` Phillip Wood
2025-08-11 14:14         ` Alexander Monakov
2025-08-12 17:56           ` Alexander Monakov
2025-08-20 21:34             ` Junio C Hamano
2025-09-08 19:06               ` Alexander Monakov
2025-09-08 21:04                 ` Junio C Hamano
2025-08-13 13:10           ` Phillip Wood
2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
2025-07-28 19:56   ` Eli Schwartz
2025-07-28 20:54     ` Junio C Hamano
2025-07-28 20:25   ` Alexander Monakov
2025-08-14 15:01     ` Junio C Hamano
2025-08-28 23:40     ` Junio C Hamano
2025-08-29  1:13       ` Jacob Keller
2025-08-29  3:09       ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aedb1be1-3151-421e-94ce-27bc77d80b83@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=amonakov@ispras.ru \
    --cc=git@vger.kernel.org \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.