qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Richard Henderson <rth@twiddle.net>
To: "Emilio G. Cota" <cota@braap.org>
Cc: "MTTCG Devel" <mttcg@greensocs.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Peter Crosthwaite" <crosthwaite.peter@gmail.com>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Sergey Fedorov" <serge.fdrv@gmail.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash
Date: Tue, 5 Apr 2016 14:08:13 -0700	[thread overview]
Message-ID: <5704293D.1070105@twiddle.net> (raw)
In-Reply-To: <20160405194028.GA6671@flamenco>

On 04/05/2016 12:40 PM, Emilio G. Cota wrote:
> On Tue, Apr 05, 2016 at 09:07:57 -0700, Richard Henderson wrote:
>> On 04/05/2016 08:48 AM, Paolo Bonzini wrote:
>>> I think it's fine to use the struct.  The exact size of the struct
>>> varies from 3 to 5 32-bit words, so it's hard to write nice
>>> size-dependent code for the hash.
>>
>> I don't think it is.  We have 3 integers.  It is trivial to create a simple
>> function of 2 multiplies, two adds, and a remainder.
>>
>> Take the primes from the xxhash.h, for example:
>>
>>   (phys_pc * PRIME32_2 + pc * PRIME32_3 + flags)
>>   % PRIME32_1
>>   & (CODE_GEN_PHYS_HASH_SIZE - 1)
>>
>> Obviously, some bucket measurements should be taken, but I can well imagine
>> that this might perform just as well as the fully generic hasher.
>
> That function doesn't perform well: 25.06s vs. 21.18s with xxh32.

Sure, that's just what came off the top of my head.

But the point is that we can do better than dropping data into memory.
Particularly for those hosts that do not support unaligned data, such as you
created with the packed structure.


> * inline:
>
> 00000000001a6800 <tb_hash_func>:
>   1a6800:	48 83 ec 28          	sub    $0x28,%rsp
>   1a6804:	69 cf 77 ca eb 85    	imul   $0x85ebca77,%edi,%ecx
>   1a680a:	48 89 3c 24          	mov    %rdi,(%rsp)
>   1a680e:	48 c1 ef 20          	shr    $0x20,%rdi
>   1a6812:	69 ff 77 ca eb 85    	imul   $0x85ebca77,%edi,%edi
>   1a6818:	48 89 74 24 08       	mov    %rsi,0x8(%rsp)
>   1a681d:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
>   1a6824:	00 00
>   1a6826:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
>   1a682b:	31 c0                	xor    %eax,%eax
>   1a682d:	81 c1 29 44 23 24    	add    $0x24234429,%ecx
>   1a6833:	69 c6 77 ca eb 85    	imul   $0x85ebca77,%esi,%eax
>   1a6839:	48 c1 ee 20          	shr    $0x20,%rsi
>   1a683d:	81 ef 88 35 14 7a    	sub    $0x7a143588,%edi
>   1a6843:	69 f6 77 ca eb 85    	imul   $0x85ebca77,%esi,%esi
>   1a6849:	c1 c9 13             	ror    $0x13,%ecx
>   1a684c:	c1 cf 13             	ror    $0x13,%edi
>   1a684f:	83 c0 01             	add    $0x1,%eax
>   1a6852:	69 c9 b1 79 37 9e    	imul   $0x9e3779b1,%ecx,%ecx
>   1a6858:	c1 c8 13             	ror    $0x13,%eax
>   1a685b:	81 c6 50 86 c8 61    	add    $0x61c88650,%esi
>   1a6861:	69 ff b1 79 37 9e    	imul   $0x9e3779b1,%edi,%edi
>   1a6867:	c1 ce 13             	ror    $0x13,%esi
>   1a686a:	c1 c9 1f             	ror    $0x1f,%ecx
>   1a686d:	69 c0 b1 79 37 9e    	imul   $0x9e3779b1,%eax,%eax
>   1a6873:	c1 cf 19             	ror    $0x19,%edi
>   1a6876:	69 f6 b1 79 37 9e    	imul   $0x9e3779b1,%esi,%esi
>   1a687c:	8d 7c 39 14          	lea    0x14(%rcx,%rdi,1),%edi
>   1a6880:	c1 c8 14             	ror    $0x14,%eax
>   1a6883:	69 d2 3d ae b2 c2    	imul   $0xc2b2ae3d,%edx,%edx
>   1a6889:	01 f8                	add    %edi,%eax
>   1a688b:	c1 ce 0e             	ror    $0xe,%esi
>   1a688e:	01 c6                	add    %eax,%esi
>   1a6890:	01 f2                	add    %esi,%edx
>   1a6892:	c1 ca 0f             	ror    $0xf,%edx
>   1a6895:	69 d2 2f eb d4 27    	imul   $0x27d4eb2f,%edx,%edx
>   1a689b:	89 d0                	mov    %edx,%eax
>   1a689d:	c1 e8 0f             	shr    $0xf,%eax
>   1a68a0:	31 d0                	xor    %edx,%eax
>   1a68a2:	69 d0 77 ca eb 85    	imul   $0x85ebca77,%eax,%edx
>   1a68a8:	89 d0                	mov    %edx,%eax
>   1a68aa:	c1 e8 0d             	shr    $0xd,%eax
>   1a68ad:	31 d0                	xor    %edx,%eax
>   1a68af:	69 d0 3d ae b2 c2    	imul   $0xc2b2ae3d,%eax,%edx
>   1a68b5:	89 d0                	mov    %edx,%eax
>   1a68b7:	c1 e8 10             	shr    $0x10,%eax
>   1a68ba:	31 d0                	xor    %edx,%eax
>   1a68bc:	48 8b 54 24 18       	mov    0x18(%rsp),%rdx
>   1a68c1:	64 48 33 14 25 28 00 	xor    %fs:0x28,%rdx
>   1a68c8:	00 00
>   1a68ca:	75 05                	jne    1a68d1 <tb_hash_func+0xd1>
>   1a68cc:	48 83 c4 28          	add    $0x28,%rsp
>   1a68d0:	c3                   	retq

If I inline everything explicitly, we do even better.


r~


0000000000000000 <tb_hash_func>:
    0:	44 69 c6 77 ca eb 85 	imul   $0x85ebca77,%esi,%r8d
    7:	48 c1 ee 20          	shr    $0x20,%rsi
    b:	69 cf 77 ca eb 85    	imul   $0x85ebca77,%edi,%ecx
   11:	48 c1 ef 20          	shr    $0x20,%rdi
   15:	41 83 c0 01          	add    $0x1,%r8d
   19:	41 c1 c0 0d          	rol    $0xd,%r8d
   1d:	81 c1 29 44 23 24    	add    $0x24234429,%ecx
   23:	69 c6 77 ca eb 85    	imul   $0x85ebca77,%esi,%eax
   29:	c1 c1 0d             	rol    $0xd,%ecx
   2c:	41 69 f0 b1 79 37 9e 	imul   $0x9e3779b1,%r8d,%esi
   33:	05 50 86 c8 61       	add    $0x61c88650,%eax
   38:	69 ff 77 ca eb 85    	imul   $0x85ebca77,%edi,%edi
   3e:	c1 c6 0c             	rol    $0xc,%esi
   41:	c1 c0 0d             	rol    $0xd,%eax
   44:	69 d2 3d ae b2 c2    	imul   $0xc2b2ae3d,%edx,%edx
   4a:	81 ef 88 35 14 7a    	sub    $0x7a143588,%edi
   50:	69 c9 b1 79 37 9e    	imul   $0x9e3779b1,%ecx,%ecx
   56:	8d 54 16 14          	lea    0x14(%rsi,%rdx,1),%edx
   5a:	c1 c7 0d             	rol    $0xd,%edi
   5d:	69 c0 b1 79 37 9e    	imul   $0x9e3779b1,%eax,%eax
   63:	d1 c1                	rol    %ecx
   65:	01 d1                	add    %edx,%ecx
   67:	c1 c8 0e             	ror    $0xe,%eax
   6a:	69 d7 b1 79 37 9e    	imul   $0x9e3779b1,%edi,%edx
   70:	01 c8                	add    %ecx,%eax
   72:	c1 c2 07             	rol    $0x7,%edx
   75:	01 d0                	add    %edx,%eax
   77:	c1 c8 0f             	ror    $0xf,%eax
   7a:	69 c0 2f eb d4 27    	imul   $0x27d4eb2f,%eax,%eax
   80:	89 c2                	mov    %eax,%edx
   82:	c1 ea 0f             	shr    $0xf,%edx
   85:	31 d0                	xor    %edx,%eax
   87:	69 c0 77 ca eb 85    	imul   $0x85ebca77,%eax,%eax
   8d:	89 c2                	mov    %eax,%edx
   8f:	c1 ea 0d             	shr    $0xd,%edx
   92:	31 d0                	xor    %edx,%eax
   94:	69 c0 3d ae b2 c2    	imul   $0xc2b2ae3d,%eax,%eax
   9a:	89 c2                	mov    %eax,%edx
   9c:	c1 ea 10             	shr    $0x10,%edx
   9f:	31 d0                	xor    %edx,%eax
   a1:	c3                   	retq


#define PRIME32_1   2654435761U
#define PRIME32_2   2246822519U
#define PRIME32_3   3266489917U
#define PRIME32_4    668265263U
#define PRIME32_5    374761393U
#define XXH_rotl32(x, r) ((x << r) | (x >> (32 - r)))

unsigned tb_hash_func(unsigned long phys_pc, unsigned long pc, unsigned flags)
{
   unsigned h32, seed = 1;
   unsigned w0, w1, w2, w3, w4;
   unsigned v1, v2, v3, v4;

   w0 = phys_pc;
   w1 = phys_pc >> 31 >> 1;
   w2 = pc;
   w3 = pc >> 31 >> 1;
   w4 = flags;

   v1 = seed + PRIME32_1 + PRIME32_2;
   v2 = seed + PRIME32_2;
   v3 = seed + 0;
   v4 = seed - PRIME32_1;

   v1 += w0 * PRIME32_2;
   v1 = XXH_rotl32(v1, 13);
   v1 *= PRIME32_1;
   v2 += w1 * PRIME32_2;
   v2 = XXH_rotl32(v2, 13);
   v2 *= PRIME32_1;
   v3 += w2 * PRIME32_2;
   v3 = XXH_rotl32(v3, 13);
   v3 *= PRIME32_1;
   v4 += w3 * PRIME32_2;
   v4 = XXH_rotl32(v4, 13);
   v4 *= PRIME32_1;

   h32 = XXH_rotl32(v1, 1) + XXH_rotl32(v2, 7)
	+ XXH_rotl32(v3, 12) + XXH_rotl32(v4, 18);

   h32 += 20;

   h32 += w4 * PRIME32_3;
   h32  = XXH_rotl32(h32, 17) * PRIME32_4;

   h32 ^= h32 >> 15;
   h32 *= PRIME32_2;
   h32 ^= h32 >> 13;
   h32 *= PRIME32_3;
   h32 ^= h32 >> 16;

   return h32;
}

  reply	other threads:[~2016-04-05 21:08 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05  5:30 [Qemu-devel] [PATCH 00/10] tb hash improvements Emilio G. Cota
2016-04-05  5:30 ` [Qemu-devel] [PATCH 01/10] translate-all: add missing fold of tb_ctx into tcg_ctx Emilio G. Cota
2016-04-05  8:49   ` Paolo Bonzini
2016-04-05  5:30 ` [Qemu-devel] [PATCH 02/10] compiler.h: add QEMU_CACHELINE + QEMU_ALIGN() + QEMU_CACHELINE_ALIGNED Emilio G. Cota
2016-04-05  7:57   ` Peter Maydell
2016-04-05 17:24     ` Emilio G. Cota
2016-04-05 18:01       ` Peter Maydell
2016-04-05 19:13         ` Emilio G. Cota
2016-04-05  8:49   ` Paolo Bonzini
2016-04-05 12:57   ` Lluís Vilanova
2016-04-05 12:58     ` Peter Maydell
2016-04-05 15:29       ` Paolo Bonzini
2016-04-05 16:23       ` Lluís Vilanova
2016-04-05 16:31         ` Richard Henderson
2016-04-05 16:56           ` Peter Maydell
2016-04-05 19:02             ` Lluís Vilanova
2016-04-05 19:15               ` Richard Henderson
2016-04-05 20:09                 ` Lluís Vilanova
2016-04-06 11:44                   ` Paolo Bonzini
2016-04-06 12:02                     ` Laurent Desnogues
2016-04-05  5:30 ` [Qemu-devel] [PATCH 03/10] seqlock: remove optional mutex Emilio G. Cota
2016-04-06  8:38   ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 04/10] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
2016-04-06  8:42   ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 05/10] include: add spinlock wrapper Emilio G. Cota
2016-04-05  8:51   ` Paolo Bonzini
2016-04-06 15:51     ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 06/10] include: add xxhash.h Emilio G. Cota
2016-04-06 11:39   ` Alex Bennée
2016-04-06 22:59     ` Emilio G. Cota
2016-04-05  5:30 ` [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
2016-04-05 15:41   ` Richard Henderson
2016-04-05 15:48     ` Paolo Bonzini
2016-04-05 16:07       ` Richard Henderson
2016-04-05 19:40         ` Emilio G. Cota
2016-04-05 21:08           ` Richard Henderson [this message]
2016-04-06  0:52             ` Emilio G. Cota
2016-04-06 11:52               ` Paolo Bonzini
2016-04-06 17:44                 ` Emilio G. Cota
2016-04-06 18:23                   ` Paolo Bonzini
2016-04-06 18:27                     ` Richard Henderson
2016-04-07  0:37                     ` Emilio G. Cota
2016-04-07  8:46                       ` Paolo Bonzini
2016-04-05 16:33     ` Laurent Desnogues
2016-04-05 17:19       ` Richard Henderson
2016-04-06  6:06         ` Laurent Desnogues
2016-04-06 17:32           ` Emilio G. Cota
2016-04-06 17:42             ` Richard Henderson
2016-04-07  8:12               ` Laurent Desnogues
2016-04-05  5:30 ` [Qemu-devel] [PATCH 08/10] qht: QEMU's fast, resizable and scalable Hash Table Emilio G. Cota
2016-04-05  9:01   ` Paolo Bonzini
2016-04-05 15:50   ` Richard Henderson
2016-04-08 10:27   ` Alex Bennée
2016-04-19 23:03     ` Emilio G. Cota
2016-04-05  5:30 ` [Qemu-devel] [PATCH 09/10] qht: add test program Emilio G. Cota
2016-04-08 10:45   ` Alex Bennée
2016-04-19 23:06     ` Emilio G. Cota
2016-04-20  7:50       ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 10/10] tb hash: track translated blocks with qht Emilio G. Cota
2016-04-08 12:39   ` Alex Bennée
2016-04-05  8:47 ` [Qemu-devel] [PATCH 00/10] tb hash improvements Alex Bennée
2016-04-05  9:01 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5704293D.1070105@twiddle.net \
    --to=rth@twiddle.net \
    --cc=alex.bennee@linaro.org \
    --cc=cota@braap.org \
    --cc=crosthwaite.peter@gmail.com \
    --cc=mttcg@greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).