All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: "MTTCG Devel" <mttcg@greensocs.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Peter Crosthwaite" <crosthwaite.peter@gmail.com>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Sergey Fedorov" <serge.fdrv@gmail.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash
Date: Tue, 5 Apr 2016 15:40:28 -0400	[thread overview]
Message-ID: <20160405194028.GA6671@flamenco> (raw)
In-Reply-To: <5703E2DD.3020103@twiddle.net>

On Tue, Apr 05, 2016 at 09:07:57 -0700, Richard Henderson wrote:
> On 04/05/2016 08:48 AM, Paolo Bonzini wrote:
> >I think it's fine to use the struct.  The exact size of the struct
> >varies from 3 to 5 32-bit words, so it's hard to write nice
> >size-dependent code for the hash.
> 
> I don't think it is.  We have 3 integers.  It is trivial to create a simple
> function of 2 multiplies, two adds, and a remainder.
> 
> Take the primes from the xxhash.h, for example:
> 
>   (phys_pc * PRIME32_2 + pc * PRIME32_3 + flags)
>   % PRIME32_1
>   & (CODE_GEN_PHYS_HASH_SIZE - 1)
> 
> Obviously, some bucket measurements should be taken, but I can well imagine
> that this might perform just as well as the fully generic hasher.

That function doesn't perform well: 25.06s vs. 21.18s with xxh32.

Having the packed struct and passing it to an *inlined* xxhash is
virtually unbeatable; gcc (>=v4.6, dunno about older ones) optimizes the
inline function since it knows the size of the struct.

To show this I'm appending the generated code for tb_hash_func when xxh32
is inlined vs. when it is not, for x86_64-softmmu. Results are similar
for arm-softmmu.

Anyway (for the arm bootup test) we're talking about ~0.50% of runtime spent
in tb_hash_func (with xxh32 inlined), so whatever we did here could not
improve overall performance much.

Thanks,

		Emilio

* no inline:

00000000001a4e60 <qemu_xxh32>:
  1a4e60:	48 83 ec 18          	sub    $0x18,%rsp
  1a4e64:	4c 8d 0c b7          	lea    (%rdi,%rsi,4),%r9
  1a4e68:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
  1a4e6f:	00 00 
  1a4e71:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
  1a4e76:	31 c0                	xor    %eax,%eax
  1a4e78:	48 83 fe 03          	cmp    $0x3,%rsi
  1a4e7c:	8d 82 b1 67 56 16    	lea    0x165667b1(%rdx),%eax
  1a4e82:	0f 86 92 00 00 00    	jbe    1a4f1a <qemu_xxh32+0xba>
  1a4e88:	4d 8d 59 f0          	lea    -0x10(%r9),%r11
  1a4e8c:	44 8d 82 28 44 23 24 	lea    0x24234428(%rdx),%r8d
  1a4e93:	8d 8a 77 ca eb 85    	lea    -0x7a143589(%rdx),%ecx
  1a4e99:	8d 82 4f 86 c8 61    	lea    0x61c8864f(%rdx),%eax
  1a4e9f:	90                   	nop
  1a4ea0:	44 8b 17             	mov    (%rdi),%r10d
  1a4ea3:	45 69 d2 77 ca eb 85 	imul   $0x85ebca77,%r10d,%r10d
  1a4eaa:	45 01 d0             	add    %r10d,%r8d
  1a4ead:	44 8b 57 04          	mov    0x4(%rdi),%r10d
  1a4eb1:	41 c1 c0 0d          	rol    $0xd,%r8d
  1a4eb5:	45 69 c0 b1 79 37 9e 	imul   $0x9e3779b1,%r8d,%r8d
  1a4ebc:	45 69 d2 77 ca eb 85 	imul   $0x85ebca77,%r10d,%r10d
  1a4ec3:	44 01 d1             	add    %r10d,%ecx
  1a4ec6:	44 8b 57 08          	mov    0x8(%rdi),%r10d
  1a4eca:	c1 c1 0d             	rol    $0xd,%ecx
  1a4ecd:	69 c9 b1 79 37 9e    	imul   $0x9e3779b1,%ecx,%ecx
  1a4ed3:	45 69 d2 77 ca eb 85 	imul   $0x85ebca77,%r10d,%r10d
  1a4eda:	44 01 d2             	add    %r10d,%edx
  1a4edd:	44 8b 57 0c          	mov    0xc(%rdi),%r10d
  1a4ee1:	48 83 c7 10          	add    $0x10,%rdi
  1a4ee5:	c1 c2 0d             	rol    $0xd,%edx
  1a4ee8:	69 d2 b1 79 37 9e    	imul   $0x9e3779b1,%edx,%edx
  1a4eee:	45 69 d2 77 ca eb 85 	imul   $0x85ebca77,%r10d,%r10d
  1a4ef5:	44 01 d0             	add    %r10d,%eax
  1a4ef8:	c1 c0 0d             	rol    $0xd,%eax
  1a4efb:	69 c0 b1 79 37 9e    	imul   $0x9e3779b1,%eax,%eax
  1a4f01:	49 39 fb             	cmp    %rdi,%r11
  1a4f04:	73 9a                	jae    1a4ea0 <qemu_xxh32+0x40>
  1a4f06:	c1 c9 19             	ror    $0x19,%ecx
  1a4f09:	41 c1 c8 1f          	ror    $0x1f,%r8d
  1a4f0d:	c1 ca 14             	ror    $0x14,%edx
  1a4f10:	44 01 c1             	add    %r8d,%ecx
  1a4f13:	c1 c8 0e             	ror    $0xe,%eax
  1a4f16:	01 ca                	add    %ecx,%edx
  1a4f18:	01 d0                	add    %edx,%eax
  1a4f1a:	4c 39 cf             	cmp    %r9,%rdi
  1a4f1d:	8d 34 b0             	lea    (%rax,%rsi,4),%esi
  1a4f20:	73 22                	jae    1a4f44 <qemu_xxh32+0xe4>
  1a4f22:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  1a4f28:	8b 17                	mov    (%rdi),%edx
  1a4f2a:	48 83 c7 04          	add    $0x4,%rdi
  1a4f2e:	69 c2 3d ae b2 c2    	imul   $0xc2b2ae3d,%edx,%eax
  1a4f34:	01 c6                	add    %eax,%esi
  1a4f36:	c1 c6 11             	rol    $0x11,%esi
  1a4f39:	69 f6 2f eb d4 27    	imul   $0x27d4eb2f,%esi,%esi
  1a4f3f:	49 39 f9             	cmp    %rdi,%r9
  1a4f42:	77 e4                	ja     1a4f28 <qemu_xxh32+0xc8>
  1a4f44:	89 f0                	mov    %esi,%eax
  1a4f46:	c1 e8 0f             	shr    $0xf,%eax
  1a4f49:	31 f0                	xor    %esi,%eax
  1a4f4b:	69 d0 77 ca eb 85    	imul   $0x85ebca77,%eax,%edx
  1a4f51:	89 d0                	mov    %edx,%eax
  1a4f53:	c1 e8 0d             	shr    $0xd,%eax
  1a4f56:	31 d0                	xor    %edx,%eax
  1a4f58:	69 d0 3d ae b2 c2    	imul   $0xc2b2ae3d,%eax,%edx
  1a4f5e:	89 d0                	mov    %edx,%eax
  1a4f60:	c1 e8 10             	shr    $0x10,%eax
  1a4f63:	31 d0                	xor    %edx,%eax
  1a4f65:	48 8b 54 24 08       	mov    0x8(%rsp),%rdx
  1a4f6a:	64 48 33 14 25 28 00 	xor    %fs:0x28,%rdx
  1a4f71:	00 00 
  1a4f73:	75 05                	jne    1a4f7a <qemu_xxh32+0x11a>
  1a4f75:	48 83 c4 18          	add    $0x18,%rsp
  1a4f79:	c3                   	retq   
  1a4f7a:	e8 f1 7a fe ff       	callq  18ca70 <__stack_chk_fail@plt>
  1a4f7f:	90                   	nop

00000000001a4f80 <tb_hash_func>:
  1a4f80:	48 83 ec 28          	sub    $0x28,%rsp
  1a4f84:	48 89 3c 24          	mov    %rdi,(%rsp)
  1a4f88:	48 89 74 24 08       	mov    %rsi,0x8(%rsp)
  1a4f8d:	48 89 e7             	mov    %rsp,%rdi
  1a4f90:	89 54 24 10          	mov    %edx,0x10(%rsp)
  1a4f94:	be 05 00 00 00       	mov    $0x5,%esi
  1a4f99:	ba 01 00 00 00       	mov    $0x1,%edx
  1a4f9e:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
  1a4fa5:	00 00 
  1a4fa7:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
  1a4fac:	31 c0                	xor    %eax,%eax
  1a4fae:	e8 ad fe ff ff       	callq  1a4e60 <qemu_xxh32>
  1a4fb3:	48 8b 54 24 18       	mov    0x18(%rsp),%rdx
  1a4fb8:	64 48 33 14 25 28 00 	xor    %fs:0x28,%rdx
  1a4fbf:	00 00 
  1a4fc1:	75 05                	jne    1a4fc8 <tb_hash_func+0x48>
  1a4fc3:	48 83 c4 28          	add    $0x28,%rsp
  1a4fc7:	c3                   	retq   
  1a4fc8:	e8 a3 7a fe ff       	callq  18ca70 <__stack_chk_fail@plt>
  1a4fcd:	0f 1f 00             	nopl   (%rax)

* inline:

00000000001a6800 <tb_hash_func>:
  1a6800:	48 83 ec 28          	sub    $0x28,%rsp
  1a6804:	69 cf 77 ca eb 85    	imul   $0x85ebca77,%edi,%ecx
  1a680a:	48 89 3c 24          	mov    %rdi,(%rsp)
  1a680e:	48 c1 ef 20          	shr    $0x20,%rdi
  1a6812:	69 ff 77 ca eb 85    	imul   $0x85ebca77,%edi,%edi
  1a6818:	48 89 74 24 08       	mov    %rsi,0x8(%rsp)
  1a681d:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
  1a6824:	00 00 
  1a6826:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
  1a682b:	31 c0                	xor    %eax,%eax
  1a682d:	81 c1 29 44 23 24    	add    $0x24234429,%ecx
  1a6833:	69 c6 77 ca eb 85    	imul   $0x85ebca77,%esi,%eax
  1a6839:	48 c1 ee 20          	shr    $0x20,%rsi
  1a683d:	81 ef 88 35 14 7a    	sub    $0x7a143588,%edi
  1a6843:	69 f6 77 ca eb 85    	imul   $0x85ebca77,%esi,%esi
  1a6849:	c1 c9 13             	ror    $0x13,%ecx
  1a684c:	c1 cf 13             	ror    $0x13,%edi
  1a684f:	83 c0 01             	add    $0x1,%eax
  1a6852:	69 c9 b1 79 37 9e    	imul   $0x9e3779b1,%ecx,%ecx
  1a6858:	c1 c8 13             	ror    $0x13,%eax
  1a685b:	81 c6 50 86 c8 61    	add    $0x61c88650,%esi
  1a6861:	69 ff b1 79 37 9e    	imul   $0x9e3779b1,%edi,%edi
  1a6867:	c1 ce 13             	ror    $0x13,%esi
  1a686a:	c1 c9 1f             	ror    $0x1f,%ecx
  1a686d:	69 c0 b1 79 37 9e    	imul   $0x9e3779b1,%eax,%eax
  1a6873:	c1 cf 19             	ror    $0x19,%edi
  1a6876:	69 f6 b1 79 37 9e    	imul   $0x9e3779b1,%esi,%esi
  1a687c:	8d 7c 39 14          	lea    0x14(%rcx,%rdi,1),%edi
  1a6880:	c1 c8 14             	ror    $0x14,%eax
  1a6883:	69 d2 3d ae b2 c2    	imul   $0xc2b2ae3d,%edx,%edx
  1a6889:	01 f8                	add    %edi,%eax
  1a688b:	c1 ce 0e             	ror    $0xe,%esi
  1a688e:	01 c6                	add    %eax,%esi
  1a6890:	01 f2                	add    %esi,%edx
  1a6892:	c1 ca 0f             	ror    $0xf,%edx
  1a6895:	69 d2 2f eb d4 27    	imul   $0x27d4eb2f,%edx,%edx
  1a689b:	89 d0                	mov    %edx,%eax
  1a689d:	c1 e8 0f             	shr    $0xf,%eax
  1a68a0:	31 d0                	xor    %edx,%eax
  1a68a2:	69 d0 77 ca eb 85    	imul   $0x85ebca77,%eax,%edx
  1a68a8:	89 d0                	mov    %edx,%eax
  1a68aa:	c1 e8 0d             	shr    $0xd,%eax
  1a68ad:	31 d0                	xor    %edx,%eax
  1a68af:	69 d0 3d ae b2 c2    	imul   $0xc2b2ae3d,%eax,%edx
  1a68b5:	89 d0                	mov    %edx,%eax
  1a68b7:	c1 e8 10             	shr    $0x10,%eax
  1a68ba:	31 d0                	xor    %edx,%eax
  1a68bc:	48 8b 54 24 18       	mov    0x18(%rsp),%rdx
  1a68c1:	64 48 33 14 25 28 00 	xor    %fs:0x28,%rdx
  1a68c8:	00 00 
  1a68ca:	75 05                	jne    1a68d1 <tb_hash_func+0xd1>
  1a68cc:	48 83 c4 28          	add    $0x28,%rsp
  1a68d0:	c3                   	retq   
  1a68d1:	e8 9a 61 fe ff       	callq  18ca70 <__stack_chk_fail@plt>
  1a68d6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  1a68dd:	00 00 00 

  reply	other threads:[~2016-04-05 19:40 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05  5:30 [Qemu-devel] [PATCH 00/10] tb hash improvements Emilio G. Cota
2016-04-05  5:30 ` [Qemu-devel] [PATCH 01/10] translate-all: add missing fold of tb_ctx into tcg_ctx Emilio G. Cota
2016-04-05  8:49   ` Paolo Bonzini
2016-04-05  5:30 ` [Qemu-devel] [PATCH 02/10] compiler.h: add QEMU_CACHELINE + QEMU_ALIGN() + QEMU_CACHELINE_ALIGNED Emilio G. Cota
2016-04-05  7:57   ` Peter Maydell
2016-04-05 17:24     ` Emilio G. Cota
2016-04-05 18:01       ` Peter Maydell
2016-04-05 19:13         ` Emilio G. Cota
2016-04-05  8:49   ` Paolo Bonzini
2016-04-05 12:57   ` Lluís Vilanova
2016-04-05 12:58     ` Peter Maydell
2016-04-05 15:29       ` Paolo Bonzini
2016-04-05 16:23       ` Lluís Vilanova
2016-04-05 16:31         ` Richard Henderson
2016-04-05 16:56           ` Peter Maydell
2016-04-05 19:02             ` Lluís Vilanova
2016-04-05 19:15               ` Richard Henderson
2016-04-05 20:09                 ` Lluís Vilanova
2016-04-06 11:44                   ` Paolo Bonzini
2016-04-06 12:02                     ` Laurent Desnogues
2016-04-05  5:30 ` [Qemu-devel] [PATCH 03/10] seqlock: remove optional mutex Emilio G. Cota
2016-04-06  8:38   ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 04/10] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
2016-04-06  8:42   ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 05/10] include: add spinlock wrapper Emilio G. Cota
2016-04-05  8:51   ` Paolo Bonzini
2016-04-06 15:51     ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 06/10] include: add xxhash.h Emilio G. Cota
2016-04-06 11:39   ` Alex Bennée
2016-04-06 22:59     ` Emilio G. Cota
2016-04-05  5:30 ` [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
2016-04-05 15:41   ` Richard Henderson
2016-04-05 15:48     ` Paolo Bonzini
2016-04-05 16:07       ` Richard Henderson
2016-04-05 19:40         ` Emilio G. Cota [this message]
2016-04-05 21:08           ` Richard Henderson
2016-04-06  0:52             ` Emilio G. Cota
2016-04-06 11:52               ` Paolo Bonzini
2016-04-06 17:44                 ` Emilio G. Cota
2016-04-06 18:23                   ` Paolo Bonzini
2016-04-06 18:27                     ` Richard Henderson
2016-04-07  0:37                     ` Emilio G. Cota
2016-04-07  8:46                       ` Paolo Bonzini
2016-04-05 16:33     ` Laurent Desnogues
2016-04-05 17:19       ` Richard Henderson
2016-04-06  6:06         ` Laurent Desnogues
2016-04-06 17:32           ` Emilio G. Cota
2016-04-06 17:42             ` Richard Henderson
2016-04-07  8:12               ` Laurent Desnogues
2016-04-05  5:30 ` [Qemu-devel] [PATCH 08/10] qht: QEMU's fast, resizable and scalable Hash Table Emilio G. Cota
2016-04-05  9:01   ` Paolo Bonzini
2016-04-05 15:50   ` Richard Henderson
2016-04-08 10:27   ` Alex Bennée
2016-04-19 23:03     ` Emilio G. Cota
2016-04-05  5:30 ` [Qemu-devel] [PATCH 09/10] qht: add test program Emilio G. Cota
2016-04-08 10:45   ` Alex Bennée
2016-04-19 23:06     ` Emilio G. Cota
2016-04-20  7:50       ` Alex Bennée
2016-04-05  5:30 ` [Qemu-devel] [PATCH 10/10] tb hash: track translated blocks with qht Emilio G. Cota
2016-04-08 12:39   ` Alex Bennée
2016-04-05  8:47 ` [Qemu-devel] [PATCH 00/10] tb hash improvements Alex Bennée
2016-04-05  9:01 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160405194028.GA6671@flamenco \
    --to=cota@braap.org \
    --cc=alex.bennee@linaro.org \
    --cc=crosthwaite.peter@gmail.com \
    --cc=mttcg@greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.