From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: "MTTCG Devel" <mttcg@greensocs.com>,
"Peter Maydell" <peter.maydell@linaro.org>,
"Peter Crosthwaite" <crosthwaite.peter@gmail.com>,
"QEMU Developers" <qemu-devel@nongnu.org>,
"Sergey Fedorov" <serge.fdrv@gmail.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash
Date: Tue, 5 Apr 2016 15:40:28 -0400
Message-ID: <20160405194028.GA6671@flamenco>
In-Reply-To: <5703E2DD.3020103@twiddle.net>
On Tue, Apr 05, 2016 at 09:07:57 -0700, Richard Henderson wrote:
> On 04/05/2016 08:48 AM, Paolo Bonzini wrote:
> >I think it's fine to use the struct. The exact size of the struct
> >varies from 3 to 5 32-bit words, so it's hard to write nice
> >size-dependent code for the hash.
>
> I don't think it is. We have 3 integers. It is trivial to create a simple
> function of 2 multiplies, two adds, and a remainder.
>
> Take the primes from the xxhash.h, for example:
>
> (phys_pc * PRIME32_2 + pc * PRIME32_3 + flags)
> % PRIME32_1
> & (CODE_GEN_PHYS_HASH_SIZE - 1)
>
> Obviously, some bucket measurements should be taken, but I can well imagine
> that this might perform just as well as the fully generic hasher.
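
Spelled out as C, the quoted proposal might look like the sketch below. The
64-bit argument widths, the function name and the table size are assumptions
made for illustration; only the primes and the expression itself come from
the quoted text.

```c
#include <stdint.h>

/* xxHash's 32-bit primes, as named in the quoted text. */
#define PRIME32_1 2654435761U
#define PRIME32_2 2246822519U
#define PRIME32_3 3266489917U

/* Assumption: a power-of-two table size, chosen here only for illustration. */
#define CODE_GEN_PHYS_HASH_SIZE (1u << 15)

/* Hypothetical name; two multiplies, two adds, a remainder and a mask. */
static inline uint32_t tb_hash_simple(uint64_t phys_pc, uint64_t pc,
                                      uint32_t flags)
{
    uint64_t h = phys_pc * PRIME32_2 + pc * PRIME32_3 + flags;

    /* '%' binds tighter than '&', matching the layout of the quoted formula. */
    return (uint32_t)(h % PRIME32_1) & (CODE_GEN_PHYS_HASH_SIZE - 1);
}
```
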
That function doesn't perform well: 25.06s vs. 21.18s with xxh32.

Having the packed struct and passing it to an *inlined* xxhash is
virtually unbeatable; gcc (>=v4.6, dunno about older ones) optimizes the
inline function since it knows the size of the struct.
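
To make the size argument concrete, here is a minimal sketch of such a key.
The struct name and the field widths are assumptions (a 64-bit phys_pc and
pc, which is the 5-word case); the point is only that the word count becomes
a compile-time constant that an inlined hasher can unroll against.

```c
#include <stdint.h>

/*
 * Hypothetical key layout; field widths are assumed, since the real
 * types differ per target.
 */
struct tb_key {
    uint64_t phys_pc;
    uint64_t pc;
    uint32_t flags;
} __attribute__((packed));

/*
 * Without the packed attribute, trailing padding would typically grow
 * this to 24 bytes, and the hash would then cover padding bytes too.
 */
_Static_assert(sizeof(struct tb_key) == 20, "key must be 5 words, no padding");

/* Known at compile time, so loops over the key can be fully unrolled. */
enum { TB_KEY_WORDS = sizeof(struct tb_key) / sizeof(uint32_t) };
```
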
To show this I'm appending the generated code for tb_hash_func when xxh32
is inlined vs. when it is not, for x86_64-softmmu. Results are similar
for arm-softmmu.

Anyway (for the arm bootup test) we're talking about ~0.50% of runtime spent
in tb_hash_func (with xxh32 inlined), so whatever we did here could not
improve overall performance much.

Thanks,

Emilio

* no inline:
00000000001a4e60 <qemu_xxh32>:
1a4e60: 48 83 ec 18 sub $0x18,%rsp
1a4e64: 4c 8d 0c b7 lea (%rdi,%rsi,4),%r9
1a4e68: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1a4e6f: 00 00
1a4e71: 48 89 44 24 08 mov %rax,0x8(%rsp)
1a4e76: 31 c0 xor %eax,%eax
1a4e78: 48 83 fe 03 cmp $0x3,%rsi
1a4e7c: 8d 82 b1 67 56 16 lea 0x165667b1(%rdx),%eax
1a4e82: 0f 86 92 00 00 00 jbe 1a4f1a <qemu_xxh32+0xba>
1a4e88: 4d 8d 59 f0 lea -0x10(%r9),%r11
1a4e8c: 44 8d 82 28 44 23 24 lea 0x24234428(%rdx),%r8d
1a4e93: 8d 8a 77 ca eb 85 lea -0x7a143589(%rdx),%ecx
1a4e99: 8d 82 4f 86 c8 61 lea 0x61c8864f(%rdx),%eax
1a4e9f: 90 nop
1a4ea0: 44 8b 17 mov (%rdi),%r10d
1a4ea3: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4eaa: 45 01 d0 add %r10d,%r8d
1a4ead: 44 8b 57 04 mov 0x4(%rdi),%r10d
1a4eb1: 41 c1 c0 0d rol $0xd,%r8d
1a4eb5: 45 69 c0 b1 79 37 9e imul $0x9e3779b1,%r8d,%r8d
1a4ebc: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4ec3: 44 01 d1 add %r10d,%ecx
1a4ec6: 44 8b 57 08 mov 0x8(%rdi),%r10d
1a4eca: c1 c1 0d rol $0xd,%ecx
1a4ecd: 69 c9 b1 79 37 9e imul $0x9e3779b1,%ecx,%ecx
1a4ed3: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4eda: 44 01 d2 add %r10d,%edx
1a4edd: 44 8b 57 0c mov 0xc(%rdi),%r10d
1a4ee1: 48 83 c7 10 add $0x10,%rdi
1a4ee5: c1 c2 0d rol $0xd,%edx
1a4ee8: 69 d2 b1 79 37 9e imul $0x9e3779b1,%edx,%edx
1a4eee: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4ef5: 44 01 d0 add %r10d,%eax
1a4ef8: c1 c0 0d rol $0xd,%eax
1a4efb: 69 c0 b1 79 37 9e imul $0x9e3779b1,%eax,%eax
1a4f01: 49 39 fb cmp %rdi,%r11
1a4f04: 73 9a jae 1a4ea0 <qemu_xxh32+0x40>
1a4f06: c1 c9 19 ror $0x19,%ecx
1a4f09: 41 c1 c8 1f ror $0x1f,%r8d
1a4f0d: c1 ca 14 ror $0x14,%edx
1a4f10: 44 01 c1 add %r8d,%ecx
1a4f13: c1 c8 0e ror $0xe,%eax
1a4f16: 01 ca add %ecx,%edx
1a4f18: 01 d0 add %edx,%eax
1a4f1a: 4c 39 cf cmp %r9,%rdi
1a4f1d: 8d 34 b0 lea (%rax,%rsi,4),%esi
1a4f20: 73 22 jae 1a4f44 <qemu_xxh32+0xe4>
1a4f22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1a4f28: 8b 17 mov (%rdi),%edx
1a4f2a: 48 83 c7 04 add $0x4,%rdi
1a4f2e: 69 c2 3d ae b2 c2 imul $0xc2b2ae3d,%edx,%eax
1a4f34: 01 c6 add %eax,%esi
1a4f36: c1 c6 11 rol $0x11,%esi
1a4f39: 69 f6 2f eb d4 27 imul $0x27d4eb2f,%esi,%esi
1a4f3f: 49 39 f9 cmp %rdi,%r9
1a4f42: 77 e4 ja 1a4f28 <qemu_xxh32+0xc8>
1a4f44: 89 f0 mov %esi,%eax
1a4f46: c1 e8 0f shr $0xf,%eax
1a4f49: 31 f0 xor %esi,%eax
1a4f4b: 69 d0 77 ca eb 85 imul $0x85ebca77,%eax,%edx
1a4f51: 89 d0 mov %edx,%eax
1a4f53: c1 e8 0d shr $0xd,%eax
1a4f56: 31 d0 xor %edx,%eax
1a4f58: 69 d0 3d ae b2 c2 imul $0xc2b2ae3d,%eax,%edx
1a4f5e: 89 d0 mov %edx,%eax
1a4f60: c1 e8 10 shr $0x10,%eax
1a4f63: 31 d0 xor %edx,%eax
1a4f65: 48 8b 54 24 08 mov 0x8(%rsp),%rdx
1a4f6a: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
1a4f71: 00 00
1a4f73: 75 05 jne 1a4f7a <qemu_xxh32+0x11a>
1a4f75: 48 83 c4 18 add $0x18,%rsp
1a4f79: c3 retq
1a4f7a: e8 f1 7a fe ff callq 18ca70 <__stack_chk_fail@plt>
1a4f7f: 90 nop
00000000001a4f80 <tb_hash_func>:
1a4f80: 48 83 ec 28 sub $0x28,%rsp
1a4f84: 48 89 3c 24 mov %rdi,(%rsp)
1a4f88: 48 89 74 24 08 mov %rsi,0x8(%rsp)
1a4f8d: 48 89 e7 mov %rsp,%rdi
1a4f90: 89 54 24 10 mov %edx,0x10(%rsp)
1a4f94: be 05 00 00 00 mov $0x5,%esi
1a4f99: ba 01 00 00 00 mov $0x1,%edx
1a4f9e: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1a4fa5: 00 00
1a4fa7: 48 89 44 24 18 mov %rax,0x18(%rsp)
1a4fac: 31 c0 xor %eax,%eax
1a4fae: e8 ad fe ff ff callq 1a4e60 <qemu_xxh32>
1a4fb3: 48 8b 54 24 18 mov 0x18(%rsp),%rdx
1a4fb8: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
1a4fbf: 00 00
1a4fc1: 75 05 jne 1a4fc8 <tb_hash_func+0x48>
1a4fc3: 48 83 c4 28 add $0x28,%rsp
1a4fc7: c3 retq
1a4fc8: e8 a3 7a fe ff callq 18ca70 <__stack_chk_fail@plt>
1a4fcd: 0f 1f 00 nopl (%rax)
* inline:
00000000001a6800 <tb_hash_func>:
1a6800: 48 83 ec 28 sub $0x28,%rsp
1a6804: 69 cf 77 ca eb 85 imul $0x85ebca77,%edi,%ecx
1a680a: 48 89 3c 24 mov %rdi,(%rsp)
1a680e: 48 c1 ef 20 shr $0x20,%rdi
1a6812: 69 ff 77 ca eb 85 imul $0x85ebca77,%edi,%edi
1a6818: 48 89 74 24 08 mov %rsi,0x8(%rsp)
1a681d: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1a6824: 00 00
1a6826: 48 89 44 24 18 mov %rax,0x18(%rsp)
1a682b: 31 c0 xor %eax,%eax
1a682d: 81 c1 29 44 23 24 add $0x24234429,%ecx
1a6833: 69 c6 77 ca eb 85 imul $0x85ebca77,%esi,%eax
1a6839: 48 c1 ee 20 shr $0x20,%rsi
1a683d: 81 ef 88 35 14 7a sub $0x7a143588,%edi
1a6843: 69 f6 77 ca eb 85 imul $0x85ebca77,%esi,%esi
1a6849: c1 c9 13 ror $0x13,%ecx
1a684c: c1 cf 13 ror $0x13,%edi
1a684f: 83 c0 01 add $0x1,%eax
1a6852: 69 c9 b1 79 37 9e imul $0x9e3779b1,%ecx,%ecx
1a6858: c1 c8 13 ror $0x13,%eax
1a685b: 81 c6 50 86 c8 61 add $0x61c88650,%esi
1a6861: 69 ff b1 79 37 9e imul $0x9e3779b1,%edi,%edi
1a6867: c1 ce 13 ror $0x13,%esi
1a686a: c1 c9 1f ror $0x1f,%ecx
1a686d: 69 c0 b1 79 37 9e imul $0x9e3779b1,%eax,%eax
1a6873: c1 cf 19 ror $0x19,%edi
1a6876: 69 f6 b1 79 37 9e imul $0x9e3779b1,%esi,%esi
1a687c: 8d 7c 39 14 lea 0x14(%rcx,%rdi,1),%edi
1a6880: c1 c8 14 ror $0x14,%eax
1a6883: 69 d2 3d ae b2 c2 imul $0xc2b2ae3d,%edx,%edx
1a6889: 01 f8 add %edi,%eax
1a688b: c1 ce 0e ror $0xe,%esi
1a688e: 01 c6 add %eax,%esi
1a6890: 01 f2 add %esi,%edx
1a6892: c1 ca 0f ror $0xf,%edx
1a6895: 69 d2 2f eb d4 27 imul $0x27d4eb2f,%edx,%edx
1a689b: 89 d0 mov %edx,%eax
1a689d: c1 e8 0f shr $0xf,%eax
1a68a0: 31 d0 xor %edx,%eax
1a68a2: 69 d0 77 ca eb 85 imul $0x85ebca77,%eax,%edx
1a68a8: 89 d0 mov %edx,%eax
1a68aa: c1 e8 0d shr $0xd,%eax
1a68ad: 31 d0 xor %edx,%eax
1a68af: 69 d0 3d ae b2 c2 imul $0xc2b2ae3d,%eax,%edx
1a68b5: 89 d0 mov %edx,%eax
1a68b7: c1 e8 10 shr $0x10,%eax
1a68ba: 31 d0 xor %edx,%eax
1a68bc: 48 8b 54 24 18 mov 0x18(%rsp),%rdx
1a68c1: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
1a68c8: 00 00
1a68ca: 75 05 jne 1a68d1 <tb_hash_func+0xd1>
1a68cc: 48 83 c4 28 add $0x28,%rsp
1a68d0: c3 retq
1a68d1: e8 9a 61 fe ff callq 18ca70 <__stack_chk_fail@plt>
1a68d6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1a68dd: 00 00 00
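
For readers who prefer C to objdump output, the two listings can be read back
roughly as follows. This is a reconstruction from the assembly above, not the
source of the patch: the function names, the argument order (phys_pc, pc,
flags) and the little-endian word split are assumptions. The folded constants
in the second function are the seed of 1 combined with the xxhash primes, and
the "+ 20" is the key length in bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIME32_1 0x9e3779b1U
#define PRIME32_2 0x85ebca77U
#define PRIME32_3 0xc2b2ae3dU
#define PRIME32_4 0x27d4eb2fU
#define PRIME32_5 0x165667b1U

static inline uint32_t rol32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

/* Final mixing steps, identical at the end of both listings. */
static inline uint32_t avalanche(uint32_t h)
{
    h ^= h >> 15;
    h *= PRIME32_2;
    h ^= h >> 13;
    h *= PRIME32_3;
    h ^= h >> 16;
    return h;
}

/* Generic shape of the non-inlined listing: len counts 32-bit words. */
static uint32_t xxh32_words(const uint32_t *p, size_t len, uint32_t seed)
{
    const uint32_t *end = p + len;
    uint32_t h;

    if (len >= 4) {
        uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
        uint32_t v2 = seed + PRIME32_2;
        uint32_t v3 = seed;
        uint32_t v4 = seed - PRIME32_1;

        do {
            v1 = rol32(v1 + p[0] * PRIME32_2, 13) * PRIME32_1;
            v2 = rol32(v2 + p[1] * PRIME32_2, 13) * PRIME32_1;
            v3 = rol32(v3 + p[2] * PRIME32_2, 13) * PRIME32_1;
            v4 = rol32(v4 + p[3] * PRIME32_2, 13) * PRIME32_1;
            p += 4;
        } while (end - p >= 4);
        h = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
    } else {
        h = seed + PRIME32_5;
    }
    h += (uint32_t)(len * 4);
    while (p < end) {
        h = rol32(h + *p++ * PRIME32_3, 17) * PRIME32_4;
    }
    return avalanche(h);
}

/* Shape of the inlined listing: 5 words, seed 1, everything folded. */
static inline uint32_t tb_hash_inlined(uint64_t phys_pc, uint64_t pc,
                                       uint32_t flags)
{
    uint32_t v1 = rol32(0x24234429U + (uint32_t)phys_pc * PRIME32_2, 13);
    uint32_t v2 = rol32(0x85ebca78U + (uint32_t)(phys_pc >> 32) * PRIME32_2, 13);
    uint32_t v3 = rol32(1U + (uint32_t)pc * PRIME32_2, 13);
    uint32_t v4 = rol32(0x61c88650U + (uint32_t)(pc >> 32) * PRIME32_2, 13);
    uint32_t h;

    v1 *= PRIME32_1;
    v2 *= PRIME32_1;
    v3 *= PRIME32_1;
    v4 *= PRIME32_1;
    h = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18) + 20;
    h = rol32(h + flags * PRIME32_3, 17) * PRIME32_4;
    return avalanche(h);
}
```

With the size known, the compiler drops the length checks, both loops and the
call itself, which is where the difference between the two listings comes from.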