From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: "MTTCG Devel" <mttcg@greensocs.com>,
"Peter Maydell" <peter.maydell@linaro.org>,
"Peter Crosthwaite" <crosthwaite.peter@gmail.com>,
"QEMU Developers" <qemu-devel@nongnu.org>,
"Sergey Fedorov" <serge.fdrv@gmail.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash
Date: Tue, 5 Apr 2016 15:40:28 -0400
Message-ID: <20160405194028.GA6671@flamenco>
In-Reply-To: <5703E2DD.3020103@twiddle.net>
On Tue, Apr 05, 2016 at 09:07:57 -0700, Richard Henderson wrote:
> On 04/05/2016 08:48 AM, Paolo Bonzini wrote:
> >I think it's fine to use the struct. The exact size of the struct
> >varies from 3 to 5 32-bit words, so it's hard to write nice
> >size-dependent code for the hash.
>
> I don't think it is. We have 3 integers. It is trivial to create a simple
> function of 2 multiplies, two adds, and a remainder.
>
> Take the primes from the xxhash.h, for example:
>
> (phys_pc * PRIME32_2 + pc * PRIME32_3 + flags)
> % PRIME32_1
> & (CODE_GEN_PHYS_HASH_SIZE - 1)
>
> Obviously, some bucket measurements should be taken, but I can well imagine
> that this might perform just as well as the fully generic hasher.
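
For reference, the multiply-and-add hash proposed in the quoted text could be
written roughly as follows. This is only a sketch under stated assumptions:
the name tb_hash_simple is hypothetical, the arithmetic is assumed to be done
in 64 bits (the quoted expression does not say), and the table size of
1 << 15 is an illustrative value for CODE_GEN_PHYS_HASH_SIZE. The primes are
the xxhash constants that also show up as immediates in the disassembly
appended to this message.

```c
#include <stdint.h>

/* 32-bit primes used by xxhash */
#define PRIME32_1 2654435761U /* 0x9e3779b1 */
#define PRIME32_2 2246822519U /* 0x85ebca77 */
#define PRIME32_3 3266489917U /* 0xc2b2ae3d */

/* Assumption: power-of-two table size; 1 << 15 is illustrative only. */
#define CODE_GEN_PHYS_HASH_SIZE (1 << 15)

/*
 * Hypothetical name. Two multiplies, two adds, one remainder, then a mask.
 * Assumption: 64-bit unsigned arithmetic, which wraps modulo 2^64.
 */
static inline uint32_t tb_hash_simple(uint64_t phys_pc, uint64_t pc,
                                      uint32_t flags)
{
    uint64_t h = phys_pc * PRIME32_2 + pc * PRIME32_3 + flags;

    /* '%' binds tighter than '&', matching the quoted expression. */
    return (uint32_t)(h % PRIME32_1) & (CODE_GEN_PHYS_HASH_SIZE - 1);
}
```

Note that the remainder is a 64-bit division by a constant, which the
compiler typically lowers to a multiply and shifts.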

That function doesn't perform well: 25.06s vs. 21.18s with xxh32.

Having the packed struct and passing it to an *inlined* xxhash is
virtually unbeatable; gcc (>=v4.6, dunno about older ones) optimizes the
inlined function since it knows the size of the struct at compile time.
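
To make the point about the known size concrete, here is a minimal sketch of
an XXH32-style hash over 32-bit words. It is not the code from the patch:
the names rol32, xxh32_words and tb_hash_sketch are hypothetical, and an
explicit five-word array stands in for the packed struct to sidestep
aliasing questions. The word count (5) and seed (1) are the values visible
in the non-inlined call in the listing appended to this message; the
little-endian split of the 64-bit values and the argument order are
assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIME32_1 2654435761U /* 0x9e3779b1 */
#define PRIME32_2 2246822519U /* 0x85ebca77 */
#define PRIME32_3 3266489917U /* 0xc2b2ae3d */
#define PRIME32_4 668265263U  /* 0x27d4eb2f */
#define PRIME32_5 374761393U  /* 0x165667b1 */

/* Rotate left; callers only pass 0 < r < 32. */
static inline uint32_t rol32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

/* XXH32 over n 32-bit words (hypothetical helper, not QEMU's source). */
static inline uint32_t xxh32_words(const uint32_t *p, size_t n, uint32_t seed)
{
    const uint32_t *end = p + n;
    uint32_t h;

    if (n >= 4) {
        /* Four accumulators, each consuming one word per round. */
        uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
        uint32_t v2 = seed + PRIME32_2;
        uint32_t v3 = seed;
        uint32_t v4 = seed - PRIME32_1;

        do {
            v1 = rol32(v1 + p[0] * PRIME32_2, 13) * PRIME32_1;
            v2 = rol32(v2 + p[1] * PRIME32_2, 13) * PRIME32_1;
            v3 = rol32(v3 + p[2] * PRIME32_2, 13) * PRIME32_1;
            v4 = rol32(v4 + p[3] * PRIME32_2, 13) * PRIME32_1;
            p += 4;
        } while (end - p >= 4);
        h = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
    } else {
        h = seed + PRIME32_5;
    }

    h += (uint32_t)(n * 4); /* total length in bytes */

    /* Remaining words, one at a time. */
    while (p < end) {
        h = rol32(h + *p++ * PRIME32_3, 17) * PRIME32_4;
    }

    /* Final avalanche. */
    h ^= h >> 15;
    h *= PRIME32_2;
    h ^= h >> 13;
    h *= PRIME32_3;
    h ^= h >> 16;
    return h;
}

/*
 * Assumed 64-bit phys_pc and pc plus 32-bit flags: five words in total.
 * Because the count is a compile-time constant, an inlining compiler can
 * unroll one round plus one tail step and drop both loops.
 */
static inline uint32_t tb_hash_sketch(uint64_t phys_pc, uint64_t pc,
                                      uint32_t flags)
{
    uint32_t w[5] = {
        (uint32_t)phys_pc, (uint32_t)(phys_pc >> 32),
        (uint32_t)pc, (uint32_t)(pc >> 32),
        flags,
    };

    return xxh32_words(w, 5, 1);
}
```

With the length fixed at five words, the generic loops reduce to straight-line
code, which is the shape of the inlined tb_hash_func in the listing.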

To show this, I'm appending the generated code for tb_hash_func when xxh32
is inlined vs. when it is not, for x86_64-softmmu. Results are similar
for arm-softmmu.

Anyway (for the arm bootup test) we're talking about ~0.50% of runtime spent
in tb_hash_func (with xxh32 inlined), so whatever we did here could not
improve overall performance much.
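
The bound implied by the ~0.50% figure can be worked out with Amdahl's law.
The helper below is a hypothetical illustration, not part of the patch: it
returns the best possible whole-program speedup if a fraction f of the
runtime were eliminated entirely.

```c
/*
 * Hypothetical helper: upper bound on overall speedup when a fraction f
 * (0 <= f < 1) of total runtime is reduced to zero (Amdahl's law).
 */
static inline double max_speedup(double f)
{
    return 1.0 / (1.0 - f);
}
```

For f = 0.005 the bound is 1 / 0.995, i.e. a speedup of roughly 1.005x even
if tb_hash_func cost nothing at all.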

Thanks,
Emilio

* no inline:
00000000001a4e60 <qemu_xxh32>:
1a4e60: 48 83 ec 18 sub $0x18,%rsp
1a4e64: 4c 8d 0c b7 lea (%rdi,%rsi,4),%r9
1a4e68: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1a4e6f: 00 00
1a4e71: 48 89 44 24 08 mov %rax,0x8(%rsp)
1a4e76: 31 c0 xor %eax,%eax
1a4e78: 48 83 fe 03 cmp $0x3,%rsi
1a4e7c: 8d 82 b1 67 56 16 lea 0x165667b1(%rdx),%eax
1a4e82: 0f 86 92 00 00 00 jbe 1a4f1a <qemu_xxh32+0xba>
1a4e88: 4d 8d 59 f0 lea -0x10(%r9),%r11
1a4e8c: 44 8d 82 28 44 23 24 lea 0x24234428(%rdx),%r8d
1a4e93: 8d 8a 77 ca eb 85 lea -0x7a143589(%rdx),%ecx
1a4e99: 8d 82 4f 86 c8 61 lea 0x61c8864f(%rdx),%eax
1a4e9f: 90 nop
1a4ea0: 44 8b 17 mov (%rdi),%r10d
1a4ea3: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4eaa: 45 01 d0 add %r10d,%r8d
1a4ead: 44 8b 57 04 mov 0x4(%rdi),%r10d
1a4eb1: 41 c1 c0 0d rol $0xd,%r8d
1a4eb5: 45 69 c0 b1 79 37 9e imul $0x9e3779b1,%r8d,%r8d
1a4ebc: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4ec3: 44 01 d1 add %r10d,%ecx
1a4ec6: 44 8b 57 08 mov 0x8(%rdi),%r10d
1a4eca: c1 c1 0d rol $0xd,%ecx
1a4ecd: 69 c9 b1 79 37 9e imul $0x9e3779b1,%ecx,%ecx
1a4ed3: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4eda: 44 01 d2 add %r10d,%edx
1a4edd: 44 8b 57 0c mov 0xc(%rdi),%r10d
1a4ee1: 48 83 c7 10 add $0x10,%rdi
1a4ee5: c1 c2 0d rol $0xd,%edx
1a4ee8: 69 d2 b1 79 37 9e imul $0x9e3779b1,%edx,%edx
1a4eee: 45 69 d2 77 ca eb 85 imul $0x85ebca77,%r10d,%r10d
1a4ef5: 44 01 d0 add %r10d,%eax
1a4ef8: c1 c0 0d rol $0xd,%eax
1a4efb: 69 c0 b1 79 37 9e imul $0x9e3779b1,%eax,%eax
1a4f01: 49 39 fb cmp %rdi,%r11
1a4f04: 73 9a jae 1a4ea0 <qemu_xxh32+0x40>
1a4f06: c1 c9 19 ror $0x19,%ecx
1a4f09: 41 c1 c8 1f ror $0x1f,%r8d
1a4f0d: c1 ca 14 ror $0x14,%edx
1a4f10: 44 01 c1 add %r8d,%ecx
1a4f13: c1 c8 0e ror $0xe,%eax
1a4f16: 01 ca add %ecx,%edx
1a4f18: 01 d0 add %edx,%eax
1a4f1a: 4c 39 cf cmp %r9,%rdi
1a4f1d: 8d 34 b0 lea (%rax,%rsi,4),%esi
1a4f20: 73 22 jae 1a4f44 <qemu_xxh32+0xe4>
1a4f22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1a4f28: 8b 17 mov (%rdi),%edx
1a4f2a: 48 83 c7 04 add $0x4,%rdi
1a4f2e: 69 c2 3d ae b2 c2 imul $0xc2b2ae3d,%edx,%eax
1a4f34: 01 c6 add %eax,%esi
1a4f36: c1 c6 11 rol $0x11,%esi
1a4f39: 69 f6 2f eb d4 27 imul $0x27d4eb2f,%esi,%esi
1a4f3f: 49 39 f9 cmp %rdi,%r9
1a4f42: 77 e4 ja 1a4f28 <qemu_xxh32+0xc8>
1a4f44: 89 f0 mov %esi,%eax
1a4f46: c1 e8 0f shr $0xf,%eax
1a4f49: 31 f0 xor %esi,%eax
1a4f4b: 69 d0 77 ca eb 85 imul $0x85ebca77,%eax,%edx
1a4f51: 89 d0 mov %edx,%eax
1a4f53: c1 e8 0d shr $0xd,%eax
1a4f56: 31 d0 xor %edx,%eax
1a4f58: 69 d0 3d ae b2 c2 imul $0xc2b2ae3d,%eax,%edx
1a4f5e: 89 d0 mov %edx,%eax
1a4f60: c1 e8 10 shr $0x10,%eax
1a4f63: 31 d0 xor %edx,%eax
1a4f65: 48 8b 54 24 08 mov 0x8(%rsp),%rdx
1a4f6a: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
1a4f71: 00 00
1a4f73: 75 05 jne 1a4f7a <qemu_xxh32+0x11a>
1a4f75: 48 83 c4 18 add $0x18,%rsp
1a4f79: c3 retq
1a4f7a: e8 f1 7a fe ff callq 18ca70 <__stack_chk_fail@plt>
1a4f7f: 90 nop
00000000001a4f80 <tb_hash_func>:
1a4f80: 48 83 ec 28 sub $0x28,%rsp
1a4f84: 48 89 3c 24 mov %rdi,(%rsp)
1a4f88: 48 89 74 24 08 mov %rsi,0x8(%rsp)
1a4f8d: 48 89 e7 mov %rsp,%rdi
1a4f90: 89 54 24 10 mov %edx,0x10(%rsp)
1a4f94: be 05 00 00 00 mov $0x5,%esi
1a4f99: ba 01 00 00 00 mov $0x1,%edx
1a4f9e: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1a4fa5: 00 00
1a4fa7: 48 89 44 24 18 mov %rax,0x18(%rsp)
1a4fac: 31 c0 xor %eax,%eax
1a4fae: e8 ad fe ff ff callq 1a4e60 <qemu_xxh32>
1a4fb3: 48 8b 54 24 18 mov 0x18(%rsp),%rdx
1a4fb8: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
1a4fbf: 00 00
1a4fc1: 75 05 jne 1a4fc8 <tb_hash_func+0x48>
1a4fc3: 48 83 c4 28 add $0x28,%rsp
1a4fc7: c3 retq
1a4fc8: e8 a3 7a fe ff callq 18ca70 <__stack_chk_fail@plt>
1a4fcd: 0f 1f 00 nopl (%rax)
* inline:
00000000001a6800 <tb_hash_func>:
1a6800: 48 83 ec 28 sub $0x28,%rsp
1a6804: 69 cf 77 ca eb 85 imul $0x85ebca77,%edi,%ecx
1a680a: 48 89 3c 24 mov %rdi,(%rsp)
1a680e: 48 c1 ef 20 shr $0x20,%rdi
1a6812: 69 ff 77 ca eb 85 imul $0x85ebca77,%edi,%edi
1a6818: 48 89 74 24 08 mov %rsi,0x8(%rsp)
1a681d: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
1a6824: 00 00
1a6826: 48 89 44 24 18 mov %rax,0x18(%rsp)
1a682b: 31 c0 xor %eax,%eax
1a682d: 81 c1 29 44 23 24 add $0x24234429,%ecx
1a6833: 69 c6 77 ca eb 85 imul $0x85ebca77,%esi,%eax
1a6839: 48 c1 ee 20 shr $0x20,%rsi
1a683d: 81 ef 88 35 14 7a sub $0x7a143588,%edi
1a6843: 69 f6 77 ca eb 85 imul $0x85ebca77,%esi,%esi
1a6849: c1 c9 13 ror $0x13,%ecx
1a684c: c1 cf 13 ror $0x13,%edi
1a684f: 83 c0 01 add $0x1,%eax
1a6852: 69 c9 b1 79 37 9e imul $0x9e3779b1,%ecx,%ecx
1a6858: c1 c8 13 ror $0x13,%eax
1a685b: 81 c6 50 86 c8 61 add $0x61c88650,%esi
1a6861: 69 ff b1 79 37 9e imul $0x9e3779b1,%edi,%edi
1a6867: c1 ce 13 ror $0x13,%esi
1a686a: c1 c9 1f ror $0x1f,%ecx
1a686d: 69 c0 b1 79 37 9e imul $0x9e3779b1,%eax,%eax
1a6873: c1 cf 19 ror $0x19,%edi
1a6876: 69 f6 b1 79 37 9e imul $0x9e3779b1,%esi,%esi
1a687c: 8d 7c 39 14 lea 0x14(%rcx,%rdi,1),%edi
1a6880: c1 c8 14 ror $0x14,%eax
1a6883: 69 d2 3d ae b2 c2 imul $0xc2b2ae3d,%edx,%edx
1a6889: 01 f8 add %edi,%eax
1a688b: c1 ce 0e ror $0xe,%esi
1a688e: 01 c6 add %eax,%esi
1a6890: 01 f2 add %esi,%edx
1a6892: c1 ca 0f ror $0xf,%edx
1a6895: 69 d2 2f eb d4 27 imul $0x27d4eb2f,%edx,%edx
1a689b: 89 d0 mov %edx,%eax
1a689d: c1 e8 0f shr $0xf,%eax
1a68a0: 31 d0 xor %edx,%eax
1a68a2: 69 d0 77 ca eb 85 imul $0x85ebca77,%eax,%edx
1a68a8: 89 d0 mov %edx,%eax
1a68aa: c1 e8 0d shr $0xd,%eax
1a68ad: 31 d0 xor %edx,%eax
1a68af: 69 d0 3d ae b2 c2 imul $0xc2b2ae3d,%eax,%edx
1a68b5: 89 d0 mov %edx,%eax
1a68b7: c1 e8 10 shr $0x10,%eax
1a68ba: 31 d0 xor %edx,%eax
1a68bc: 48 8b 54 24 18 mov 0x18(%rsp),%rdx
1a68c1: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
1a68c8: 00 00
1a68ca: 75 05 jne 1a68d1 <tb_hash_func+0xd1>
1a68cc: 48 83 c4 28 add $0x28,%rsp
1a68d0: c3 retq
1a68d1: e8 9a 61 fe ff callq 18ca70 <__stack_chk_fail@plt>
1a68d6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1a68dd: 00 00 00