From: "Emilio G. Cota" <cota@braap.org>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Peter Crosthwaite <crosthwaite.peter@gmail.com>,
Richard Henderson <rth@twiddle.net>,
Peter Maydell <peter.maydell@linaro.org>,
Eduardo Habkost <ehabkost@redhat.com>,
Andrzej Zaborowski <balrogg@gmail.com>,
Aurelien Jarno <aurelien@aurel32.net>,
Alexander Graf <agraf@suse.de>, Stefan Weil <sw@weilnetz.de>,
qemu-arm@nongnu.org, alex.bennee@linaro.org,
Pranith Kumar <bobby.prani+qemu@gmail.com>
Subject: [Qemu-devel] [PATCH v2 13/13] tb-hash: improve tb_jmp_cache hash function in user mode
Date: Tue, 25 Apr 2017 03:53:59 -0400
Message-ID: <1493106839-10438-14-git-send-email-cota@braap.org>
In-Reply-To: <1493106839-10438-1-git-send-email-cota@braap.org>
Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
In softmmu, the hash function must reserve some of its bits for the
page number, so that TLB invalidation can quickly clear the subset of
the cache that belongs to a given page. This constraint lowers the
achievable quality of the hash function.

User-mode has no TLB and therefore no such requirement. This patch
thus gives user-mode its own hash function, which is both faster and
of better quality than the previous one.
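A minimal sketch of the two index layouts follows, assuming a 4096-entry cache and 4 KiB guest pages. The softmmu-style function is modeled on (not copied from) the one the patch keeps under CONFIG_SOFTMMU, whose body is only partly visible in the hunk; the user-mode function mirrors the new one. The names CACHE_BITS, PAGE_BITS, JMP_PAGE_*, softmmu_style_hash, softmmu_style_page_slot and user_mode_hash are hypothetical stand-ins for QEMU's TB_JMP_CACHE_BITS, TARGET_PAGE_BITS and related macros, and uint64_t stands in for target_ulong.

```c
#include <stdint.h>

/* Hypothetical sizes: a 4096-entry jump cache and 4 KiB guest pages. */
#define CACHE_BITS    12
#define CACHE_SIZE    (1u << CACHE_BITS)
#define PAGE_BITS     12

/* softmmu-style split: the top JMP_PAGE_BITS of the index encode the page,
 * the bottom JMP_PAGE_BITS encode the offset within it. */
#define JMP_PAGE_BITS (CACHE_BITS / 2)
#define JMP_PAGE_SIZE (1u << JMP_PAGE_BITS)
#define JMP_ADDR_MASK (JMP_PAGE_SIZE - 1)
#define JMP_PAGE_MASK (CACHE_SIZE - JMP_PAGE_SIZE)

/* softmmu-style hash: the top index bits depend only on the page number,
 * the bottom index bits only on the offset within the page. */
static inline unsigned softmmu_style_hash(uint64_t pc)
{
    uint64_t tmp = pc ^ (pc >> (PAGE_BITS - JMP_PAGE_BITS));
    return (unsigned)(((tmp >> (PAGE_BITS - JMP_PAGE_BITS)) & JMP_PAGE_MASK)
                      | (tmp & JMP_ADDR_MASK));
}

/* First of the JMP_PAGE_SIZE consecutive slots that can hold PCs from the
 * page containing addr; a per-page flush only needs to clear these. */
static inline unsigned softmmu_style_page_slot(uint64_t addr)
{
    uint64_t page = addr & ~(((uint64_t)1 << PAGE_BITS) - 1);
    return softmmu_style_hash(page) & JMP_PAGE_MASK;
}

/* user-mode hash, as in the patch: fold the bits above the index onto it
 * with XOR, so no index bits are reserved for the page number. */
static inline unsigned user_mode_hash(uint64_t pc)
{
    return (unsigned)((pc ^ (pc >> CACHE_BITS)) & (CACHE_SIZE - 1));
}
```

With the softmmu-style layout, every PC on a page shares the top half of its index, so flushing that page touches only the 64 slots starting at softmmu_style_page_slot(). The user-mode hash gives up that property and instead uses all index bits to tell PCs apart.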
Measurements:
Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
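Assuming that "speedup" is the baseline run time divided by the patched run time, and that the "hmean" column in the charts is the harmonic mean of the per-benchmark speedups, the arithmetic behind each bar can be sketched as follows (function names are illustrative):

```c
#include <stddef.h>

/* Speedup relative to the baseline: > 1 means the patched QEMU is faster. */
static inline double speedup(double baseline_secs, double patched_secs)
{
    return baseline_secs / patched_secs;
}

/* Harmonic mean of n speedups: n / sum(1 / s_i). */
static inline double hmean(const double *s, size_t n)
{
    double inv_sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        inv_sum += 1.0 / s[i];
    }
    return (double)n / inv_sum;
}
```

The harmonic mean is pulled toward the smallest values, so one large speedup cannot hide several slowdowns as easily as it could in an arithmetic mean.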
- SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
[Bar chart: speedup over QEMU v2.9.0 (y-axis 0.8x to 2x) for astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean. Series: jr+noinline, jr+inline, jr+hash+noinline, jr+multhash+inline, jr+hash+inline.]
png: http://imgur.com/1ZJGjzV
Here I also tried the hash function suggested by Paolo ("multhash"):
return ((uint64_t) (pc * 2654435761) >> 32) & ();
As you can see, it is just as good as the other new function ("hash"),
but I kept "hash" because with it every benchmark has a speedup > 1x.
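The two candidate user-mode functions can be written side by side. This is a sketch only: the cache size is a hypothetical CACHE_BITS = 12, and because the mask in the "multhash" line above is elided, the sketch assumes it is the cache-index mask CACHE_SIZE - 1. The names xor_fold_hash and mult_hash are illustrative.

```c
#include <stdint.h>

/* Hypothetical cache size; QEMU's real constant is TB_JMP_CACHE_BITS. */
#define CACHE_BITS 12
#define CACHE_SIZE (1u << CACHE_BITS)

/* "hash": fold the bits above the index onto the index bits with XOR. */
static inline unsigned xor_fold_hash(uint64_t pc)
{
    return (unsigned)((pc ^ (pc >> CACHE_BITS)) & (CACHE_SIZE - 1));
}

/* "multhash": multiply by 2654435761 (close to 2^32 divided by the golden
 * ratio), keep bits 32 and up of the wrapping 64-bit product, then mask.
 * The mask is an assumption, since it is elided in the message. */
static inline unsigned mult_hash(uint64_t pc)
{
    return (unsigned)(((uint64_t)(pc * 2654435761u) >> 32) & (CACHE_SIZE - 1));
}
```

xor_fold_hash costs one shift and two logic operations; mult_hash lets the carries of a 64-bit multiply mix many PC bits into each index bit, at the cost of the multiplication.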
- SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
[Bar chart: speedup over QEMU v2.9.0 (y-axis 1x to 2.6x) for astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean. Series: jr+inline, jr+inline+hash.]
png: http://imgur.com/1D2VFze
- NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
[Bar chart: speedup over QEMU v2.9.0 (y-axis 0.96x to 1.1x) for the NBench tests (ASSIGNMENT through STRING SORT) and hmean. Series: jr+inline, jr+hash+noinline, jr+hash+inline.]
png: http://imgur.com/xK9YfOB
- NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
[Bar chart: speedup over QEMU v2.9.0 (y-axis 0.95x to 1.3x) for the NBench tests (ASSIGNMENT through STRING SORT) and hmean. Series: jr+inline, jr+hash+inline.]
png: http://imgur.com/uhIEOA1
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/exec/tb-hash.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
#include "exec/tb-hash-xx.h"
+#ifdef CONFIG_SOFTMMU
+
/* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
addresses on the same page. The top bits are the same. This allows
TLB invalidation to quickly clear a subset of the hash table. */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
| (tmp & TB_JMP_ADDR_MASK));
}
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+ return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
static inline
uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
{
--
2.7.4