From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org, cota@braap.org
Subject: Re: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
Date: Fri, 28 Apr 2017 18:00:57 +0100
Message-ID: <87o9vg79h2.fsf@linaro.org>
In-Reply-To: <20170427120006.20564-14-rth@twiddle.net>
Richard Henderson <rth@twiddle.net> writes:
> From: "Emilio G. Cota" <cota@braap.org>
>
> Optimizations to cross-page chaining and indirect branches make
> performance more sensitive to the hit rate of tb_jmp_cache.
> The constraint of reserving some bits for the page number
> lowers the achievable quality of the hashing function.
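To make the softmmu constraint concrete: in softmmu builds the jump-cache index is split so that its upper bits depend only on the page number of pc. Invalidating one page then clears just one contiguous group of slots. The standalone sketch below is modeled on the softmmu tb_jmp_cache_hash_func() in include/exec/tb-hash.h, which is only partly visible in the diff below. The constants (4 KiB pages, a 4096-entry cache, a 64-bit target_ulong) and the helpers softmmu_hash_page(), flush_page() and page_stays_in_group() are assumptions for illustration, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB target pages, a 4096-entry jump cache
 * and a 64-bit guest address type. */
#define TARGET_PAGE_BITS   12
#define TB_JMP_CACHE_BITS  12
#define TB_JMP_CACHE_SIZE  (1u << TB_JMP_CACHE_BITS)

/* Half of the index bits come from the page number, half from the offset. */
#define TB_JMP_PAGE_BITS   (TB_JMP_CACHE_BITS / 2)
#define TB_JMP_PAGE_SIZE   (1u << TB_JMP_PAGE_BITS)
#define TB_JMP_ADDR_MASK   (TB_JMP_PAGE_SIZE - 1)
#define TB_JMP_PAGE_MASK   (TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE)

typedef uint64_t target_ulong;

/* Upper index bits: built only from bits of pc at or above TARGET_PAGE_BITS. */
static unsigned int softmmu_hash_page(target_ulong pc)
{
    target_ulong tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
    return (unsigned int)((tmp >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS))
                          & TB_JMP_PAGE_MASK);
}

/* Full index: page-derived upper bits plus TB_JMP_PAGE_BITS low bits. */
static unsigned int softmmu_hash(target_ulong pc)
{
    target_ulong tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
    return softmmu_hash_page(pc) | (unsigned int)(tmp & TB_JMP_ADDR_MASK);
}

/* Invalidating one page only needs to clear the TB_JMP_PAGE_SIZE slots
 * that any pc on that page can map to, not the whole cache. */
static void flush_page(const void *cache[], target_ulong page_addr)
{
    unsigned int base = softmmu_hash_page(page_addr);

    assert(base + TB_JMP_PAGE_SIZE <= TB_JMP_CACHE_SIZE);
    for (unsigned int i = 0; i < TB_JMP_PAGE_SIZE; i++) {
        cache[base + i] = NULL;
    }
}

/* Returns 1 if every pc on the (page-aligned) page hashes into its group. */
static int page_stays_in_group(target_ulong page_addr)
{
    unsigned int base = softmmu_hash_page(page_addr);

    for (target_ulong off = 0; off < (1u << TARGET_PAGE_BITS); off++) {
        unsigned int h = softmmu_hash(page_addr + off);
        if (h < base || h >= base + TB_JMP_PAGE_SIZE) {
            return 0;
        }
    }
    return 1;
}
```

For example, softmmu_hash_page(0x400000) evaluates to 1024 under these assumptions. Flushing that page therefore clears slots 1024 through 1087 and leaves the other 4032 entries alone. The price is that all the code on a single page has to share those 64 slots.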
>
> However, user mode does not have this requirement, since there is no
> TLB whose invalidation must clear a page's subset of the cache. Thus,
> with this change user mode gets a hashing function that is both faster
> and of better quality than the previous one.
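Why the user-mode function can do better: the patch's version folds the high bits of pc into all TB_JMP_CACHE_BITS index bits. Code on one page can therefore use the whole cache rather than a small per-page group of slots. The sketch below contrasts it with a softmmu-style split hash on a deliberately simple, constructed trace: 256 pcs at a 16-byte stride on one page, replayed three times through a toy direct-mapped cache. The constants, the helper names (split_hash, user_hash, distinct_slots, replay_hits) and the trace are assumptions chosen for illustration. This is not QEMU's measurement code and says nothing about real workloads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed for illustration: 4 KiB pages, a 4096-entry cache, 64-bit pc. */
#define TARGET_PAGE_BITS   12
#define TB_JMP_CACHE_BITS  12
#define TB_JMP_CACHE_SIZE  (1u << TB_JMP_CACHE_BITS)
#define TB_JMP_PAGE_BITS   (TB_JMP_CACHE_BITS / 2)
#define TB_JMP_PAGE_SIZE   (1u << TB_JMP_PAGE_BITS)
#define TB_JMP_ADDR_MASK   (TB_JMP_PAGE_SIZE - 1)
#define TB_JMP_PAGE_MASK   (TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE)

typedef uint64_t target_ulong;
typedef unsigned int (*hash_fn)(target_ulong pc);

/* softmmu-style split: page-derived upper bits, 64 slots per page. */
static unsigned int split_hash(target_ulong pc)
{
    target_ulong tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
    return (unsigned int)(((tmp >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS))
                           & TB_JMP_PAGE_MASK)
                          | (tmp & TB_JMP_ADDR_MASK));
}

/* User-mode function from the patch: fold the high bits of pc down. */
static unsigned int user_hash(target_ulong pc)
{
    return (unsigned int)((pc ^ (pc >> TB_JMP_CACHE_BITS))
                          & (TB_JMP_CACHE_SIZE - 1));
}

/* How many distinct slots n pcs (base, base + stride, ...) occupy. */
static unsigned int distinct_slots(hash_fn h, target_ulong base,
                                   target_ulong stride, unsigned int n)
{
    static bool used[TB_JMP_CACHE_SIZE];
    unsigned int count = 0;

    memset(used, 0, sizeof(used));
    for (unsigned int i = 0; i < n; i++) {
        unsigned int slot = h(base + i * stride);
        assert(slot < TB_JMP_CACHE_SIZE);
        if (!used[slot]) {
            used[slot] = true;
            count++;
        }
    }
    return count;
}

/* Toy direct-mapped cache keyed by pc: replay the same n pcs `passes`
 * times and count lookups that find their own pc already in the slot. */
static unsigned int replay_hits(hash_fn h, target_ulong base,
                                target_ulong stride, unsigned int n,
                                unsigned int passes)
{
    static target_ulong tag[TB_JMP_CACHE_SIZE];
    static bool valid[TB_JMP_CACHE_SIZE];
    unsigned int hits = 0;

    memset(valid, 0, sizeof(valid));
    for (unsigned int p = 0; p < passes; p++) {
        for (unsigned int i = 0; i < n; i++) {
            target_ulong pc = base + i * stride;
            unsigned int slot = h(pc);

            if (valid[slot] && tag[slot] == pc) {
                hits++;
            } else {
                /* Miss: stand-in for looking up or translating a TB. */
                tag[slot] = pc;
                valid[slot] = true;
            }
        }
    }
    return hits;
}
```

Under these assumptions the split hash maps the 256 pcs onto only 64 slots. The four pcs sharing each slot keep evicting one another, so every lookup misses. The folded hash gives each pc its own slot and misses only on the first pass, for 512 hits out of 768 lookups.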
>
> Measurements:
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
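Every chart below ends with an "hmean" column. Assuming that label denotes the harmonic mean of the per-benchmark speedups (each relative to the v2.9.0 baseline of 1x), here is a minimal C sketch of the calculation. The function name hmean() and the sample values are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic mean of n speedups, each relative to the baseline
 * (1.0 == no change). Assumes this is what "hmean" in the charts means;
 * all speedups must be positive. */
static double hmean(const double *speedup, size_t n)
{
    double inv_sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(speedup[i] > 0.0);
        inv_sum += 1.0 / speedup[i];
    }
    return (double)n / inv_sum;
}
```

With speedups of 1.0x and 4.0x the harmonic mean is 1.6x, below the arithmetic mean of 2.5x. The harmonic mean weights slower results more heavily, so a single large win does not dominate the summary.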
>
> - SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
> [Bar chart: speedup relative to v2.9.0 (y-axis 0.8x to 2.2x) of jr, jr+multhash and jr+hash for astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean]
> png: http://imgur.com/4UXTrEc
>
> Here I also tried the hash function suggested by Paolo ("multhash"):
>
> return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);
>
> As you can see, it is just as good as the other new function ("hash"),
> which is the one I ended up going with.
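The two candidate functions side by side, as a minimal sketch assuming a 64-bit target_ulong and TB_JMP_CACHE_BITS of 12. The names xorfold_hash(), mult_hash() and check_range() are hypothetical. 2654435761 is the familiar multiplicative-hashing constant close to 2^32 divided by the golden ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 64-bit target_ulong, 4096-entry cache. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

typedef uint64_t target_ulong;

/* "hash": the xor-fold that the patch adopts for user mode. */
static unsigned int xorfold_hash(target_ulong pc)
{
    return (unsigned int)((pc ^ (pc >> TB_JMP_CACHE_BITS))
                          & (TB_JMP_CACHE_SIZE - 1));
}

/* "multhash": the multiplicative variant. The 64-bit product wraps
 * modulo 2^64; bits 32 and up are kept, then masked to the cache size. */
static unsigned int mult_hash(target_ulong pc)
{
    return (unsigned int)(((uint64_t)(pc * UINT64_C(2654435761)) >> 32)
                          & (TB_JMP_CACHE_SIZE - 1));
}

/* Both functions must always yield a valid cache index. */
static void check_range(target_ulong pc)
{
    assert(xorfold_hash(pc) < TB_JMP_CACHE_SIZE);
    assert(mult_hash(pc) < TB_JMP_CACHE_SIZE);
}
```

For pc = 0x1000, xorfold_hash() returns 1 and mult_hash() returns 2531. Both are a few instructions: shift, xor and mask versus multiply, shift and mask. Per the measurements in this message they performed about the same, and the patch keeps the xor-fold.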
>
> - SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
> [Bar chart: speedup relative to v2.9.0 (y-axis 1x to 2.6x) of jr and jr+hash for astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng, xalancbmk and hmean]
> png: http://imgur.com/ArCbHqo
>
> - NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
> [Bar chart: speedup relative to v2.9.0 (y-axis 0.96x to 1.12x) of jr and jr+hash across the individual NBench tests and their hmean]
> png: http://imgur.com/ZXFX0hJ
>
> - NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
>
> [Bar chart: speedup relative to v2.9.0 (y-axis 0.9x to 1.3x) of jr and jr+hash across the individual NBench tests and their hmean]
> png: http://imgur.com/FfD27ey
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-12-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> include/exec/tb-hash.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 2c27490..b1fe2d0 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -22,6 +22,8 @@
>
> #include "exec/tb-hash-xx.h"
>
> +#ifdef CONFIG_SOFTMMU
> +
> /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
> addresses on the same page. The top bits are the same. This allows
> TLB invalidation to quickly clear a subset of the hash table. */
> @@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
> | (tmp & TB_JMP_ADDR_MASK));
> }
>
> +#else
> +
> +/* In user-mode we can get better hashing because we do not have a TLB */
> +static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
> +{
> +    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
> +}
> +
> +#endif /* CONFIG_SOFTMMU */
> +
> static inline
> uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
> {
I'll note that when I've plotted hit rates against the cache, we don't seem
to be making good, even use of the cache over time, so I suspect there is
more that could be done here. That said, the numbers are compelling, so:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée