From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org, cota@braap.org
Subject: Re: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
Date: Fri, 28 Apr 2017 18:00:57 +0100
Message-ID: <87o9vg79h2.fsf@linaro.org>
In-Reply-To: <20170427120006.20564-14-rth@twiddle.net>


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Optimizations to cross-page chaining and indirect branches make
> performance more sensitive to the hit rate of tb_jmp_cache.
> The constraint of reserving some bits for the page number
> lowers the achievable quality of the hashing function.
>
> However, user-mode does not have this requirement. Thus,
> with this change user-mode uses a hashing function that
> is both faster and of better quality than the previous one.
>
> Measurements:
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> -                           SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
>  2.2x +-+--------------------------------------------------------------------------------------------------------------+-+
>       |                                                                                                                  |
>       |         jr                                                                                                       |
>    2x +jr+multhash        +....................................................+++++...................................+-+
>       |    jr+hash                                                              |$$$                                     |
>       |                                                                         |$+$                                     |
>       |                                                                        ### $                                     |
>  1.8x +-+......................................................................#|#.$...................................+-+
>       |                                                                      ++#+# $                                     |
>       |                                                                       |# # $                                     |
>  1.6x +-+....................................................................***.#.$....................++$$$..........+-+
>       |                                         $$$                          *+* # $                     |$+$            |
>       |                       ++$$$           ### $                          * * # $                  +++|$ $            |
>       |                     ++###+$           # # $                          * * # $           ###   ****## $            |
>  1.4x +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+
>       |                     *+* # $         * * # $                          * * # $           # # $ *  *+# $            |
>       |                     * * # $   +++++ * * # $                          * * # $         *** # $ *  * # $   ###$$    |
>  1.2x +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+
>       |                     * * # $ *+* # $ * * # $   +++                    * * # $ ++###$$ * * # $ *  * # $ * * # $    |
>       |    ***##$$          * * # $ * * # $ * * # $ ***##$$          ++###   * * # $ *** #+$ * * # $ *  * # $ * * # $    |
>       |    *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+#   * * # $ * * # $ * * # $ *  * # $ * * # $    |
>    1x +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+
>       |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
>       |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
>  0.8x +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+
>          astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/4UXTrEc
>
> Here I also tried the hash function suggested by Paolo ("multhash"):
>
>   return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);
>
> As you can see it is just as good as the other new function ("hash"),
> which is what I ended up going with.
>
> -                          SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
>  2.6x +-+--------------------------------------------------------------------------------------------------------------+-+
>       |                                                                                                                  |
>       |     jr                                                                                           ###             |
>  2.4x +jr+hash...........................................................................................#.#...........+-+
>       |                                                                                                  # #             |
>       |                                                                                                  # #             |
>  2.2x +-+................................................................................................#.#...........+-+
>       |                                                                                                  # #             |
>       |                                                                                                  # #             |
>    2x +-+................................................................................................#.#...........+-+
>       |                                                                                               **** #             |
>       |                                                                                               *  * #             |
>  1.8x +-+.............................................................................................*..*.#...........+-+
>       |                                                                         +++                   *  * #             |
>       |                                                                         ####    ####          *  * #             |
>  1.6x +-+......................................####.............................#..#.****..#..........*..*.#...........+-+
>       |                        +++             #++#                          ****  # *  *  #    ####  *  * #             |
>       |                        ###             #  #                          *  *  # *  *  #    #  #  *  * #             |
>  1.4x +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+
>       |                     *++* #          *  *  #                          *  *  # *  *  #  ***  #  *  * #     ####    |
>       |                     *  * #     #### *  *  #                          *  *  # *  *  #  * *  #  *  * #  ****  #    |
>  1.2x +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+
>       |    ****###          *  * #  *  *  # *  *  #                          *  *  # *  *  #  * *  #  *  * #  *  *  #    |
>       |    *  *  #  ***###  *  * #  *  *  # *  *  #                  ****##  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
>    1x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
>          astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/ArCbHqo
>
> -                                    NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
>  1.12x +-+-------------------------------------------------------------------------------------------------------------+-+
>        |                                                                                                                 |
>        |     jr                                                           +++                                            |
>   1.1x +jr+hash...........................................................####.........................................+-+
>        |                                                               +++#| #                                           |
>        |                                                                | #++#                                           |
>  1.08x +-+................................+++................+++.+++..*****..#.........................................+-+
>        |                                   |  +++             |   |   * | *  #                                           |
>        |                                   |   |              |   |   *+++*  #                                           |
>  1.06x +-+................................****###.............|...|...*...*..#.........................+++.............+-+
>        |                                  *| * |#            ****###  *   *  #                          |                |
>        |                                  *| *++#            *| * |#  *   *  #                        ####               |
>  1.04x +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+
>        |                                  *  *  #            *++*++#  *   *  #                     +++#++#               |
>        |                                  *  *  #            *  *  #  *   *  #                      | #  #   +++####     |
>  1.02x +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+
>        |         +++                      *  *  #   +++ |    *  *  #  *   *  #  +++                *| *  #  *+++*  #     |
>        |      +++ |    +++ +++   ++++++   *  *  #  *****###  *  *  #  *   *  #   |  +++   ++++++   *++*  #  *   *  #     |
>     1x +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+
>        |     *****| #  *++* |#  *****| #  *  *  #  *   *++#  *  *  #  *   *  #  **** |#  *   *  #  *  *  #  *   *  #     |
>        |     * | *| #  *  *++#  * | *++#  *  *  #  *   *  #  *  *  #  *   *  #  *| *++#  *   *  #  *  *  #  *   *  #     |
>  0.98x +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+
>        |     *+++*  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>        |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>  0.96x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
>        ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
>   png: http://imgur.com/ZXFX0hJ
>
> -                                   NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
>
>   1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
>        |                            ####                                                                                 |
>        |     jr                     #  #                                            +++                                  |
>  1.25x +jr+hash.....................#..#...........................................####................................+-+
>        |                            #  #                                           #  #                                  |
>        |                            #  #                                           #  #                                  |
>   1.2x +-+..........................#..#...........................................#..#................................+-+
>        |                            #  #                                           #  #                                  |
>        |                            #  #                                           #  #                                  |
>  1.15x +-+..........................#..#...........................................#..#................................+-+
>        |                            #  #                                  ####     #  #                                  |
>        |                            #  #                                  #  #     #  #                                  |
>   1.1x +-+..........................#..#..................................#..#.....#..#................................+-+
>        |                            #  #                                  #  #     #  #                         +++      |
>        |                            #  #               ####               #  #     #  #                         ####     |
>  1.05x +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+
>        |                            #  #               #  #     #  #      #  #     #  #                +++      #  #     |
>        |                   +++  *****  #     ####  *****  #     #  #   +++#  #  ****  #            ****###      #  #     |
>     1x +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+
>        |     *   *  #  *  * |   *   *  #  *  *  #  *   *  #  ****  #  *   *  #  *  *  #  *   *###  *  *++#  *   *  #     |
>        |     *   *  #  *  *###  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>  0.95x +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+
>        |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>        |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>   0.9x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
>        ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
>   png: http://imgur.com/FfD27ey
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-12-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/tb-hash.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 2c27490..b1fe2d0 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -22,6 +22,8 @@
>
>  #include "exec/tb-hash-xx.h"
>
> +#ifdef CONFIG_SOFTMMU
> +
>  /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
>     addresses on the same page.  The top bits are the same.  This allows
>     TLB invalidation to quickly clear a subset of the hash table.  */
> @@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
>             | (tmp & TB_JMP_ADDR_MASK));
>  }
>
> +#else
> +
> +/* In user-mode we can get better hashing because we do not have a TLB */
> +static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
> +{
> +    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
> +}
> +
> +#endif /* CONFIG_SOFTMMU */
> +
>  static inline
>  uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
>  {
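
For readers who want to try the two candidate functions outside the
QEMU tree, here is a minimal standalone sketch of "hash" (the function
added by the patch) and "multhash" (the expression quoted in the commit
message). The value TB_JMP_CACHE_BITS = 12 and the 64-bit target_ulong
typedef are assumptions made for illustration only, and the
jmp_cache_hash_xor/jmp_cache_hash_mult names are hypothetical; the real
definitions come from QEMU's headers.

```c
#include <stdint.h>

/* Assumed for illustration; QEMU's own headers provide the real values. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* Assumption: a 64-bit guest. In QEMU this type depends on the target. */
typedef uint64_t target_ulong;

/* "hash": XOR the pc with itself shifted down by the index width, then
 * keep the low TB_JMP_CACHE_BITS bits as the cache index. */
static inline unsigned int jmp_cache_hash_xor(target_ulong pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}

/* "multhash": multiply by 2654435761, take bits 32 and up of the 64-bit
 * product, then mask to the cache size (same expression as quoted). */
static inline unsigned int jmp_cache_hash_mult(target_ulong pc)
{
    return (unsigned int)(((uint64_t)(pc * 2654435761ULL) >> 32)
                          & (TB_JMP_CACHE_SIZE - 1));
}
```

Under these assumptions both functions return an index in
[0, TB_JMP_CACHE_SIZE), and neither reserves bits for the page number,
which is the property the softmmu variant cannot give up.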

I'll note that when I've plotted hit rates against the cache, we don't
seem to be making good, even use of the cache over time, so I suspect
there is more that could be done here. That said, the numbers are
compelling, so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
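
As a rough illustration of the kind of hit-rate measurement mentioned
above, the following sketch replays a sequence of pc values through a
direct-mapped table indexed by the user-mode hash and counts the hits.
It is not the instrumentation used for the plots; the SKETCH_* names,
the 12-bit table size and the sketch_count_hits() helper are all
hypothetical, and the model ignores everything else QEMU checks on a
lookup (flags, cs_base, invalidation).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table size for this sketch only. */
#define SKETCH_CACHE_BITS 12
#define SKETCH_CACHE_SIZE (1u << SKETCH_CACHE_BITS)

/* Same shape as the user-mode hash in the patch, on a 64-bit pc. */
static unsigned int sketch_hash(uint64_t pc)
{
    return (pc ^ (pc >> SKETCH_CACHE_BITS)) & (SKETCH_CACHE_SIZE - 1);
}

/* Replay n pc values through a direct-mapped cache and return the number
 * of lookups that found the same pc already stored in its slot. */
static size_t sketch_count_hits(const uint64_t *pcs, size_t n)
{
    uint64_t slot_pc[SKETCH_CACHE_SIZE];
    unsigned char valid[SKETCH_CACHE_SIZE];
    size_t hits = 0;

    memset(valid, 0, sizeof(valid));
    for (size_t i = 0; i < n; i++) {
        unsigned int h = sketch_hash(pcs[i]);

        if (valid[h] && slot_pc[h] == pcs[i]) {
            hits++;                 /* same pc seen last in this slot */
        } else {
            slot_pc[h] = pcs[i];    /* miss: fill or evict the slot */
            valid[h] = 1;
        }
    }
    return hits;
}
```

Two pcs that hash to the same slot and alternate will evict each other
on every lookup, so a trace like that scores zero hits in this model;
counting per-slot fills over time would be one way to see how evenly
the table is being used.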

--
Alex Bennée
