From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
Peter Crosthwaite <crosthwaite.peter@gmail.com>,
Richard Henderson <rth@twiddle.net>,
Peter Maydell <peter.maydell@linaro.org>,
Eduardo Habkost <ehabkost@redhat.com>,
Andrzej Zaborowski <balrogg@gmail.com>,
Aurelien Jarno <aurelien@aurel32.net>,
Alexander Graf <agraf@suse.de>, Stefan Weil <sw@weilnetz.de>,
qemu-arm@nongnu.org, Pranith Kumar <bobby.prani+qemu@gmail.com>
Subject: Re: [PATCH v4 07/11] target/arm: optimize indirect branches
Date: Thu, 27 Apr 2017 10:41:50 +0100
Message-ID: <878tmm89wh.fsf@linaro.org>
In-Reply-To: <1493263764-18657-8-git-send-email-cota@braap.org>
Emilio G. Cota <cota@braap.org> writes:
> Speed up indirect branches by jumping to the target if it is valid.
>
> Softmmu measurements (see later commit for user-mode results):
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> - Impact on Boot time
>
> | setup  | ARM debian jessie boot+shutdown time | stddev |
> |--------+--------------------------------------+--------|
> | v2.9.0 |                                 8.84 |   0.07 |
> | +cross |                                 8.85 |   0.03 |
> | +jr    |                                 8.83 |   0.06 |
>
> - NBench, arm-softmmu (debian jessie guest). Host: Intel i7-4790K @ 4.00GHz
>
> [ASCII bar chart not preserved. Y-axis: speedup over v2.9.0, from 0.95x to 1.3x.
> Series: 'cross' and 'cross+jr'. X-axis: NBench tests (ASSIGNMENT, BITFIELD,
> FOURIER, FP EMULATION, HUFFMAN, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT,
> STRING SORT) plus hmean.]
> png: http://imgur.com/eOLmZNR
>
> NB. 'cross' represents the previous commit.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Hmm, I'm not sure why, but this doesn't apply cleanly to master.

> ---
> target/arm/translate.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 02cad96..d46a576 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -65,6 +65,7 @@ static TCGv_i32 cpu_R[16];
> TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
> TCGv_i64 cpu_exclusive_addr;
> TCGv_i64 cpu_exclusive_val;
> +static bool gen_jr;

Isn't this something that should be part of the DisasContext rather than
a global? I know we are unlikely to make the translators run
independently anytime soon, but we shouldn't use globals where we can
help it.
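
To illustrate the idea, here is a minimal standalone model, not the real
translate.c code. The cut-down DisasContext, the gen_jr field placement and
the model_* function names are all hypothetical stand-ins, and the sketch
assumes the context is initialised afresh for every translation block:

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for the translator's DisasContext. */
enum { DISAS_NEXT, DISAS_JUMP };

typedef struct DisasContext {
    int is_jmp;
    bool gen_jr;    /* per-translation flag instead of a file-scope static */
} DisasContext;

/* Start of a translation: every field gets a known value. */
void model_tb_start(DisasContext *s)
{
    s->is_jmp = DISAS_NEXT;
    s->gen_jr = false;
}

/* Stand-in for gen_bx(): an indirect branch whose target can be looked up. */
void model_gen_bx(DisasContext *s)
{
    s->is_jmp = DISAS_JUMP;
    s->gen_jr = true;
}

/*
 * Stand-in for the end-of-TB switch. Returns true where the patch would
 * emit tcg_gen_lookup_and_goto_ptr(), false where it would fall through
 * to tcg_gen_exit_tb(0).
 */
bool model_tb_end_uses_goto_ptr(const DisasContext *s)
{
    switch (s->is_jmp) {
    case DISAS_JUMP:
        if (s->gen_jr) {
            return true;
        }
        /* fall through */
    default:
        return false;
    }
}
```

With the flag living in the context, the explicit "gen_jr = false" reset in
the DISAS_JUMP case would no longer be needed in this model, because no
state can leak from one translation into the next.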
>
> /* FIXME: These should be removed. */
> static TCGv_i32 cpu_F0s, cpu_F1s;
> @@ -221,6 +222,7 @@ static void store_reg(DisasContext *s, int reg, TCGv_i32 var)
> */
> tcg_gen_andi_i32(var, var, s->thumb ? ~1 : ~3);
> s->is_jmp = DISAS_JUMP;
> + gen_jr = true;
> }
> tcg_gen_mov_i32(cpu_R[reg], var);
> tcg_temp_free_i32(var);
> @@ -893,6 +895,7 @@ static inline void gen_bx_im(DisasContext *s, uint32_t addr)
> tcg_temp_free_i32(tmp);
> }
> tcg_gen_movi_i32(cpu_R[15], addr & ~1);
> + gen_jr = true;
> }
>
> /* Set PC and Thumb state from var. var is marked as dead. */
> @@ -902,6 +905,7 @@ static inline void gen_bx(DisasContext *s, TCGv_i32 var)
> tcg_gen_andi_i32(cpu_R[15], var, ~1);
> tcg_gen_andi_i32(var, var, 1);
> store_cpu_field(var, thumb);
> + gen_jr = true;
> }
>
> /* Variant of store_reg which uses branch&exchange logic when storing
> @@ -12034,6 +12038,20 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
> gen_set_pc_im(dc, dc->pc);
> /* fall through */
> case DISAS_JUMP:
> + /*
> + * gen_jr is not set on every DISAS_JUMP because for some of those
> + * we do want to exit to the exec loop.
> + */
> + if (gen_jr) {
> + TCGv addr = tcg_temp_new();
> +
> + gen_jr = false;
> + tcg_gen_extu_i32_tl(addr, cpu_R[15]);
> + tcg_gen_lookup_and_goto_ptr(addr);
> + tcg_temp_free(addr);
> + break;
> + }
> + /* fall through */
> default:
> /* indicate that the hash table must be used to find the next TB */
> tcg_gen_exit_tb(0);
--
Alex Bennée