Re: [PATCH 09/10] target/i386: optimize indirect branches with TCG's jr op

From: "Emilio G. Cota" <cota@braap.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>,
	Richard Henderson <rth@twiddle.net>,
	Peter Maydell <peter.maydell@linaro.org>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Claudio Fontana <claudio.fontana@huawei.com>,
	Andrzej Zaborowski <balrogg@gmail.com>,
	Aurelien Jarno <aurelien@aurel32.net>,
	Alexander Graf <agraf@suse.de>, Stefan Weil <sw@weilnetz.de>,
	qemu-arm@nongnu.org, alex.bennee@linaro.org,
	Pranith Kumar <bobby.prani+qemu@gmail.com>
Subject: Re: [PATCH 09/10] target/i386: optimize indirect branches with TCG's jr op
Date: Wed, 12 Apr 2017 21:46:46 -0400
Message-ID: <20170413014646.GA1474@flamenco>
In-Reply-To: <2ede0852-6888-8bcb-ac5a-363478841bc7@redhat.com>

On Wed, Apr 12, 2017 at 11:43:45 +0800, Paolo Bonzini wrote:
> 
> 
> On 12/04/2017 09:17, Emilio G. Cota wrote:
> > 
> > The fact that NBench is not very sensitive to changes here is a
> > little surprising, especially given the significant improvements for
> > ARM shown in the previous commit. I wonder whether the compiler is doing
> > a better job compiling the x86_64 version (I'm using gcc 5.4.0), or I'm simply
> > missing some i386 instructions to which the jr optimization should
> > be applied.
> 
> Maybe it is "ret"?  That would be a straightforward "bx lr" on ARM, but
> it is missing in your i386 patch.

Yes, I missed that. I added this fix-up:

diff --git a/target/i386/translate.c b/target/i386/translate.c
index aab5c13..f2b5a0f 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6430,7 +6430,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* Note that gen_pop_T0 uses a zero-extending load.  */
         gen_op_jmp_v(cpu_T0);
         gen_bnd_jmp(s);
-        gen_eob(s);
+        gen_jr(s, cpu_T0);
         break;
     case 0xc3: /* ret */
         ot = gen_pop_T0(s);
@@ -6438,7 +6438,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* Note that gen_pop_T0 uses a zero-extending load.  */
         gen_op_jmp_v(cpu_T0);
         gen_bnd_jmp(s);
-        gen_eob(s);
+        gen_jr(s, cpu_T0);
         break;
     case 0xca: /* lret im */
         val = cpu_ldsw_code(env, s->pc);

Any other instructions I should look into? Perhaps lret/lret im?
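
To make the gen_eob -> gen_jr change above concrete, here is a toy model of
the lookup that an indirect branch such as "ret" can use instead of always
returning to the main loop. This is only a sketch under assumptions: the
types, the cache size, the hash and every toy_* name are made up for
illustration and are not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; none of these are QEMU definitions. */
#define TOY_JMP_CACHE_BITS 4
#define TOY_JMP_CACHE_SIZE (1u << TOY_JMP_CACHE_BITS)

typedef struct ToyTB {
    uint64_t pc;        /* guest address this block was translated from */
    const void *code;   /* translated host code (opaque in this sketch) */
} ToyTB;

typedef struct ToyCPU {
    /* Direct-mapped cache of recently executed blocks, indexed by hashed pc. */
    ToyTB *jmp_cache[TOY_JMP_CACHE_SIZE];
} ToyCPU;

/* Assumed hash: just the low bits of the guest pc. */
static unsigned toy_hash(uint64_t pc)
{
    return (unsigned)(pc & (TOY_JMP_CACHE_SIZE - 1));
}

/* Record a block so that later indirect branches to its pc can find it. */
static void toy_cache_insert(ToyCPU *cpu, ToyTB *tb)
{
    assert(cpu != NULL && tb != NULL);
    cpu->jmp_cache[toy_hash(tb->pc)] = tb;
}

/*
 * Model of the indirect-branch path: return the block to jump to directly,
 * or NULL when the caller has to fall back to the main loop (the behaviour
 * that ending the block unconditionally would always give).
 */
static ToyTB *toy_lookup_indirect(const ToyCPU *cpu, uint64_t target_pc)
{
    ToyTB *tb = cpu->jmp_cache[toy_hash(target_pc)];

    if (tb != NULL && tb->pc == target_pc) {
        return tb;   /* hit: chain straight into the translated block */
    }
    return NULL;     /* empty slot or a different pc in the same slot */
}
```

In this model a "ret" whose popped target hits the cache never leaves
translated code; only a miss pays for the round trip through the main loop.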

Anyway, nbench does not improve much with the above. The reason seems to be
that it's full of direct jumps (visible with -d in_asm). I also tried softmmu
to see whether these jumps are in-page or not: peak improvement is ~8%, so
I guess most of them are in-page. See http://imgur.com/EKRrYUz
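
For reference, "in-page" above means that the jump target lies in the same
guest page as the jumping block. A minimal sketch of that test follows,
assuming 4 KiB guest pages; TOY_PAGE_BITS, TOY_PAGE_MASK and toy_same_page
are hypothetical names, not QEMU identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB guest pages. */
#define TOY_PAGE_BITS 12
#define TOY_PAGE_MASK (~((UINT64_C(1) << TOY_PAGE_BITS) - 1))

/*
 * True when both guest addresses fall in the same page, i.e. they differ
 * only in the low TOY_PAGE_BITS bits.
 */
static bool toy_same_page(uint64_t tb_pc, uint64_t dest_pc)
{
    return ((tb_pc ^ dest_pc) & TOY_PAGE_MASK) == 0;
}
```

For example, 0x401000 and 0x401ffc share a page, while 0x401ffc and 0x402000
do not, so only the first pair counts as an in-page jump in this model.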

I'm running new tests on a server that has no other users and has
frequency scaling disabled. This should give less noisy numbers,
since I'm having trouble replicating my own results :> (I used my desktop
machine until now). I will post these numbers tomorrow (SPECint is running
overnight with both train and set sizes).

Thanks,

		Emilio
