qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/11] target-mips: optimizations
@ 2008-11-08  8:31 Aurelien Jarno
  2008-11-08  8:32 ` [Qemu-devel] [PATCH 01/11] target-mips: optimize gen_save_pc() Aurelien Jarno
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:31 UTC (permalink / raw)
  To: qemu-devel

Hi,

This series optimize the MIPS emulation, and more precisely the 64-bit
emulation as shown the table below. All the tests have been done on 
an x86-64 host. The 32-bit task corresponds to compilation of C++ code
using a 32-bit userland. The times are given is seconds.

                     +---------------------+---------------------+
                     |      Boot time      |     32-bit task     |
                     +----------+----------+----------+----------+
		     | Original |  Patched | Original |  Patched |
+--------------------+----------+----------+----------+----------+
| qemu-system-mips32 |     33   |     33   |    131   |    131   |
|                    |          |          |          |          |
+--------------------+----------+----------+----------+----------+
| qemu-system-mips64 |     53   |     33   |    303   |    138   |
|  + 32-bit kernel   |          |          |          |          |
+--------------------+----------+----------+----------+----------+
| qemu-system-mips64 |     47   |     31   |    231   |    107   |
|  + 64-bit kernel   |          |          |          |          |
+--------------------+----------+----------+----------+----------+

With this series, there is now very few overheads running a 32-bit
system in qemu-system-mips64, and a gain when switching to a 64-bit 
kernel. This is probably not true on a 32-bit host.

Aurelien

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 01/11] target-mips: optimize gen_save_pc()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
@ 2008-11-08  8:32 ` Aurelien Jarno
  2008-11-08  8:32 ` [Qemu-devel] [PATCH 02/11] target-mips: optimize gen_op_addr_add() (1/2) Aurelien Jarno
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:32 UTC (permalink / raw)
  To: qemu-devel

We obviously don't need to use a temporary variable to write PC.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 9a197ee..dcd8094 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -827,11 +827,7 @@ OP_CONDZ(ltz, TCG_COND_LT);
 
 static inline void gen_save_pc(target_ulong pc)
 {
-    TCGv r_tmp = tcg_temp_new(TCG_TYPE_TL);
-
-    tcg_gen_movi_tl(r_tmp, pc);
-    tcg_gen_mov_tl(cpu_PC, r_tmp);
-    tcg_temp_free(r_tmp);
+    tcg_gen_movi_tl(cpu_PC, pc);
 }
 
 static inline void save_cpu_state (DisasContext *ctx, int do_save_pc)
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 02/11] target-mips: optimize gen_op_addr_add() (1/2)
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
  2008-11-08  8:32 ` [Qemu-devel] [PATCH 01/11] target-mips: optimize gen_save_pc() Aurelien Jarno
@ 2008-11-08  8:32 ` Aurelien Jarno
  2008-11-08  8:33 ` [Qemu-devel] [PATCH 03/11] target-mips: optimize gen_op_addr_add() (2/2) Aurelien Jarno
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:32 UTC (permalink / raw)
  To: qemu-devel

The user mode can be tested at translation time using ctx->hflags.
This simplifies gen_op_addr_add().

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index dcd8094..cbe8120 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -894,7 +894,7 @@ generate_exception (DisasContext *ctx, int excp)
 }
 
 /* Addresses computation */
-static inline void gen_op_addr_add (TCGv t0, TCGv t1)
+static inline void gen_op_addr_add (DisasContext *ctx, TCGv t0, TCGv t1)
 {
     tcg_gen_add_tl(t0, t0, t1);
 
@@ -902,17 +902,13 @@ static inline void gen_op_addr_add (TCGv t0, TCGv t1)
     /* For compatibility with 32-bit code, data reference in user mode
        with Status_UX = 0 should be casted to 32-bit and sign extended.
        See the MIPS64 PRA manual, section 4.10. */
-    {
+    if ((ctx->hflags & MIPS_HFLAG_KSU) == MIPS_HFLAG_UM) {
         int l1 = gen_new_label();
-        TCGv r_tmp = tcg_temp_local_new(TCG_TYPE_I32);
+        TCGv r_tmp = tcg_temp_new(TCG_TYPE_I32);
 
-        tcg_gen_ld_i32(r_tmp, cpu_env, offsetof(CPUState, hflags));
-        tcg_gen_andi_i32(r_tmp, r_tmp, MIPS_HFLAG_KSU);
-        tcg_gen_brcondi_i32(TCG_COND_NE, r_tmp, MIPS_HFLAG_UM, l1);
         tcg_gen_ld_i32(r_tmp, cpu_env, offsetof(CPUState, CP0_Status));
         tcg_gen_andi_i32(r_tmp, r_tmp, (1 << CP0St_UX));
         tcg_gen_brcondi_i32(TCG_COND_NE, r_tmp, 0, l1);
-        tcg_temp_free(r_tmp);
         tcg_gen_ext32s_i64(t0, t0);
         gen_set_label(l1);
     }
@@ -1070,7 +1066,7 @@ static void gen_ldst (DisasContext *ctx, uint32_t opc, int rt,
     } else {
         gen_load_gpr(t0, base);
         tcg_gen_movi_tl(t1, offset);
-        gen_op_addr_add(t0, t1);
+        gen_op_addr_add(ctx, t0, t1);
     }
     /* Don't do NOP if destination is zero: we must perform the actual
        memory access. */
@@ -1235,7 +1231,7 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
 
         gen_load_gpr(t0, base);
         tcg_gen_movi_tl(t1, offset);
-        gen_op_addr_add(t0, t1);
+        gen_op_addr_add(ctx, t0, t1);
         tcg_temp_free(t1);
     }
     /* Don't do NOP if destination is zero: we must perform the actual
@@ -7369,7 +7365,7 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
     } else {
         gen_load_gpr(t0, base);
         gen_load_gpr(t1, index);
-        gen_op_addr_add(t0, t1);
+        gen_op_addr_add(ctx, t0, t1);
     }
     /* Don't do NOP if destination is zero: we must perform the actual
        memory access. */
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 03/11] target-mips: optimize gen_op_addr_add() (2/2)
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
  2008-11-08  8:32 ` [Qemu-devel] [PATCH 01/11] target-mips: optimize gen_save_pc() Aurelien Jarno
  2008-11-08  8:32 ` [Qemu-devel] [PATCH 02/11] target-mips: optimize gen_op_addr_add() (1/2) Aurelien Jarno
@ 2008-11-08  8:33 ` Aurelien Jarno
  2008-11-08  8:34 ` [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG Aurelien Jarno
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:33 UTC (permalink / raw)
  To: qemu-devel

Instead of dynamically generating different code depending on the UX
flag, add a new flag in ctx->flags to generate different code.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/cpu.h       |   13 +++++++------
 target-mips/exec.h      |    5 ++++-
 target-mips/translate.c |   10 ++--------
 3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/target-mips/cpu.h b/target-mips/cpu.h
index a27ffd3..d686f8e 100644
--- a/target-mips/cpu.h
+++ b/target-mips/cpu.h
@@ -411,7 +411,7 @@ struct CPUMIPSState {
     int error_code;
     uint32_t hflags;    /* CPU State */
     /* TMASK defines different execution modes */
-#define MIPS_HFLAG_TMASK  0x01FF
+#define MIPS_HFLAG_TMASK  0x03FF
 #define MIPS_HFLAG_MODE   0x0007 /* execution modes                    */
     /* The KSU flags must be the lowest bits in hflags. The flag order
        must be the same as defined for CP0 Status. This allows to use
@@ -430,15 +430,16 @@ struct CPUMIPSState {
        and RSQRT.D.  */
 #define MIPS_HFLAG_COP1X  0x0080 /* COP1X instructions enabled         */
 #define MIPS_HFLAG_RE     0x0100 /* Reversed endianness                */
+#define MIPS_HFLAG_UX     0x0200 /* 64-bit user mode                   */
     /* If translation is interrupted between the branch instruction and
      * the delay slot, record what type of branch it is so that we can
      * resume translation properly.  It might be possible to reduce
      * this from three bits to two.  */
-#define MIPS_HFLAG_BMASK  0x0e00
-#define MIPS_HFLAG_B      0x0200 /* Unconditional branch               */
-#define MIPS_HFLAG_BC     0x0400 /* Conditional branch                 */
-#define MIPS_HFLAG_BL     0x0600 /* Likely branch                      */
-#define MIPS_HFLAG_BR     0x0800 /* branch to register (can't link TB) */
+#define MIPS_HFLAG_BMASK  0x1C00
+#define MIPS_HFLAG_B      0x0400 /* Unconditional branch               */
+#define MIPS_HFLAG_BC     0x0800 /* Conditional branch                 */
+#define MIPS_HFLAG_BL     0x0C00 /* Likely branch                      */
+#define MIPS_HFLAG_BR     0x1000 /* branch to register (can't link TB) */
     target_ulong btarget;        /* Jump / branch target               */
     int bcond;                   /* Branch condition (if needed)       */
 
diff --git a/target-mips/exec.h b/target-mips/exec.h
index 28bf466..5d3e356 100644
--- a/target-mips/exec.h
+++ b/target-mips/exec.h
@@ -66,7 +66,8 @@ static inline int cpu_halted(CPUState *env)
 static inline void compute_hflags(CPUState *env)
 {
     env->hflags &= ~(MIPS_HFLAG_COP1X | MIPS_HFLAG_64 | MIPS_HFLAG_CP0 |
-                     MIPS_HFLAG_F64 | MIPS_HFLAG_FPU | MIPS_HFLAG_KSU);
+                     MIPS_HFLAG_F64 | MIPS_HFLAG_FPU | MIPS_HFLAG_KSU |
+                     MIPS_HFLAG_UX);
     if (!(env->CP0_Status & (1 << CP0St_EXL)) &&
         !(env->CP0_Status & (1 << CP0St_ERL)) &&
         !(env->hflags & MIPS_HFLAG_DM)) {
@@ -77,6 +78,8 @@ static inline void compute_hflags(CPUState *env)
         (env->CP0_Status & (1 << CP0St_PX)) ||
         (env->CP0_Status & (1 << CP0St_UX)))
         env->hflags |= MIPS_HFLAG_64;
+    if (env->CP0_Status & (1 << CP0St_UX))
+        env->hflags |= MIPS_HFLAG_UX;
 #endif
     if ((env->CP0_Status & (1 << CP0St_CU0)) ||
         !(env->hflags & MIPS_HFLAG_KSU))
diff --git a/target-mips/translate.c b/target-mips/translate.c
index cbe8120..af01f73 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -902,15 +902,9 @@ static inline void gen_op_addr_add (DisasContext *ctx, TCGv t0, TCGv t1)
     /* For compatibility with 32-bit code, data reference in user mode
        with Status_UX = 0 should be casted to 32-bit and sign extended.
        See the MIPS64 PRA manual, section 4.10. */
-    if ((ctx->hflags & MIPS_HFLAG_KSU) == MIPS_HFLAG_UM) {
-        int l1 = gen_new_label();
-        TCGv r_tmp = tcg_temp_new(TCG_TYPE_I32);
-
-        tcg_gen_ld_i32(r_tmp, cpu_env, offsetof(CPUState, CP0_Status));
-        tcg_gen_andi_i32(r_tmp, r_tmp, (1 << CP0St_UX));
-        tcg_gen_brcondi_i32(TCG_COND_NE, r_tmp, 0, l1);
+    if (((ctx->hflags & MIPS_HFLAG_KSU) == MIPS_HFLAG_UM) &&
+        (ctx->hflags & MIPS_HFLAG_UX)) {
         tcg_gen_ext32s_i64(t0, t0);
-        gen_set_label(l1);
     }
 #endif
 }
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (2 preceding siblings ...)
  2008-11-08  8:33 ` [Qemu-devel] [PATCH 03/11] target-mips: optimize gen_op_addr_add() (2/2) Aurelien Jarno
@ 2008-11-08  8:34 ` Aurelien Jarno
  2008-11-08 12:57   ` Laurent Desnogues
  2008-11-08  8:34 ` [Qemu-devel] [PATCH 05/11] target-mips: convert bit shuffle " Aurelien Jarno
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:34 UTC (permalink / raw)
  To: qemu-devel

Bitfield operations can be written with very few TCG instructions
(between 2 and 5), so it is worth converting them to TCG.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/helper.h    |    6 +----
 target-mips/op_helper.c |   26 +------------------------
 target-mips/translate.c |   49 +++++++++++++++++++++++++++++++++++++---------
 3 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/target-mips/helper.h b/target-mips/helper.h
index 525ccbb..5926921 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -270,13 +270,9 @@ DEF_HELPER(target_ulong, do_rdhwr_ccres, (void))
 DEF_HELPER(void, do_pmon, (int function))
 DEF_HELPER(void, do_wait, (void))
 
-/* Bitfield operations. */
-DEF_HELPER(target_ulong, do_ext, (target_ulong t1, uint32_t pos, uint32_t size))
-DEF_HELPER(target_ulong, do_ins, (target_ulong t0, target_ulong t1, uint32_t pos, uint32_t size))
+/* Bit shuffle operations. */
 DEF_HELPER(target_ulong, do_wsbh, (target_ulong t1))
 #ifdef TARGET_MIPS64
-DEF_HELPER(target_ulong, do_dext, (target_ulong t1, uint32_t pos, uint32_t size))
-DEF_HELPER(target_ulong, do_dins, (target_ulong t0, target_ulong t1, uint32_t pos, uint32_t size))
 DEF_HELPER(target_ulong, do_dsbh, (target_ulong t1))
 DEF_HELPER(target_ulong, do_dshd, (target_ulong t1))
 #endif
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 3744728..b642593 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -1781,37 +1781,13 @@ target_ulong do_rdhwr_ccres(void)
     return 0;
 }
 
-/* Bitfield operations. */
-target_ulong do_ext(target_ulong t1, uint32_t pos, uint32_t size)
-{
-    return (int32_t)((t1 >> pos) & ((size < 32) ? ((1 << size) - 1) : ~0));
-}
-
-target_ulong do_ins(target_ulong t0, target_ulong t1, uint32_t pos, uint32_t size)
-{
-    target_ulong mask = ((size < 32) ? ((1 << size) - 1) : ~0) << pos;
-
-    return (int32_t)((t0 & ~mask) | ((t1 << pos) & mask));
-}
-
+/* Bit shuffle operations. */
 target_ulong do_wsbh(target_ulong t1)
 {
     return (int32_t)(((t1 << 8) & ~0x00FF00FF) | ((t1 >> 8) & 0x00FF00FF));
 }
 
 #if defined(TARGET_MIPS64)
-target_ulong do_dext(target_ulong t1, uint32_t pos, uint32_t size)
-{
-    return (t1 >> pos) & ((size < 64) ? ((1ULL << size) - 1) : ~0ULL);
-}
-
-target_ulong do_dins(target_ulong t0, target_ulong t1, uint32_t pos, uint32_t size)
-{
-    target_ulong mask = ((size < 64) ? ((1ULL << size) - 1) : ~0ULL) << pos;
-
-    return (t0 & ~mask) | ((t1 << pos) & mask);
-}
-
 target_ulong do_dsbh(target_ulong t1)
 {
     return ((t1 << 8) & ~0x00FF00FF00FF00FFULL) | ((t1 >> 8) & 0x00FF00FF00FF00FFULL);
diff --git a/target-mips/translate.c b/target-mips/translate.c
index af01f73..2cd1868 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -2682,57 +2682,86 @@ static void gen_compute_branch (DisasContext *ctx, uint32_t opc,
 static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
                         int rs, int lsb, int msb)
 {
-    TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
-    TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
+    TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
+    TCGv t1 = tcg_temp_new(TCG_TYPE_TL);
+    target_ulong mask;
 
     gen_load_gpr(t1, rs);
     switch (opc) {
     case OPC_EXT:
         if (lsb + msb > 31)
             goto fail;
-        tcg_gen_helper_1_1ii(do_ext, t0, t1, lsb, msb + 1);
+        tcg_gen_shri_tl(t0, t1, lsb);
+        if (msb + 1 < 32) {
+            tcg_gen_andi_tl(t0, t0, (1 << (msb + 1)) - 1);
+        } else {
+            tcg_gen_ext32s_tl(t0, t0);
+        }
         break;
 #if defined(TARGET_MIPS64)
     case OPC_DEXTM:
         if (lsb + msb > 63)
             goto fail;
-        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb, msb + 1 + 32);
+        tcg_gen_shri_tl(t0, t1, lsb);
+        if (msb + 1 + 32 < 64) {
+            tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1 + 32)) - 1);
+        }
         break;
     case OPC_DEXTU:
         if (lsb + msb > 63)
             goto fail;
-        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb + 32, msb + 1);
+        tcg_gen_shri_tl(t0, t1, lsb + 32);
+        tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
         break;
     case OPC_DEXT:
         if (lsb + msb > 63)
             goto fail;
-        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb, msb + 1);
+        tcg_gen_shri_tl(t0, t1, lsb);
+        tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
         break;
 #endif
     case OPC_INS:
         if (lsb > msb)
             goto fail;
+        mask = ((msb - lsb + 1 < 32) ? ((1 << (msb - lsb + 1)) - 1) : ~0) << lsb;
         gen_load_gpr(t0, rt);
-        tcg_gen_helper_1_2ii(do_ins, t0, t0, t1, lsb, msb - lsb + 1);
+        tcg_gen_andi_tl(t0, t0, ~mask);
+        tcg_gen_shli_tl(t1, t1, lsb);
+        tcg_gen_andi_tl(t1, t1, mask);
+        tcg_gen_or_tl(t0, t0, t1);
+        tcg_gen_ext32s_tl(t0, t0);
         break;
 #if defined(TARGET_MIPS64)
     case OPC_DINSM:
         if (lsb > msb)
             goto fail;
+        mask = ((msb - lsb + 1 + 32 < 64) ? ((1ULL << (msb - lsb + 1 + 32)) - 1) : ~0ULL) << lsb;
         gen_load_gpr(t0, rt);
-        tcg_gen_helper_1_2ii(do_dins, t0, t0, t1, lsb, msb - lsb + 1 + 32);
+        tcg_gen_andi_tl(t0, t0, ~mask);
+        tcg_gen_shli_tl(t1, t1, lsb);
+        tcg_gen_andi_tl(t1, t1, mask);
+        tcg_gen_or_tl(t0, t0, t1);
         break;
     case OPC_DINSU:
         if (lsb > msb)
             goto fail;
+        mask = ((1ULL << (msb - lsb + 1)) - 1) << lsb;
         gen_load_gpr(t0, rt);
-        tcg_gen_helper_1_2ii(do_dins, t0, t0, t1, lsb + 32, msb - lsb + 1);
+        tcg_gen_andi_tl(t0, t0, ~mask);
+        tcg_gen_shli_tl(t1, t1, lsb + 32);
+        tcg_gen_andi_tl(t1, t1, mask);
+        tcg_gen_or_tl(t0, t0, t1);
         break;
     case OPC_DINS:
         if (lsb > msb)
             goto fail;
         gen_load_gpr(t0, rt);
-        tcg_gen_helper_1_2ii(do_dins, t0, t0, t1, lsb, msb - lsb + 1);
+        mask = ((1ULL << (msb - lsb + 1)) - 1) << lsb;
+        gen_load_gpr(t0, rt);
+        tcg_gen_andi_tl(t0, t0, ~mask);
+        tcg_gen_shli_tl(t1, t1, lsb);
+        tcg_gen_andi_tl(t1, t1, mask);
+        tcg_gen_or_tl(t0, t0, t1);
         break;
 #endif
     default:
-- 
1.5.6.5


-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 05/11] target-mips: convert bit shuffle ops to TCG
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (3 preceding siblings ...)
  2008-11-08  8:34 ` [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG Aurelien Jarno
@ 2008-11-08  8:34 ` Aurelien Jarno
  2008-11-08  8:35 ` [Qemu-devel] [PATCH 06/11] target-mips: optimize gen_arith()/gen_arith_imm() Aurelien Jarno
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:34 UTC (permalink / raw)
  To: qemu-devel

Bit shuffle operations can be written with very few TCG instructions
(between 5 and 8), so it is worth converting them to TCG.

This code also move all bit shuffle generation code to a separate
function in order to have a cleaner exception code path, that is it
doesn't store back the TCG register to the target register after the
exception, as the TCG register doesn't exist anymore.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/helper.h    |    7 ---
 target-mips/op_helper.c |   19 --------
 target-mips/translate.c |  106 +++++++++++++++++++++++++----------------------
 3 files changed, 56 insertions(+), 76 deletions(-)

diff --git a/target-mips/helper.h b/target-mips/helper.h
index 5926921..f67c82a 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -269,10 +269,3 @@ DEF_HELPER(target_ulong, do_rdhwr_cc, (void))
 DEF_HELPER(target_ulong, do_rdhwr_ccres, (void))
 DEF_HELPER(void, do_pmon, (int function))
 DEF_HELPER(void, do_wait, (void))
-
-/* Bit shuffle operations. */
-DEF_HELPER(target_ulong, do_wsbh, (target_ulong t1))
-#ifdef TARGET_MIPS64
-DEF_HELPER(target_ulong, do_dsbh, (target_ulong t1))
-DEF_HELPER(target_ulong, do_dshd, (target_ulong t1))
-#endif
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index b642593..3fe62fb 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -1781,25 +1781,6 @@ target_ulong do_rdhwr_ccres(void)
     return 0;
 }
 
-/* Bit shuffle operations. */
-target_ulong do_wsbh(target_ulong t1)
-{
-    return (int32_t)(((t1 << 8) & ~0x00FF00FF) | ((t1 >> 8) & 0x00FF00FF));
-}
-
-#if defined(TARGET_MIPS64)
-target_ulong do_dsbh(target_ulong t1)
-{
-    return ((t1 << 8) & ~0x00FF00FF00FF00FFULL) | ((t1 >> 8) & 0x00FF00FF00FF00FFULL);
-}
-
-target_ulong do_dshd(target_ulong t1)
-{
-    t1 = ((t1 << 16) & ~0x0000FFFF0000FFFFULL) | ((t1 >> 16) & 0x0000FFFF0000FFFFULL);
-    return (t1 << 32) | (t1 >> 32);
-}
-#endif
-
 void do_pmon (int function)
 {
     function /= 2;
diff --git a/target-mips/translate.c b/target-mips/translate.c
index 2cd1868..cf6e373 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -2777,6 +2777,60 @@ fail:
     tcg_temp_free(t1);
 }
 
+static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
+{
+    TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
+    TCGv t1 = tcg_temp_new(TCG_TYPE_TL);
+
+    gen_load_gpr(t1, rt);
+    switch (op2) {
+    case OPC_WSBH:
+        tcg_gen_shri_tl(t0, t1, 8);
+        tcg_gen_andi_tl(t0, t0, 0x00FF00FF);
+        tcg_gen_shli_tl(t1, t1, 8);
+        tcg_gen_andi_tl(t1, t1, ~0x00FF00FF);
+        tcg_gen_or_tl(t0, t0, t1);
+        tcg_gen_ext32s_tl(t0, t0);
+        break;
+    case OPC_SEB:
+        tcg_gen_ext8s_tl(t0, t1);
+        break;
+    case OPC_SEH:
+        tcg_gen_ext16s_tl(t0, t1);
+        break;
+#if defined(TARGET_MIPS64)
+    case OPC_DSBH:
+        gen_load_gpr(t1, rt);
+        tcg_gen_shri_tl(t0, t1, 8);
+        tcg_gen_andi_tl(t0, t0, 0x00FF00FF00FF00FFULL);
+        tcg_gen_shli_tl(t1, t1, 8);
+        tcg_gen_andi_tl(t1, t1, ~0x00FF00FF00FF00FFULL);
+        tcg_gen_or_tl(t0, t0, t1);
+        break;
+    case OPC_DSHD:
+        gen_load_gpr(t1, rt);
+        tcg_gen_shri_tl(t0, t1, 16);
+        tcg_gen_andi_tl(t0, t0, 0x0000FFFF0000FFFFULL);
+        tcg_gen_shli_tl(t1, t1, 16);
+        tcg_gen_andi_tl(t1, t1, ~0x0000FFFF0000FFFFULL);
+        tcg_gen_or_tl(t1, t0, t1);
+        tcg_gen_shri_tl(t0, t1, 32);
+        tcg_gen_shli_tl(t1, t1, 32);
+        tcg_gen_or_tl(t0, t0, t1);
+        break;
+#endif
+    default:
+        MIPS_INVAL("bsfhl");
+        generate_exception(ctx, EXCP_RI);
+        tcg_temp_free(t0);
+        tcg_temp_free(t1);
+        return;
+    }
+    gen_store_gpr(t0, rd);
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
 #ifndef CONFIG_USER_ONLY
 /* CP0 (MMU and control) */
 static inline void gen_mfc0_load32 (TCGv t, target_ulong off)
@@ -7959,34 +8013,7 @@ static void decode_opc (CPUState *env, DisasContext *ctx)
         case OPC_BSHFL:
             check_insn(env, ctx, ISA_MIPS32R2);
             op2 = MASK_BSHFL(ctx->opcode);
-            {
-                TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
-                TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
-
-                switch (op2) {
-                case OPC_WSBH:
-                    gen_load_gpr(t1, rt);
-                    tcg_gen_helper_1_1(do_wsbh, t0, t1);
-                    gen_store_gpr(t0, rd);
-                    break;
-                case OPC_SEB:
-                    gen_load_gpr(t1, rt);
-                    tcg_gen_ext8s_tl(t0, t1);
-                    gen_store_gpr(t0, rd);
-                    break;
-                case OPC_SEH:
-                    gen_load_gpr(t1, rt);
-                    tcg_gen_ext16s_tl(t0, t1);
-                    gen_store_gpr(t0, rd);
-                    break;
-                default:            /* Invalid */
-                    MIPS_INVAL("bshfl");
-                    generate_exception(ctx, EXCP_RI);
-                    break;
-                }
-                tcg_temp_free(t0);
-                tcg_temp_free(t1);
-            }
+            gen_bshfl(ctx, op2, rt, rd);
             break;
         case OPC_RDHWR:
             check_insn(env, ctx, ISA_MIPS32R2);
@@ -8062,28 +8089,7 @@ static void decode_opc (CPUState *env, DisasContext *ctx)
             check_insn(env, ctx, ISA_MIPS64R2);
             check_mips_64(ctx);
             op2 = MASK_DBSHFL(ctx->opcode);
-            {
-                TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
-                TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
-
-                switch (op2) {
-                case OPC_DSBH:
-                    gen_load_gpr(t1, rt);
-                    tcg_gen_helper_1_1(do_dsbh, t0, t1);
-                    break;
-                case OPC_DSHD:
-                    gen_load_gpr(t1, rt);
-                    tcg_gen_helper_1_1(do_dshd, t0, t1);
-                    break;
-                default:            /* Invalid */
-                    MIPS_INVAL("dbshfl");
-                    generate_exception(ctx, EXCP_RI);
-                    break;
-                }
-                gen_store_gpr(t0, rd);
-                tcg_temp_free(t0);
-                tcg_temp_free(t1);
-            }
+            gen_bshfl(ctx, op2, rt, rd);
             break;
 #endif
         default:            /* Invalid */
-- 
1.5.6.5


-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 06/11] target-mips: optimize gen_arith()/gen_arith_imm()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (4 preceding siblings ...)
  2008-11-08  8:34 ` [Qemu-devel] [PATCH 05/11] target-mips: convert bit shuffle " Aurelien Jarno
@ 2008-11-08  8:35 ` Aurelien Jarno
  2008-11-08  8:37 ` [Qemu-devel] [PATCH 07/11] target-mips: optimize gen_muldiv() Aurelien Jarno
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:35 UTC (permalink / raw)
  To: qemu-devel

Optimize code generation in gen_arith()/gen_arith_imm():
- Don't do sign extension when the value is already guaranteed to be
  sign extended (otherwise, results are marked as UNPREDICTABLE).
- When the value is signed extend, compare the value to 0 instead of
  testing bit 31/63.
- Temp variables are valid up to and *including* the brcond instruction.
  Use them instead of temp local variables.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   72 +++++++++++++++++------------------------------
 1 files changed, 26 insertions(+), 46 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index cf6e373..f1cb7ca 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1333,7 +1333,7 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
     switch (opc) {
     case OPC_ADDI:
         {
-            TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_TL);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_TL);
             int l1 = gen_new_label();
 
@@ -1341,14 +1341,11 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
             tcg_gen_ext32s_tl(r_tmp1, t0);
             tcg_gen_addi_tl(t0, r_tmp1, uimm);
 
-            tcg_gen_xori_tl(r_tmp1, r_tmp1, uimm);
-            tcg_gen_xori_tl(r_tmp1, r_tmp1, -1);
+            tcg_gen_xori_tl(r_tmp1, r_tmp1, ~uimm);
             tcg_gen_xori_tl(r_tmp2, t0, uimm);
             tcg_gen_and_tl(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
-            tcg_gen_shri_tl(r_tmp1, r_tmp1, 31);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, r_tmp1, 0, l1);
-            tcg_temp_free(r_tmp1);
+            tcg_gen_brcondi_tl(TCG_COND_GE, r_tmp1, 0, l1);
             /* operands of same sign, result different sign */
             generate_exception(ctx, EXCP_OVERFLOW);
             gen_set_label(l1);
@@ -1358,7 +1355,6 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
         opn = "addi";
         break;
     case OPC_ADDIU:
-        tcg_gen_ext32s_tl(t0, t0);
         tcg_gen_addi_tl(t0, t0, uimm);
         tcg_gen_ext32s_tl(t0, t0);
         opn = "addiu";
@@ -1366,7 +1362,7 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
 #if defined(TARGET_MIPS64)
     case OPC_DADDI:
         {
-            TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_TL);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_TL);
             int l1 = gen_new_label();
 
@@ -1374,14 +1370,11 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
             tcg_gen_mov_tl(r_tmp1, t0);
             tcg_gen_addi_tl(t0, t0, uimm);
 
-            tcg_gen_xori_tl(r_tmp1, r_tmp1, uimm);
-            tcg_gen_xori_tl(r_tmp1, r_tmp1, -1);
+            tcg_gen_xori_tl(r_tmp1, r_tmp1, ~uimm);
             tcg_gen_xori_tl(r_tmp2, t0, uimm);
             tcg_gen_and_tl(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
-            tcg_gen_shri_tl(r_tmp1, r_tmp1, 63);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, r_tmp1, 0, l1);
-            tcg_temp_free(r_tmp1);
+            tcg_gen_brcondi_tl(TCG_COND_GE, r_tmp1, 0, l1);
             /* operands of same sign, result different sign */
             generate_exception(ctx, EXCP_OVERFLOW);
             gen_set_label(l1);
@@ -1417,7 +1410,6 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
         opn = "lui";
         break;
     case OPC_SLL:
-        tcg_gen_ext32u_tl(t0, t0);
         tcg_gen_shli_tl(t0, t0, uimm);
         tcg_gen_ext32s_tl(t0, t0);
         opn = "sll";
@@ -1425,15 +1417,17 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
     case OPC_SRA:
         tcg_gen_ext32s_tl(t0, t0);
         tcg_gen_sari_tl(t0, t0, uimm);
-        tcg_gen_ext32s_tl(t0, t0);
         opn = "sra";
         break;
     case OPC_SRL:
         switch ((ctx->opcode >> 21) & 0x1f) {
         case 0:
-            tcg_gen_ext32u_tl(t0, t0);
-            tcg_gen_shri_tl(t0, t0, uimm);
-            tcg_gen_ext32s_tl(t0, t0);
+            if (uimm != 0) {
+                tcg_gen_ext32u_tl(t0, t0);
+                tcg_gen_shri_tl(t0, t0, uimm);
+            } else {
+                tcg_gen_ext32s_tl(t0, t0);
+            }
             opn = "srl";
             break;
         case 1:
@@ -1449,9 +1443,12 @@ static void gen_arith_imm (CPUState *env, DisasContext *ctx, uint32_t opc,
                 }
                 opn = "rotr";
             } else {
-                tcg_gen_ext32u_tl(t0, t0);
-                tcg_gen_shri_tl(t0, t0, uimm);
-                tcg_gen_ext32s_tl(t0, t0);
+                if (uimm != 0) {
+                    tcg_gen_ext32u_tl(t0, t0);
+                    tcg_gen_shri_tl(t0, t0, uimm);
+                } else {
+                    tcg_gen_ext32s_tl(t0, t0);
+                }
                 opn = "srl";
             }
             break;
@@ -1562,7 +1559,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
     switch (opc) {
     case OPC_ADD:
         {
-            TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_TL);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_TL);
             int l1 = gen_new_label();
 
@@ -1576,9 +1573,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
             tcg_gen_xor_tl(r_tmp2, t0, t1);
             tcg_gen_and_tl(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
-            tcg_gen_shri_tl(r_tmp1, r_tmp1, 31);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, r_tmp1, 0, l1);
-            tcg_temp_free(r_tmp1);
+            tcg_gen_brcondi_tl(TCG_COND_GE, r_tmp1, 0, l1);
             /* operands of same sign, result different sign */
             generate_exception(ctx, EXCP_OVERFLOW);
             gen_set_label(l1);
@@ -1588,15 +1583,13 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
         opn = "add";
         break;
     case OPC_ADDU:
-        tcg_gen_ext32s_tl(t0, t0);
-        tcg_gen_ext32s_tl(t1, t1);
         tcg_gen_add_tl(t0, t0, t1);
         tcg_gen_ext32s_tl(t0, t0);
         opn = "addu";
         break;
     case OPC_SUB:
         {
-            TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_TL);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_TL);
             int l1 = gen_new_label();
 
@@ -1609,9 +1602,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
             tcg_gen_xor_tl(r_tmp1, r_tmp1, t0);
             tcg_gen_and_tl(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
-            tcg_gen_shri_tl(r_tmp1, r_tmp1, 31);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, r_tmp1, 0, l1);
-            tcg_temp_free(r_tmp1);
+            tcg_gen_brcondi_tl(TCG_COND_GE, r_tmp1, 0, l1);
             /* operands of different sign, first operand and result different sign */
             generate_exception(ctx, EXCP_OVERFLOW);
             gen_set_label(l1);
@@ -1621,8 +1612,6 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
         opn = "sub";
         break;
     case OPC_SUBU:
-        tcg_gen_ext32s_tl(t0, t0);
-        tcg_gen_ext32s_tl(t1, t1);
         tcg_gen_sub_tl(t0, t0, t1);
         tcg_gen_ext32s_tl(t0, t0);
         opn = "subu";
@@ -1630,7 +1619,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
 #if defined(TARGET_MIPS64)
     case OPC_DADD:
         {
-            TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_TL);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_TL);
             int l1 = gen_new_label();
 
@@ -1643,9 +1632,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
             tcg_gen_xor_tl(r_tmp2, t0, t1);
             tcg_gen_and_tl(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
-            tcg_gen_shri_tl(r_tmp1, r_tmp1, 63);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, r_tmp1, 0, l1);
-            tcg_temp_free(r_tmp1);
+            tcg_gen_brcondi_tl(TCG_COND_GE, r_tmp1, 0, l1);
             /* operands of same sign, result different sign */
             generate_exception(ctx, EXCP_OVERFLOW);
             gen_set_label(l1);
@@ -1658,7 +1645,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
         break;
     case OPC_DSUB:
         {
-            TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_TL);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_TL);
             int l1 = gen_new_label();
 
@@ -1670,9 +1657,7 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
             tcg_gen_xor_tl(r_tmp1, r_tmp1, t0);
             tcg_gen_and_tl(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
-            tcg_gen_shri_tl(r_tmp1, r_tmp1, 63);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, r_tmp1, 0, l1);
-            tcg_temp_free(r_tmp1);
+            tcg_gen_brcondi_tl(TCG_COND_GE, r_tmp1, 0, l1);
             /* operands of different sign, first operand and result different sign */
             generate_exception(ctx, EXCP_OVERFLOW);
             gen_set_label(l1);
@@ -1710,8 +1695,6 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
         opn = "xor";
         break;
     case OPC_MUL:
-        tcg_gen_ext32s_tl(t0, t0);
-        tcg_gen_ext32s_tl(t1, t1);
         tcg_gen_mul_tl(t0, t0, t1);
         tcg_gen_ext32s_tl(t0, t0);
         opn = "mul";
@@ -1737,8 +1720,6 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
         opn = "movz";
         goto print;
     case OPC_SLLV:
-        tcg_gen_ext32u_tl(t0, t0);
-        tcg_gen_ext32u_tl(t1, t1);
         tcg_gen_andi_tl(t0, t0, 0x1f);
         tcg_gen_shl_tl(t0, t1, t0);
         tcg_gen_ext32s_tl(t0, t0);
@@ -1748,7 +1729,6 @@ static void gen_arith (CPUState *env, DisasContext *ctx, uint32_t opc,
         tcg_gen_ext32s_tl(t1, t1);
         tcg_gen_andi_tl(t0, t0, 0x1f);
         tcg_gen_sar_tl(t0, t1, t0);
-        tcg_gen_ext32s_tl(t0, t0);
         opn = "srav";
         break;
     case OPC_SRLV:
-- 
1.5.6.5


-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 07/11] target-mips: optimize gen_muldiv()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (5 preceding siblings ...)
  2008-11-08  8:35 ` [Qemu-devel] [PATCH 06/11] target-mips: optimize gen_arith()/gen_arith_imm() Aurelien Jarno
@ 2008-11-08  8:37 ` Aurelien Jarno
  2008-11-08  8:37 ` [Qemu-devel] [PATCH 08/11] target-mips: optimize gen_farith() Aurelien Jarno
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:37 UTC (permalink / raw)
  To: qemu-devel

Optimize code generation in gen_muldiv():
- Don't do sign extension when the value is already guaranteed to be
  sign extended (otherwise, results are marked as UNPREDICTABLE).
- Access the LO, HI registers directly instead of writing them through
  a temporary variable.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |  162 ++++++++++++++---------------------------------
 1 files changed, 47 insertions(+), 115 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index f1cb7ca..544e5c8 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -604,27 +604,7 @@ static inline void gen_store_gpr (TCGv t, int reg)
         tcg_gen_mov_tl(cpu_gpr[reg], t);
 }
 
-/* Moves to/from HI and LO registers.  */
-static inline void gen_load_HI (TCGv t, int reg)
-{
-    tcg_gen_mov_tl(t, cpu_HI[reg]);
-}
-
-static inline void gen_store_HI (TCGv t, int reg)
-{
-    tcg_gen_mov_tl(cpu_HI[reg], t);
-}
-
-static inline void gen_load_LO (TCGv t, int reg)
-{
-    tcg_gen_mov_tl(t, cpu_LO[reg]);
-}
-
-static inline void gen_store_LO (TCGv t, int reg)
-{
-    tcg_gen_mov_tl(cpu_LO[reg], t);
-}
-
+/* Moves to/from ACX register.  */
 static inline void gen_load_ACX (TCGv t, int reg)
 {
     tcg_gen_mov_tl(t, cpu_ACX[reg]);
@@ -1850,23 +1830,23 @@ static void gen_HILO (DisasContext *ctx, uint32_t opc, int reg)
     }
     switch (opc) {
     case OPC_MFHI:
-        gen_load_HI(t0, 0);
+        tcg_gen_mov_tl(t0, cpu_HI[0]);
         gen_store_gpr(t0, reg);
         opn = "mfhi";
         break;
     case OPC_MFLO:
-        gen_load_LO(t0, 0);
+        tcg_gen_mov_tl(t0, cpu_LO[0]);
         gen_store_gpr(t0, reg);
         opn = "mflo";
         break;
     case OPC_MTHI:
         gen_load_gpr(t0, reg);
-        gen_store_HI(t0, 0);
+        tcg_gen_mov_tl(cpu_HI[0], t0);
         opn = "mthi";
         break;
     case OPC_MTLO:
         gen_load_gpr(t0, reg);
-        gen_store_LO(t0, 0);
+        tcg_gen_mov_tl(cpu_LO[0], t0);
         opn = "mtlo";
         break;
     default:
@@ -1893,27 +1873,28 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
         {
             int l1 = gen_new_label();
 
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
             tcg_gen_brcondi_tl(TCG_COND_EQ, t1, 0, l1);
             {
-                TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
-                TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
-                TCGv r_tmp3 = tcg_temp_new(TCG_TYPE_I64);
-
-                tcg_gen_ext_tl_i64(r_tmp1, t0);
-                tcg_gen_ext_tl_i64(r_tmp2, t1);
-                tcg_gen_div_i64(r_tmp3, r_tmp1, r_tmp2);
-                tcg_gen_rem_i64(r_tmp2, r_tmp1, r_tmp2);
-                tcg_gen_trunc_i64_tl(t0, r_tmp3);
-                tcg_gen_trunc_i64_tl(t1, r_tmp2);
+                int l2 = gen_new_label();
+                TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_I32);
+                TCGv r_tmp2 = tcg_temp_local_new(TCG_TYPE_I32);
+                TCGv r_tmp3 = tcg_temp_local_new(TCG_TYPE_I32);
+
+                tcg_gen_trunc_tl_i32(r_tmp1, t0);
+                tcg_gen_trunc_tl_i32(r_tmp2, t1);
+                tcg_gen_brcondi_i32(TCG_COND_NE, r_tmp1, -1 << 31, l2);
+                tcg_gen_brcondi_i32(TCG_COND_NE, r_tmp2, -1, l2);
+                tcg_gen_ext32s_tl(cpu_LO[0], t0);
+                tcg_gen_movi_tl(cpu_HI[0], 0);
+                tcg_gen_br(l1);
+                gen_set_label(l2);
+                tcg_gen_div_i32(r_tmp3, r_tmp1, r_tmp2);
+                tcg_gen_rem_i32(r_tmp2, r_tmp1, r_tmp2);
+                tcg_gen_ext_i32_tl(cpu_LO[0], r_tmp3);
+                tcg_gen_ext_i32_tl(cpu_HI[0], r_tmp2);
                 tcg_temp_free(r_tmp1);
                 tcg_temp_free(r_tmp2);
                 tcg_temp_free(r_tmp3);
-                tcg_gen_ext32s_tl(t0, t0);
-                tcg_gen_ext32s_tl(t1, t1);
-                gen_store_LO(t0, 0);
-                gen_store_HI(t1, 0);
             }
             gen_set_label(l1);
         }
@@ -1934,13 +1915,11 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
                 tcg_gen_trunc_tl_i32(r_tmp2, t1);
                 tcg_gen_divu_i32(r_tmp3, r_tmp1, r_tmp2);
                 tcg_gen_remu_i32(r_tmp1, r_tmp1, r_tmp2);
-                tcg_gen_ext_i32_tl(t0, r_tmp3);
-                tcg_gen_ext_i32_tl(t1, r_tmp1);
+                tcg_gen_ext_i32_tl(cpu_LO[0], r_tmp3);
+                tcg_gen_ext_i32_tl(cpu_HI[0], r_tmp1);
                 tcg_temp_free(r_tmp1);
                 tcg_temp_free(r_tmp2);
                 tcg_temp_free(r_tmp3);
-                gen_store_LO(t0, 0);
-                gen_store_HI(t1, 0);
             }
             gen_set_label(l1);
         }
@@ -1951,8 +1930,6 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
 
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
             tcg_gen_ext_tl_i64(r_tmp1, t0);
             tcg_gen_ext_tl_i64(r_tmp2, t1);
             tcg_gen_mul_i64(r_tmp1, r_tmp1, r_tmp2);
@@ -1961,10 +1938,8 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
             tcg_gen_trunc_i64_tl(t1, r_tmp1);
             tcg_temp_free(r_tmp1);
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
-            gen_store_LO(t0, 0);
-            gen_store_HI(t1, 0);
+            tcg_gen_ext32s_tl(cpu_LO[0], t0);
+            tcg_gen_ext32s_tl(cpu_HI[0], t1);
         }
         opn = "mult";
         break;
@@ -1983,10 +1958,8 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
             tcg_gen_trunc_i64_tl(t1, r_tmp1);
             tcg_temp_free(r_tmp1);
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
-            gen_store_LO(t0, 0);
-            gen_store_HI(t1, 0);
+            tcg_gen_ext32s_tl(cpu_LO[0], t0);
+            tcg_gen_ext32s_tl(cpu_HI[0], t1);
         }
         opn = "multu";
         break;
@@ -2001,24 +1974,12 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
 
                 tcg_gen_brcondi_tl(TCG_COND_NE, t0, -1LL << 63, l2);
                 tcg_gen_brcondi_tl(TCG_COND_NE, t1, -1LL, l2);
-                {
-                    tcg_gen_movi_tl(t1, 0);
-                    gen_store_LO(t0, 0);
-                    gen_store_HI(t1, 0);
-                    tcg_gen_br(l1);
-                }
+                tcg_gen_mov_tl(cpu_LO[0], t0);
+                tcg_gen_movi_tl(cpu_HI[0], 0);
+                tcg_gen_br(l1);
                 gen_set_label(l2);
-                {
-                    TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
-                    TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
-
-                    tcg_gen_div_i64(r_tmp1, t0, t1);
-                    tcg_gen_rem_i64(r_tmp2, t0, t1);
-                    gen_store_LO(r_tmp1, 0);
-                    gen_store_HI(r_tmp2, 0);
-                    tcg_temp_free(r_tmp1);
-                    tcg_temp_free(r_tmp2);
-                }
+                tcg_gen_div_i64(cpu_LO[0], t0, t1);
+                tcg_gen_rem_i64(cpu_HI[0], t0, t1);
             }
             gen_set_label(l1);
         }
@@ -2029,17 +1990,8 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             int l1 = gen_new_label();
 
             tcg_gen_brcondi_tl(TCG_COND_EQ, t1, 0, l1);
-            {
-                TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
-                TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
-
-                tcg_gen_divu_i64(r_tmp1, t0, t1);
-                tcg_gen_remu_i64(r_tmp2, t0, t1);
-                tcg_temp_free(r_tmp1);
-                tcg_temp_free(r_tmp2);
-                gen_store_LO(r_tmp1, 0);
-                gen_store_HI(r_tmp2, 0);
-            }
+            tcg_gen_divu_i64(cpu_LO[0], t0, t1);
+            tcg_gen_remu_i64(cpu_HI[0], t0, t1);
             gen_set_label(l1);
         }
         opn = "ddivu";
@@ -2058,24 +2010,18 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
 
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
             tcg_gen_ext_tl_i64(r_tmp1, t0);
             tcg_gen_ext_tl_i64(r_tmp2, t1);
             tcg_gen_mul_i64(r_tmp1, r_tmp1, r_tmp2);
-            gen_load_LO(t0, 0);
-            gen_load_HI(t1, 0);
-            tcg_gen_concat_tl_i64(r_tmp2, t0, t1);
+            tcg_gen_concat_tl_i64(r_tmp2, cpu_LO[0], cpu_HI[0]);
             tcg_gen_add_i64(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
             tcg_gen_trunc_i64_tl(t0, r_tmp1);
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
             tcg_gen_trunc_i64_tl(t1, r_tmp1);
             tcg_temp_free(r_tmp1);
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
-            gen_store_LO(t0, 0);
-            gen_store_HI(t1, 0);
+            tcg_gen_ext32s_tl(cpu_LO[0], t0);
+            tcg_gen_ext32s_tl(cpu_LO[1], t1);
         }
         opn = "madd";
         break;
@@ -2089,19 +2035,15 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             tcg_gen_extu_tl_i64(r_tmp1, t0);
             tcg_gen_extu_tl_i64(r_tmp2, t1);
             tcg_gen_mul_i64(r_tmp1, r_tmp1, r_tmp2);
-            gen_load_LO(t0, 0);
-            gen_load_HI(t1, 0);
-            tcg_gen_concat_tl_i64(r_tmp2, t0, t1);
+            tcg_gen_concat_tl_i64(r_tmp2, cpu_LO[0], cpu_HI[0]);
             tcg_gen_add_i64(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
             tcg_gen_trunc_i64_tl(t0, r_tmp1);
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
             tcg_gen_trunc_i64_tl(t1, r_tmp1);
             tcg_temp_free(r_tmp1);
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
-            gen_store_LO(t0, 0);
-            gen_store_HI(t1, 0);
+            tcg_gen_ext32s_tl(cpu_LO[0], t0);
+            tcg_gen_ext32s_tl(cpu_HI[0], t1);
         }
         opn = "maddu";
         break;
@@ -2110,24 +2052,18 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
             TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
 
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
             tcg_gen_ext_tl_i64(r_tmp1, t0);
             tcg_gen_ext_tl_i64(r_tmp2, t1);
             tcg_gen_mul_i64(r_tmp1, r_tmp1, r_tmp2);
-            gen_load_LO(t0, 0);
-            gen_load_HI(t1, 0);
-            tcg_gen_concat_tl_i64(r_tmp2, t0, t1);
+            tcg_gen_concat_tl_i64(r_tmp2, cpu_LO[0], cpu_HI[0]);
             tcg_gen_sub_i64(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
             tcg_gen_trunc_i64_tl(t0, r_tmp1);
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
             tcg_gen_trunc_i64_tl(t1, r_tmp1);
             tcg_temp_free(r_tmp1);
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
-            gen_store_LO(t0, 0);
-            gen_store_HI(t1, 0);
+            tcg_gen_ext32s_tl(cpu_LO[0], t0);
+            tcg_gen_ext32s_tl(cpu_HI[0], t1);
         }
         opn = "msub";
         break;
@@ -2141,19 +2077,15 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             tcg_gen_extu_tl_i64(r_tmp1, t0);
             tcg_gen_extu_tl_i64(r_tmp2, t1);
             tcg_gen_mul_i64(r_tmp1, r_tmp1, r_tmp2);
-            gen_load_LO(t0, 0);
-            gen_load_HI(t1, 0);
-            tcg_gen_concat_tl_i64(r_tmp2, t0, t1);
+            tcg_gen_concat_tl_i64(r_tmp2, cpu_LO[0], cpu_HI[0]);
             tcg_gen_sub_i64(r_tmp1, r_tmp1, r_tmp2);
             tcg_temp_free(r_tmp2);
             tcg_gen_trunc_i64_tl(t0, r_tmp1);
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
             tcg_gen_trunc_i64_tl(t1, r_tmp1);
             tcg_temp_free(r_tmp1);
-            tcg_gen_ext32s_tl(t0, t0);
-            tcg_gen_ext32s_tl(t1, t1);
-            gen_store_LO(t0, 0);
-            gen_store_HI(t1, 0);
+            tcg_gen_ext32s_tl(cpu_LO[0], t0);
+            tcg_gen_ext32s_tl(cpu_HI[0], t1);
         }
         opn = "msubu";
         break;
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 08/11] target-mips: optimize gen_farith()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (6 preceding siblings ...)
  2008-11-08  8:37 ` [Qemu-devel] [PATCH 07/11] target-mips: optimize gen_muldiv() Aurelien Jarno
@ 2008-11-08  8:37 ` Aurelien Jarno
  2008-11-08  8:38 ` [Qemu-devel] [PATCH 09/11] target-mips: optimize movc*() Aurelien Jarno
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:37 UTC (permalink / raw)
  To: qemu-devel

Optimize code generation in gen_farith():
- Temp variables are valid up to and *including* the brcond instruction.
  Use them instead of temp local variables.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   18 ++++++------------
 1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 544e5c8..c2b9a0e 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -6314,12 +6314,11 @@ static void gen_farith (DisasContext *ctx, uint32_t op1,
     case FOP(18, 16):
         {
             int l1 = gen_new_label();
-            TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
             TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I32);
 
             gen_load_gpr(t0, ft);
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_temp_free(t0);
             gen_load_fpr32(fp0, fs);
             gen_store_fpr32(fp0, fd);
             tcg_temp_free(fp0);
@@ -6330,12 +6329,11 @@ static void gen_farith (DisasContext *ctx, uint32_t op1,
     case FOP(19, 16):
         {
             int l1 = gen_new_label();
-            TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
             TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I32);
 
             gen_load_gpr(t0, ft);
             tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, l1);
-            tcg_temp_free(t0);
             gen_load_fpr32(fp0, fs);
             gen_store_fpr32(fp0, fd);
             tcg_temp_free(fp0);
@@ -6733,12 +6731,11 @@ static void gen_farith (DisasContext *ctx, uint32_t op1,
     case FOP(18, 17):
         {
             int l1 = gen_new_label();
-            TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
             TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I64);
 
             gen_load_gpr(t0, ft);
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_temp_free(t0);
             gen_load_fpr64(ctx, fp0, fs);
             gen_store_fpr64(ctx, fp0, fd);
             tcg_temp_free(fp0);
@@ -6749,12 +6746,11 @@ static void gen_farith (DisasContext *ctx, uint32_t op1,
     case FOP(19, 17):
         {
             int l1 = gen_new_label();
-            TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
             TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I64);
 
             gen_load_gpr(t0, ft);
             tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, l1);
-            tcg_temp_free(t0);
             gen_load_fpr64(ctx, fp0, fs);
             gen_store_fpr64(ctx, fp0, fd);
             tcg_temp_free(fp0);
@@ -7068,13 +7064,12 @@ static void gen_farith (DisasContext *ctx, uint32_t op1,
         check_cp1_64bitmode(ctx);
         {
             int l1 = gen_new_label();
-            TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
             TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I32);
             TCGv fph0 = tcg_temp_local_new(TCG_TYPE_I32);
 
             gen_load_gpr(t0, ft);
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_temp_free(t0);
             gen_load_fpr32(fp0, fs);
             gen_load_fpr32h(fph0, fs);
             gen_store_fpr32(fp0, fd);
@@ -7089,13 +7084,12 @@ static void gen_farith (DisasContext *ctx, uint32_t op1,
         check_cp1_64bitmode(ctx);
         {
             int l1 = gen_new_label();
-            TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
+            TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
             TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I32);
             TCGv fph0 = tcg_temp_local_new(TCG_TYPE_I32);
 
             gen_load_gpr(t0, ft);
             tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, l1);
-            tcg_temp_free(t0);
             gen_load_fpr32(fp0, fs);
             gen_load_fpr32h(fph0, fs);
             gen_store_fpr32(fp0, fd);
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 09/11] target-mips: optimize movc*()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (7 preceding siblings ...)
  2008-11-08  8:37 ` [Qemu-devel] [PATCH 08/11] target-mips: optimize gen_farith() Aurelien Jarno
@ 2008-11-08  8:38 ` Aurelien Jarno
  2008-11-08  8:39 ` [Qemu-devel] [PATCH 10/11] target-mips: gen_compute_branch1() Aurelien Jarno
  2008-11-08  8:39 ` [Qemu-devel] [PATCH 11/11] target-mips: fix temporary variable freeing in op_ldst_##insn() Aurelien Jarno
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:38 UTC (permalink / raw)
  To: qemu-devel

Optimize code generation in gen_movc*():
- Temp variables are valid up to and *including* the brcond instruction.
  Use them instead of temp local variables.
- Avoid using temporary variables to transfer values.
- Access fpu_fcr31 directly in gen_movcf_ps().

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   81 +++++++++++++++++++----------------------------
 1 files changed, 33 insertions(+), 48 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index c2b9a0e..2970ae2 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -5930,8 +5930,7 @@ static void gen_movci (DisasContext *ctx, int rd, int rs, int cc, int tf)
     uint32_t ccbit;
     TCGCond cond;
     TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
-    TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
-    TCGv r_tmp = tcg_temp_local_new(TCG_TYPE_I32);
+    TCGv r_tmp = tcg_temp_new(TCG_TYPE_I32);
 
     if (cc)
         ccbit = 1 << (24 + cc);
@@ -5943,14 +5942,9 @@ static void gen_movci (DisasContext *ctx, int rd, int rs, int cc, int tf)
         cond = TCG_COND_NE;
 
     gen_load_gpr(t0, rd);
-    gen_load_gpr(t1, rs);
     tcg_gen_andi_i32(r_tmp, fpu_fcr31, ccbit);
     tcg_gen_brcondi_i32(cond, r_tmp, 0, l1);
-    tcg_temp_free(r_tmp);
-
-    tcg_gen_mov_tl(t0, t1);
-    tcg_temp_free(t1);
-
+    gen_load_gpr(t0, rs);
     gen_set_label(l1);
     gen_store_gpr(t0, rd);
     tcg_temp_free(t0);
@@ -5960,9 +5954,8 @@ static inline void gen_movcf_s (int fs, int fd, int cc, int tf)
 {
     uint32_t ccbit;
     int cond;
-    TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_I32);
+    TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
     TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I32);
-    TCGv fp1 = tcg_temp_local_new(TCG_TYPE_I32);
     int l1 = gen_new_label();
 
     if (cc)
@@ -5975,25 +5968,21 @@ static inline void gen_movcf_s (int fs, int fd, int cc, int tf)
     else
         cond = TCG_COND_NE;
 
-    gen_load_fpr32(fp0, fs);
-    gen_load_fpr32(fp1, fd);
+    gen_load_fpr32(fp0, fd);
     tcg_gen_andi_i32(r_tmp1, fpu_fcr31, ccbit);
     tcg_gen_brcondi_i32(cond, r_tmp1, 0, l1);
-    tcg_gen_mov_i32(fp1, fp0);
-    tcg_temp_free(fp0);
+    gen_load_fpr32(fp0, fs);
     gen_set_label(l1);
-    tcg_temp_free(r_tmp1);
-    gen_store_fpr32(fp1, fd);
-    tcg_temp_free(fp1);
+    gen_store_fpr32(fp0, fd);
+    tcg_temp_free(fp0);
 }
 
 static inline void gen_movcf_d (DisasContext *ctx, int fs, int fd, int cc, int tf)
 {
     uint32_t ccbit;
     int cond;
-    TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_I32);
+    TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
     TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I64);
-    TCGv fp1 = tcg_temp_local_new(TCG_TYPE_I64);
     int l1 = gen_new_label();
 
     if (cc)
@@ -6006,57 +5995,53 @@ static inline void gen_movcf_d (DisasContext *ctx, int fs, int fd, int cc, int t
     else
         cond = TCG_COND_NE;
 
-    gen_load_fpr64(ctx, fp0, fs);
-    gen_load_fpr64(ctx, fp1, fd);
+    gen_load_fpr64(ctx, fp0, fd);
     tcg_gen_andi_i32(r_tmp1, fpu_fcr31, ccbit);
     tcg_gen_brcondi_i32(cond, r_tmp1, 0, l1);
-    tcg_gen_mov_i64(fp1, fp0);
-    tcg_temp_free(fp0);
+    gen_load_fpr64(ctx, fp0, fs);
     gen_set_label(l1);
-    tcg_temp_free(r_tmp1);
-    gen_store_fpr64(ctx, fp1, fd);
-    tcg_temp_free(fp1);
+    gen_store_fpr64(ctx, fp0, fd);
+    tcg_temp_free(fp0);
 }
 
 static inline void gen_movcf_ps (int fs, int fd, int cc, int tf)
 {
+    uint32_t ccbit1, ccbit2;
     int cond;
     TCGv r_tmp1 = tcg_temp_local_new(TCG_TYPE_I32);
-    TCGv r_tmp2 = tcg_temp_local_new(TCG_TYPE_I32);
     TCGv fp0 = tcg_temp_local_new(TCG_TYPE_I32);
-    TCGv fph0 = tcg_temp_local_new(TCG_TYPE_I32);
-    TCGv fp1 = tcg_temp_local_new(TCG_TYPE_I32);
-    TCGv fph1 = tcg_temp_local_new(TCG_TYPE_I32);
     int l1 = gen_new_label();
     int l2 = gen_new_label();
 
+    if (cc) {
+        ccbit1 = 1 << (24 + cc);
+        ccbit2 = 1 << (25 + cc);
+    } else {
+        ccbit1 = 1 << 23;
+        ccbit2 = 1 << 25;
+    }
+
     if (tf)
         cond = TCG_COND_EQ;
     else
         cond = TCG_COND_NE;
 
+    gen_load_fpr32(fp0, fd);
+    tcg_gen_andi_i32(r_tmp1, fpu_fcr31, ccbit1);
+    tcg_gen_brcondi_i32(cond, r_tmp1, 0, l1);
     gen_load_fpr32(fp0, fs);
-    gen_load_fpr32h(fph0, fs);
-    gen_load_fpr32(fp1, fd);
-    gen_load_fpr32h(fph1, fd);
-    get_fp_cond(r_tmp1);
-    tcg_gen_shri_i32(r_tmp1, r_tmp1, cc);
-    tcg_gen_andi_i32(r_tmp2, r_tmp1, 0x1);
-    tcg_gen_brcondi_i32(cond, r_tmp2, 0, l1);
-    tcg_gen_mov_i32(fp1, fp0);
-    tcg_temp_free(fp0);
     gen_set_label(l1);
-    tcg_gen_andi_i32(r_tmp2, r_tmp1, 0x2);
-    tcg_gen_brcondi_i32(cond, r_tmp2, 0, l2);
-    tcg_gen_mov_i32(fph1, fph0);
-    tcg_temp_free(fph0);
+    gen_store_fpr32(fp0, fd);
+
+    gen_load_fpr32h(fp0, fd);
+    tcg_gen_andi_i32(r_tmp1, fpu_fcr31, ccbit2);
+    tcg_gen_brcondi_i32(cond, r_tmp1, 0, l2);
+    gen_load_fpr32h(fp0, fs);
     gen_set_label(l2);
+    gen_store_fpr32h(fp0, fd);
+
     tcg_temp_free(r_tmp1);
-    tcg_temp_free(r_tmp2);
-    gen_store_fpr32(fp1, fd);
-    gen_store_fpr32h(fph1, fd);
-    tcg_temp_free(fp1);
-    tcg_temp_free(fph1);
+    tcg_temp_free(fp0);
 }
 
 
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 10/11] target-mips: gen_compute_branch1()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (8 preceding siblings ...)
  2008-11-08  8:38 ` [Qemu-devel] [PATCH 09/11] target-mips: optimize movc*() Aurelien Jarno
@ 2008-11-08  8:39 ` Aurelien Jarno
  2008-11-08  8:39 ` [Qemu-devel] [PATCH 11/11] target-mips: fix temporary variable freeing in op_ldst_##insn() Aurelien Jarno
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:39 UTC (permalink / raw)
  To: qemu-devel

Optimize code generation in gen_compute_branch1():
- Directly use I32 variables instead of converting values from _tl to
  _i32 and back to _tl.
- Write the result directly to bcond instead of passing by a local
  variable.
- Temp variables are valid up to and *including* the brcond instruction.
  Use them instead of temp local variables.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |  127 +++++++++++++++-------------------------------
 1 files changed, 42 insertions(+), 85 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 2970ae2..fdd7530 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -5634,8 +5634,7 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
 {
     target_ulong btarget;
     const char *opn = "cp1 cond branch";
-    TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
-    TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
+    TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
 
     if (cc != 0)
         check_insn(env, ctx, ISA_MIPS4 | ISA_MIPS32);
@@ -5647,19 +5646,14 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_not_tl(t0, t0);
-            tcg_gen_movi_tl(t1, 0x1 << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0x1 << cc);
+            tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1f";
@@ -5668,19 +5662,14 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_not_tl(t0, t0);
-            tcg_gen_movi_tl(t1, 0x1 << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0x1 << cc);
+            tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1fl";
@@ -5689,18 +5678,14 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_movi_tl(t1, 0x1 << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0x1 << cc);
+            tcg_gen_brcondi_i32(TCG_COND_NE, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1t";
@@ -5709,42 +5694,32 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_movi_tl(t1, 0x1 << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0x1 << cc);
+            tcg_gen_brcondi_i32(TCG_COND_NE, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1tl";
     likely:
         ctx->hflags |= MIPS_HFLAG_BL;
-        tcg_gen_trunc_tl_i32(bcond, t0);
         break;
     case OPC_BC1FANY2:
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_not_tl(t0, t0);
-            tcg_gen_movi_tl(t1, 0x3 << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0x3 << cc);
+            tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1any2f";
@@ -5753,18 +5728,14 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_movi_tl(t1, 0x3 << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0x3 << cc);
+            tcg_gen_brcondi_i32(TCG_COND_NE, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1any2t";
@@ -5773,19 +5744,14 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_not_tl(t0, t0);
-            tcg_gen_movi_tl(t1, 0xf << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0xf << cc);
+            tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1any4f";
@@ -5794,37 +5760,28 @@ static void gen_compute_branch1 (CPUState *env, DisasContext *ctx, uint32_t op,
         {
             int l1 = gen_new_label();
             int l2 = gen_new_label();
-            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I32);
 
-            get_fp_cond(r_tmp1);
-            tcg_gen_ext_i32_tl(t0, r_tmp1);
-            tcg_temp_free(r_tmp1);
-            tcg_gen_movi_tl(t1, 0xf << cc);
-            tcg_gen_and_tl(t0, t0, t1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            tcg_gen_movi_tl(t0, 0);
+            get_fp_cond(t0);
+            tcg_gen_andi_i32(t0, t0, 0xf << cc);
+            tcg_gen_brcondi_i32(TCG_COND_NE, t0, 0, l1);
+            tcg_gen_movi_i32(bcond, 0);
             tcg_gen_br(l2);
             gen_set_label(l1);
-            tcg_gen_movi_tl(t0, 1);
+            tcg_gen_movi_i32(bcond, 1);
             gen_set_label(l2);
         }
         opn = "bc1any4t";
     not_likely:
         ctx->hflags |= MIPS_HFLAG_BC;
-        tcg_gen_trunc_tl_i32(bcond, t0);
         break;
     default:
         MIPS_INVAL(opn);
         generate_exception (ctx, EXCP_RI);
-        goto out;
+        return;
     }
     MIPS_DEBUG("%s: cond %02x target " TARGET_FMT_lx, opn,
                ctx->hflags, btarget);
     ctx->btarget = btarget;
-
- out:
-    tcg_temp_free(t0);
-    tcg_temp_free(t1);
 }
 
 /* Coprocessor 1 (FPU) */
-- 
1.5.6.5

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 11/11] target-mips: fix temporary variable freeing in op_ldst_##insn()
  2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
                   ` (9 preceding siblings ...)
  2008-11-08  8:39 ` [Qemu-devel] [PATCH 10/11] target-mips: gen_compute_branch1() Aurelien Jarno
@ 2008-11-08  8:39 ` Aurelien Jarno
  10 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08  8:39 UTC (permalink / raw)
  To: qemu-devel

Move tcg_temp_free() out of the conditional part to make sure
the TCG temporary variable is freed in all cases.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index fdd7530..15795a6 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1011,13 +1011,13 @@ static inline void op_ldst_##insn(TCGv t0, TCGv t1, DisasContext *ctx)  \
     gen_set_label(l1);                                                  \
     tcg_gen_ld_tl(r_tmp, cpu_env, offsetof(CPUState, CP0_LLAddr));      \
     tcg_gen_brcond_tl(TCG_COND_NE, t0, r_tmp, l2);                      \
-    tcg_temp_free(r_tmp);                                               \
     tcg_gen_qemu_##fname(t1, t0, ctx->mem_idx);                         \
     tcg_gen_movi_tl(t0, 1);                                             \
     tcg_gen_br(l3);                                                     \
     gen_set_label(l2);                                                  \
     tcg_gen_movi_tl(t0, 0);                                             \
     gen_set_label(l3);                                                  \
+    tcg_temp_free(r_tmp);                                               \
 }
 OP_ST_ATOMIC(sc,st32,0x3);
 #if defined(TARGET_MIPS64)
-- 
1.5.6.5


-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG
  2008-11-08  8:34 ` [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG Aurelien Jarno
@ 2008-11-08 12:57   ` Laurent Desnogues
  2008-11-08 19:13     ` Aurelien Jarno
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent Desnogues @ 2008-11-08 12:57 UTC (permalink / raw)
  To: qemu-devel

On Sat, Nov 8, 2008 at 9:34 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> Bitfield operations can be written with very few TCG instructions
> (between 2 and 5), so it is worth converting them to TCG.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  target-mips/helper.h    |    6 +----
>  target-mips/op_helper.c |   26 +------------------------
>  target-mips/translate.c |   49 +++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 41 insertions(+), 40 deletions(-)
[...]
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index af01f73..2cd1868 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -2682,57 +2682,86 @@ static void gen_compute_branch (DisasContext *ctx, uint32_t opc,
>  static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
>                         int rs, int lsb, int msb)
>  {
> -    TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
> -    TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
> +    TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
> +    TCGv t1 = tcg_temp_new(TCG_TYPE_TL);
> +    target_ulong mask;
>
>     gen_load_gpr(t1, rs);
>     switch (opc) {
>     case OPC_EXT:
>         if (lsb + msb > 31)
>             goto fail;
> -        tcg_gen_helper_1_1ii(do_ext, t0, t1, lsb, msb + 1);
> +        tcg_gen_shri_tl(t0, t1, lsb);
> +        if (msb + 1 < 32) {

Given the above restriction of lsb + msb <= 31, this test can
be rewritten as:

     if (msb != 31)

I find this more readable, but that's personal taste :-)

> +            tcg_gen_andi_tl(t0, t0, (1 << (msb + 1)) - 1);
> +        } else {
> +            tcg_gen_ext32s_tl(t0, t0);
> +        }
>         break;
>  #if defined(TARGET_MIPS64)
>     case OPC_DEXTM:
>         if (lsb + msb > 63)
>             goto fail;

Can this really happen?  lsb and msb are 5 bit wide as far
as I could see.

> -        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb, msb + 1 + 32);
> +        tcg_gen_shri_tl(t0, t1, lsb);
> +        if (msb + 1 + 32 < 64) {

This can be rewritten as

    if (msb != 31)

> +            tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1 + 32)) - 1);
> +        }
>         break;
>     case OPC_DEXTU:
>         if (lsb + msb > 63)
>             goto fail;

Same as above.

> -        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb + 32, msb + 1);
> +        tcg_gen_shri_tl(t0, t1, lsb + 32);
> +        tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
>         break;
>     case OPC_DEXT:
>         if (lsb + msb > 63)
>             goto fail;

Same as above.


Laurent

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG
  2008-11-08 12:57   ` Laurent Desnogues
@ 2008-11-08 19:13     ` Aurelien Jarno
  0 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2008-11-08 19:13 UTC (permalink / raw)
  To: qemu-devel

On Sat, Nov 08, 2008 at 01:57:21PM +0100, Laurent Desnogues wrote:
> On Sat, Nov 8, 2008 at 9:34 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > Bitfield operations can be written with very few TCG instructions
> > (between 2 and 5), so it is worth converting them to TCG.
> >
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> >  target-mips/helper.h    |    6 +----
> >  target-mips/op_helper.c |   26 +------------------------
> >  target-mips/translate.c |   49 +++++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 41 insertions(+), 40 deletions(-)
> [...]
> > diff --git a/target-mips/translate.c b/target-mips/translate.c
> > index af01f73..2cd1868 100644
> > --- a/target-mips/translate.c
> > +++ b/target-mips/translate.c
> > @@ -2682,57 +2682,86 @@ static void gen_compute_branch (DisasContext *ctx, uint32_t opc,
> >  static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
> >                         int rs, int lsb, int msb)
> >  {
> > -    TCGv t0 = tcg_temp_local_new(TCG_TYPE_TL);
> > -    TCGv t1 = tcg_temp_local_new(TCG_TYPE_TL);
> > +    TCGv t0 = tcg_temp_new(TCG_TYPE_TL);
> > +    TCGv t1 = tcg_temp_new(TCG_TYPE_TL);
> > +    target_ulong mask;
> >
> >     gen_load_gpr(t1, rs);
> >     switch (opc) {
> >     case OPC_EXT:
> >         if (lsb + msb > 31)
> >             goto fail;
> > -        tcg_gen_helper_1_1ii(do_ext, t0, t1, lsb, msb + 1);
> > +        tcg_gen_shri_tl(t0, t1, lsb);
> > +        if (msb + 1 < 32) {
> 
> Given the above restriction of lsb + msb <= 31, this test can
> be rewritten as:
> 
>      if (msb != 31)
> 
> I find this more readable, but that's personal taste :-)

Fixed.
 
> > +            tcg_gen_andi_tl(t0, t0, (1 << (msb + 1)) - 1);
> > +        } else {
> > +            tcg_gen_ext32s_tl(t0, t0);
> > +        }
> >         break;
> >  #if defined(TARGET_MIPS64)
> >     case OPC_DEXTM:
> >         if (lsb + msb > 63)
> >             goto fail;
> 
> Can this really happen?  lsb and msb are 5 bit wide as far
> as I could see.

Not it can't. It comes for the previous code, and I forget to "optimize"
this part.

> > -        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb, msb + 1 + 32);
> > +        tcg_gen_shri_tl(t0, t1, lsb);
> > +        if (msb + 1 + 32 < 64) {
> 
> This can be rewritten as
> 
>     if (msb != 31)

Fixed.

> > +            tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1 + 32)) - 1);
> > +        }
> >         break;
> >     case OPC_DEXTU:
> >         if (lsb + msb > 63)
> >             goto fail;
> 
> Same as above.

Fixed.

> > -        tcg_gen_helper_1_1ii(do_dext, t0, t1, lsb + 32, msb + 1);
> > +        tcg_gen_shri_tl(t0, t1, lsb + 32);
> > +        tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
> >         break;
> >     case OPC_DEXT:
> >         if (lsb + msb > 63)
> >             goto fail;
> 
> Same as above.
> 
Fixed.
> 
> Laurent
> 
> 
> 

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-11-08 19:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-08  8:31 [Qemu-devel] [PATCH 0/11] target-mips: optimizations Aurelien Jarno
2008-11-08  8:32 ` [Qemu-devel] [PATCH 01/11] target-mips: optimize gen_save_pc() Aurelien Jarno
2008-11-08  8:32 ` [Qemu-devel] [PATCH 02/11] target-mips: optimize gen_op_addr_add() (1/2) Aurelien Jarno
2008-11-08  8:33 ` [Qemu-devel] [PATCH 03/11] target-mips: optimize gen_op_addr_add() (2/2) Aurelien Jarno
2008-11-08  8:34 ` [Qemu-devel] [PATCH 04/11] target-mips: convert bitfield ops to TCG Aurelien Jarno
2008-11-08 12:57   ` Laurent Desnogues
2008-11-08 19:13     ` Aurelien Jarno
2008-11-08  8:34 ` [Qemu-devel] [PATCH 05/11] target-mips: convert bit shuffle " Aurelien Jarno
2008-11-08  8:35 ` [Qemu-devel] [PATCH 06/11] target-mips: optimize gen_arith()/gen_arith_imm() Aurelien Jarno
2008-11-08  8:37 ` [Qemu-devel] [PATCH 07/11] target-mips: optimize gen_muldiv() Aurelien Jarno
2008-11-08  8:37 ` [Qemu-devel] [PATCH 08/11] target-mips: optimize gen_farith() Aurelien Jarno
2008-11-08  8:38 ` [Qemu-devel] [PATCH 09/11] target-mips: optimize movc*() Aurelien Jarno
2008-11-08  8:39 ` [Qemu-devel] [PATCH 10/11] target-mips: gen_compute_branch1() Aurelien Jarno
2008-11-08  8:39 ` [Qemu-devel] [PATCH 11/11] target-mips: fix temporary variable freeing in op_ldst_##insn() Aurelien Jarno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).