qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer
@ 2015-07-27 10:55 Aurelien Jarno
  2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 01/12] tcg: correctly mark dead inputs for mov with a constant Aurelien Jarno
                   ` (11 more replies)
  0 siblings, 12 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

This patchset improves the optimizer in 3 different ways:
 - by optimizing temp tracking using a bit array
 - by allowing constants to have copy
 - by differentiating 32 <-> 64 bits conversions from moves in the
   frontend by using specific instructions

The latter change introduces 2 new mandatory ext/extu_i32_i64 ops.
For the future we might want to allow each guest to only implement 2 of
the 3 size changing ops as I started to do in the following patchset:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg03369.html
That said we probably want to experiment with that first and see the
impact on the performances. If wrongly implemented, things might seem
to work but some bugs might appear in very rare cases, like the
truncation problem we recently got on aarch64 and x86-64 hosts.
Alternatively we might want to always implement the 3 ops on all
backends to avoid multiplying variants of the code and with that bugs.

This patchset has been fully tested on x86-64 host, but only lightly
tested on aarch64, ppc64 and s390x hosts.

Note that the two first patches have already been sent separately on the
mailing list and should make their way to the 2.4 release. I have
included them as they are fixing bugs triggered by patches from these
series.

Changes v1->v2
 - Added patches 1 & 2
 - patch 3: use bitmap_zero instead of memset
            for the call op, only reset temps that are in use
 - patch 4: do not use a bool to detect copies

Aurelien Jarno (12):
  tcg: correctly mark dead inputs for mov with a constant
  tcg: mark temps as mem_coherent = 0 for mov with a constant
  tcg/optimize: optimize temps tracking
  tcg/optimize: add temp_is_const and temp_is_copy functions
  tcg/optimize: track const/copy status separately
  tcg/optimize: allow constant to have copies
  tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  tcg/optimize: do not remember garbage high bits for 32-bit ops
  tcg: update README about size changing ops

 tcg/README               |  20 +++-
 tcg/aarch64/tcg-target.c |   4 +
 tcg/aarch64/tcg-target.h |   2 +-
 tcg/i386/tcg-target.c    |   5 +
 tcg/i386/tcg-target.h    |   2 +-
 tcg/ia64/tcg-target.c    |   4 +
 tcg/ia64/tcg-target.h    |   2 +-
 tcg/optimize.c           | 259 +++++++++++++++++++++--------------------------
 tcg/ppc/tcg-target.c     |   6 ++
 tcg/ppc/tcg-target.h     |   2 +-
 tcg/s390/tcg-target.c    |   5 +
 tcg/s390/tcg-target.h    |   2 +-
 tcg/sparc/tcg-target.c   |  12 ++-
 tcg/sparc/tcg-target.h   |   2 +-
 tcg/tcg-op.c             |  16 ++-
 tcg/tcg-opc.h            |   7 +-
 tcg/tcg.c                |   4 +
 tcg/tcg.h                |   2 +-
 tcg/tci/tcg-target.c     |   4 +
 tcg/tci/tcg-target.h     |   2 +-
 tci.c                    |   6 +-
 21 files changed, 198 insertions(+), 170 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH for-2.4 01/12] tcg: correctly mark dead inputs for mov with a constant
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
@ 2015-07-27 10:55 ` Aurelien Jarno
  2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 02/12] tcg: mark temps as mem_coherent = 0 " Aurelien Jarno
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

When tcg_reg_alloc_mov propagate a constant, we failed to correctly mark
a temp as dead if the liveness analysis hints so. This fixes the
following assert when configure with --enable-debug-tcg:

  qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed.

Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 7e088b1..9a2508b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1920,6 +1920,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
         }
         ots->val_type = TEMP_VAL_CONST;
         ots->val = ts->val;
+        if (IS_DEAD_ARG(1)) {
+            temp_dead(s, args[1]);
+        }
     } else {
         /* The code in the first if block should have moved the
            temp to a register. */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH for-2.4 02/12] tcg: mark temps as mem_coherent = 0 for mov with a constant
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
  2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 01/12] tcg: correctly mark dead inputs for mov with a constant Aurelien Jarno
@ 2015-07-27 10:55 ` Aurelien Jarno
  2015-07-27 10:55 ` [Qemu-devel] [PATCH v2 for-2.5 03/12] tcg/optimize: optimize temps tracking Aurelien Jarno
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

When a constant has to be loaded in a mov op, we fail to set
mem_coherent = 0. This patch fixes that.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 9a2508b..0892a9b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1894,6 +1894,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
             ts->mem_coherent = 1;
         } else if (ts->val_type == TEMP_VAL_CONST) {
             tcg_out_movi(s, itype, ts->reg, ts->val);
+            ts->mem_coherent = 0;
         }
         s->reg_to_temp[ts->reg] = args[1];
         ts->val_type = TEMP_VAL_REG;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 03/12] tcg/optimize: optimize temps tracking
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
  2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 01/12] tcg: correctly mark dead inputs for mov with a constant Aurelien Jarno
  2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 02/12] tcg: mark temps as mem_coherent = 0 " Aurelien Jarno
@ 2015-07-27 10:55 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 04/12] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have initialize more than 2kB at least twice
per TB, often more when there is a few goto_tb.

Instead used a TCGTempSet bit array to track which temps are in used in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves out tcg_optimize from the the top of the profiler list.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cd0e793..425c14b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -50,6 +50,7 @@ struct tcg_temp_info {
 };
 
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static TCGTempSet temps_used;
 
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
    the copy flag from the left temp.  */
@@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
+/* Reset all temporaries, given that there are NB_TEMPS of them.  */
+static void reset_all_temps(int nb_temps)
+{
+    bitmap_zero(temps_used.l, nb_temps);
+}
+
+/* Initialize and activate a temporary.  */
+static void init_temp_info(TCGArg temp)
+{
+    if (!test_bit(temp, temps_used.l)) {
+        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].mask = -1;
+        set_bit(temp, temps_used.l);
+    }
+}
+
 static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
                                 TCGOpcode opc, int nargs)
 {
@@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
     return new_op;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them.  */
-static void reset_all_temps(int nb_temps)
-{
-    int i;
-    for (i = 0; i < nb_temps; i++) {
-        temps[i].state = TCG_TEMP_UNDEF;
-        temps[i].mask = -1;
-    }
-}
-
 static int op_bits(TCGOpcode op)
 {
     const TCGOpDef *def = &tcg_op_defs[op];
@@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
             nb_iargs = def->nb_iargs;
         }
 
+        /* Initialize the temps that are going to be used */
+        for (i = 0; i < nb_oargs + nb_iargs; i++) {
+            init_temp_info(args[i]);
+        }
+
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
             if (temps[args[i]].state == TCG_TEMP_COPY) {
@@ -1299,7 +1311,9 @@ void tcg_optimize(TCGContext *s)
             if (!(args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    reset_temp(i);
+                    if (test_bit(i, temps_used.l)) {
+                        reset_temp(i);
+                    }
                 }
             }
             goto do_reset_output;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 04/12] tcg/optimize: add temp_is_const and temp_is_copy functions
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (2 preceding siblings ...)
  2015-07-27 10:55 ` [Qemu-devel] [PATCH v2 for-2.5 03/12] tcg/optimize: optimize temps tracking Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately Aurelien Jarno
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Add two accessor functions temp_is_const and temp_is_copy, to make the
code more readable and make code change easier.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 60 insertions(+), 71 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 425c14b..719fee2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -52,11 +52,21 @@ struct tcg_temp_info {
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
 static TCGTempSet temps_used;
 
+static inline bool temp_is_const(TCGArg arg)
+{
+    return temps[arg].state == TCG_TEMP_CONST;
+}
+
+static inline bool temp_is_copy(TCGArg arg)
+{
+    return temps[arg].state == TCG_TEMP_COPY;
+}
+
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
    the copy flag from the left temp.  */
 static void reset_temp(TCGArg temp)
 {
-    if (temps[temp].state == TCG_TEMP_COPY) {
+    if (temp_is_copy(temp)) {
         if (temps[temp].prev_copy == temps[temp].next_copy) {
             temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
         } else {
@@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (temps[arg1].state != TCG_TEMP_COPY
-        || temps[arg2].state != TCG_TEMP_COPY) {
+    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
         return false;
     }
 
@@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    if (temps[src].state == TCG_TEMP_CONST) {
+    if (temp_is_const(src)) {
         tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
         return;
     }
@@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    assert(temps[src].state != TCG_TEMP_CONST);
+    assert(!temp_is_const(src));
 
     if (s->temps[src].type == s->temps[dst].type) {
-        if (temps[src].state != TCG_TEMP_COPY) {
+        if (!temp_is_copy(src)) {
             temps[src].state = TCG_TEMP_COPY;
             temps[src].next_copy = src;
             temps[src].prev_copy = src;
@@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
                                        TCGArg y, TCGCond c)
 {
-    if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
+    if (temp_is_const(x) && temp_is_const(y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         }
     } else if (temps_are_copies(x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
+    } else if (temp_is_const(y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temps[bl].state == TCG_TEMP_CONST
-        && temps[bh].state == TCG_TEMP_CONST) {
+    if (temp_is_const(bl) && temp_is_const(bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temps[al].state == TCG_TEMP_CONST
-            && temps[ah].state == TCG_TEMP_CONST) {
+        if (temp_is_const(al) && temp_is_const(ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temps[a1].state == TCG_TEMP_CONST;
-    sum -= temps[a2].state == TCG_TEMP_CONST;
+    sum += temp_is_const(a1);
+    sum -= temp_is_const(a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 {
     int sum = 0;
-    sum += temps[p1[0]].state == TCG_TEMP_CONST;
-    sum += temps[p1[1]].state == TCG_TEMP_CONST;
-    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
-    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
+    sum += temp_is_const(p1[0]);
+    sum += temp_is_const(p1[1]);
+    sum -= temp_is_const(p2[0]);
+    sum -= temp_is_const(p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temps[args[i]].state == TCG_TEMP_COPY) {
+            if (temp_is_copy(args[i])) {
                 args[i] = find_better_copy(s, args[i]);
             }
         }
@@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == 0) {
+            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
                 tcg_opt_gen_movi(s, op, args, args[0], 0);
                 continue;
             }
@@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temps[args[2]].state == TCG_TEMP_CONST) {
+                if (temp_is_const(args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temps[args[1]].state == TCG_TEMP_CONST
-                    && temps[args[1]].val == 0) {
+                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
                     reset_temp(args[0]);
                     args[1] = args[2];
@@ -726,34 +731,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == -1) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (temps[args[2]].state != TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == -1) {
+            if (!temp_is_const(args[2])
+                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (temps[args[2]].state != TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == 0) {
+            if (!temp_is_const(args[2])
+                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -794,9 +795,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
                 tcg_opt_gen_mov(s, op, args, args[0], args[1]);
                 continue;
             }
@@ -804,9 +804,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == -1) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
                 tcg_opt_gen_mov(s, op, args, args[0], args[1]);
                 continue;
             }
@@ -844,7 +843,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -854,7 +853,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -863,26 +862,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -893,7 +892,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -974,8 +973,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0)) {
+            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
                 tcg_opt_gen_movi(s, op, args, args[0], 0);
                 continue;
             }
@@ -1030,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16u):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
                 break;
@@ -1038,7 +1036,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_trunc_shr_i32:
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
                 break;
@@ -1067,8 +1065,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1077,8 +1074,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1118,10 +1114,8 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST
-                && temps[args[4]].state == TCG_TEMP_CONST
-                && temps[args[5]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2]) && temp_is_const(args[3])
+                && temp_is_const(args[4]) && temp_is_const(args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1150,8 +1144,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1183,10 +1176,8 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temps[args[2]].state == TCG_TEMP_CONST
-                       && temps[args[3]].state == TCG_TEMP_CONST
-                       && temps[args[2]].val == 0
-                       && temps[args[3]].val == 0) {
+                       && temp_is_const(args[2]) && temps[args[2]].val == 0
+                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
@@ -1248,10 +1239,8 @@ void tcg_optimize(TCGContext *s)
             do_setcond_const:
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temps[args[3]].state == TCG_TEMP_CONST
-                       && temps[args[4]].state == TCG_TEMP_CONST
-                       && temps[args[3]].val == 0
-                       && temps[args[4]].val == 0) {
+                       && temp_is_const(args[3]) && temps[args[3]].val == 0
+                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (3 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 04/12] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 11:26   ` Paolo Bonzini
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 06/12] tcg/optimize: allow constant to have copies Aurelien Jarno
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Instead of using an enum which could be either a copy or a const, track
them separately. Constants are tracked through a bool. Copies are
tracked by initializing temp's next_copy and prev_copy to itself,
allowing to simplify the code a bit.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 42 ++++++++++++++----------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 719fee2..e69ab05 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,14 +35,8 @@
         glue(glue(case INDEX_op_, x), _i32):    \
         glue(glue(case INDEX_op_, x), _i64)
 
-typedef enum {
-    TCG_TEMP_UNDEF = 0,
-    TCG_TEMP_CONST,
-    TCG_TEMP_COPY,
-} tcg_temp_state;
-
 struct tcg_temp_info {
-    tcg_temp_state state;
+    bool is_const;
     uint16_t prev_copy;
     uint16_t next_copy;
     tcg_target_ulong val;
@@ -54,27 +48,22 @@ static TCGTempSet temps_used;
 
 static inline bool temp_is_const(TCGArg arg)
 {
-    return temps[arg].state == TCG_TEMP_CONST;
+    return temps[arg].is_const;
 }
 
 static inline bool temp_is_copy(TCGArg arg)
 {
-    return temps[arg].state == TCG_TEMP_COPY;
+    return temps[arg].next_copy != arg;
 }
 
-/* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
-   the copy flag from the left temp.  */
+/* Reset TEMP's state, possibly removing the temp for the list of copies.  */
 static void reset_temp(TCGArg temp)
 {
-    if (temp_is_copy(temp)) {
-        if (temps[temp].prev_copy == temps[temp].next_copy) {
-            temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
-        } else {
-            temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
-            temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
-        }
-    }
-    temps[temp].state = TCG_TEMP_UNDEF;
+    temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+    temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+    temps[temp].next_copy = temp;
+    temps[temp].prev_copy = temp;
+    temps[temp].is_const = false;
     temps[temp].mask = -1;
 }
 
@@ -88,7 +77,9 @@ static void reset_all_temps(int nb_temps)
 static void init_temp_info(TCGArg temp)
 {
     if (!test_bit(temp, temps_used.l)) {
-        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].next_copy = temp;
+        temps[temp].prev_copy = temp;
+        temps[temp].is_const = false;
         temps[temp].mask = -1;
         set_bit(temp, temps_used.l);
     }
@@ -218,7 +209,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     op->opc = new_op;
 
     reset_temp(dst);
-    temps[dst].state = TCG_TEMP_CONST;
+    temps[dst].is_const = true;
     temps[dst].val = val;
     mask = val;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
@@ -260,16 +251,11 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     assert(!temp_is_const(src));
 
     if (s->temps[src].type == s->temps[dst].type) {
-        if (!temp_is_copy(src)) {
-            temps[src].state = TCG_TEMP_COPY;
-            temps[src].next_copy = src;
-            temps[src].prev_copy = src;
-        }
-        temps[dst].state = TCG_TEMP_COPY;
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
+        temps[dst].is_const = false;
     }
 
     args[0] = dst;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 06/12] tcg/optimize: allow constant to have copies
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (4 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 07/12] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Now that copies and constants are tracked separately, we can allow
constant to have copies, deferring the choice to use a register or a
constant to the register allocation pass. This prevent this kind of
regular constant reloading:

-OUT: [size=338]
+OUT: [size=298]
   mov    -0x4(%r14),%ebp
   test   %ebp,%ebp
   jne    0x7ffbe9cb0ed6
   mov    $0x40002219f8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002219f8,%rbp
   mov    $0x4000221a20,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000000000,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000000000,%rbp
   mov    $0x4000221d38,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40002221a8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002221a8,%rbp
   mov    $0x4000221d40,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000019170,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000019170,%rbp
   mov    $0x4000221d48,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40000049ee,%rbp
   mov    %rbp,0x80(%r14)
   mov    %r14,%rdi
   callq  0x7ffbe99924d0
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   shl    $0x20,%rbp
   mov    (%r14),%rbx
   mov    %ebx,%ebx
   mov    %rbx,(%r14)
   or     %rbx,%rbp
   mov    %rbp,0x10(%r14)
   mov    %rbp,0x90(%r14)
   mov    0x60(%r14),%rbx
   mov    %rbx,0x38(%r14)
   mov    0x28(%r14),%rbx
   mov    $0x4000220e60,%r12
   mov    %rbx,(%r12)
   mov    $0x40002219c8,%rbx
   mov    %rbp,(%rbx)
   mov    0x20(%r14),%rbp
   sub    $0x8,%rbp
   mov    $0x4000004a16,%rbx
   mov    %rbx,0x0(%rbp)
   mov    %rbp,0x20(%r14)
   mov    $0x19,%ebp
   mov    %ebp,0xa8(%r14)
   mov    $0x4000015110,%rbp
   mov    %rbp,0x80(%r14)
   xor    %eax,%eax
   jmpq   0x7ffbebcae426
   lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
   jmpq   0x7ffbebcae426

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e69ab05..2d3c72b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -230,11 +230,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    if (temp_is_const(src)) {
-        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
-        return;
-    }
-
     TCGOpcode new_op = op_to_mov(op->opc);
     tcg_target_ulong mask;
 
@@ -248,14 +243,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    assert(!temp_is_const(src));
-
     if (s->temps[src].type == s->temps[dst].type) {
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
-        temps[dst].is_const = false;
+        temps[dst].is_const = temps[src].is_const;
+        temps[dst].val = temps[src].val;
     }
 
     args[0] = dst;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 07/12] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (5 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 06/12] tcg/optimize: allow constant to have copies Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 08/12] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size changing op.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README               | 2 +-
 tcg/aarch64/tcg-target.h | 2 +-
 tcg/i386/tcg-target.h    | 2 +-
 tcg/ia64/tcg-target.h    | 2 +-
 tcg/optimize.c           | 6 +++---
 tcg/ppc/tcg-target.h     | 2 +-
 tcg/s390/tcg-target.h    | 2 +-
 tcg/sparc/tcg-target.c   | 4 ++--
 tcg/sparc/tcg-target.h   | 2 +-
 tcg/tcg-op.c             | 4 ++--
 tcg/tcg-opc.h            | 4 ++--
 tcg/tcg.h                | 2 +-
 tcg/tci/tcg-target.h     | 2 +-
 13 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tcg/README b/tcg/README
index a550ff1..61b3899 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,7 +314,7 @@ This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
-* trunc_shr_i32 t0, t1, pos
+* trunc_shr_i64_i32 t0, t1, pos
 
 For 64-bit hosts only, right shift the 64-bit input T1 by POS and
 truncate to 32-bit output T0.  Depending on the host, this may be
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8aec04d..dfd8801 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 25b5133..dae50ba 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index a04ed81..29902f9 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,7 +160,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2d3c72b..8bfe7ff 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -288,7 +288,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_shr_i32:
         return (uint32_t)x >> (y & 31);
 
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
     case INDEX_op_shr_i64:
         return (uint64_t)x >> (y & 63);
 
@@ -867,7 +867,7 @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             mask = (uint64_t)temps[args[1]].mask >> args[2];
             break;
 
@@ -1015,7 +1015,7 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7ce7048..b7e6861 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,7 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 91576d5..50016a8 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,7 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1a870a8..b23032b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
         if (a2 == 0) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
         } else {
@@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_ext32s_i64, { "R", "r" } },
     { INDEX_op_ext32u_i64, { "R", "r" } },
-    { INDEX_op_trunc_shr_i32,  { "r", "R" } },
+    { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
     { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index f584de4..336c47f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_trunc_shr_i32    1
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 45098c3..61b64db 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_gen_mov_i32(ret, TCGV_LOW(t));
             tcg_temp_free_i64(t);
         }
-    } else if (TCG_TARGET_HAS_trunc_shr_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
+    } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
+        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
                          MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 13ccb60..4a34f43 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
-DEF(trunc_shr_i32, 1, 1, 1,
-    IMPL(TCG_TARGET_HAS_trunc_shr_i32)
+DEF(trunc_shr_i64_i32, 1, 1, 1,
+    IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 231a781..e7e33b9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_div2_i64         0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index cbf3f9b..8b1139b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,7 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 08/12] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (6 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 07/12] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 09/12] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-op.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 61b64db..0e79fd1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_temp_free_i64(t);
         }
     } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
-                         MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
+        tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
+                    GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
     } else {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 09/12] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (7 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 08/12] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 10/12] tcg/optimize: add optimizations for " Aurelien Jarno
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, Alexander Graf,
	Blue Swirl, Paolo Bonzini, Aurelien Jarno, Richard Henderson

Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always converted to a 64-bit value and not propagated
through the register allocator or the optimizer.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Claudio Fontana <claudio.fontana@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/aarch64/tcg-target.c |  4 ++++
 tcg/i386/tcg-target.c    |  5 +++++
 tcg/ia64/tcg-target.c    |  4 ++++
 tcg/ppc/tcg-target.c     |  6 ++++++
 tcg/s390/tcg-target.c    |  5 +++++
 tcg/sparc/tcg-target.c   |  8 ++++++--
 tcg/tcg-op.c             | 10 ++++------
 tcg/tcg-opc.h            |  3 +++
 tcg/tci/tcg-target.c     |  4 ++++
 tci.c                    |  6 ++++--
 10 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 1fb67de..bc3a539 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1567,6 +1567,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16s_i32:
         tcg_out_sxt(s, ext, MO_16, a0, a1);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
         break;
@@ -1578,6 +1579,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i32:
         tcg_out_uxt(s, MO_16, a0, a1);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
         break;
@@ -1723,6 +1725,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_ext8u_i64, { "r", "r" } },
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 4f40468..ff55499 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2068,9 +2068,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_bswap64_i64:
         tcg_out_bswap64(s, args[0]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext32u(s, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext32s(s, args[0], args[1]);
         break;
@@ -2205,6 +2207,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index 81cb9f7..71e79cf 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i64:
         tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
         break;
@@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "rZ"} },
     { INDEX_op_ext32s_i64, { "r", "rZ"} },
     { INDEX_op_ext32u_i64, { "r", "rZ"} },
+    { INDEX_op_ext_i32_i64, { "r", "rZ" } },
+    { INDEX_op_extu_i32_i64, { "r", "rZ" } },
 
     { INDEX_op_bswap16_i64, { "r", "rZ" } },
     { INDEX_op_bswap32_i64, { "r", "rZ" } },
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index ce8d546..1672220 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -2221,12 +2221,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext16s_i64:
         c = EXTSH;
         goto gen_ext;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         c = EXTSW;
         goto gen_ext;
     gen_ext:
         tcg_out32(s, c | RS(args[1]) | RA(args[0]));
         break;
+    case INDEX_op_extu_i32_i64:
+        tcg_out_ext32u(s, args[0], args[1]);
+        break;
 
     case INDEX_op_setcond_i32:
         tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
@@ -2503,6 +2507,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_ext8s_i64, { "r", "r" } },
     { INDEX_op_ext16s_i64, { "r", "r" } },
     { INDEX_op_ext32s_i64, { "r", "r" } },
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
     { INDEX_op_bswap16_i64, { "r", "r" } },
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index b3433ce..d4db6d3 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -2106,6 +2106,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16s_i64:
         tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tgen_ext32s(s, args[0], args[1]);
         break;
@@ -2115,6 +2116,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i64:
         tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tgen_ext32u(s, args[0], args[1]);
         break;
@@ -2267,6 +2269,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_ext32s_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_bswap16_i64, { "r", "r" } },
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index b23032b..fe75af0 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_divu_i64:
         c = ARITH_UDIVX;
         goto gen_arith;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
@@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_neg_i64, { "R", "RJ" } },
     { INDEX_op_not_i64, { "R", "RJ" } },
 
-    { INDEX_op_ext32s_i64, { "R", "r" } },
-    { INDEX_op_ext32u_i64, { "R", "r" } },
+    { INDEX_op_ext32s_i64, { "R", "R" } },
+    { INDEX_op_ext32u_i64, { "R", "R" } },
+    { INDEX_op_ext_i32_i64, { "R", "r" } },
+    { INDEX_op_extu_i32_i64, { "R", "r" } },
     { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0e79fd1..7114315 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
     } else {
-        /* Note: we assume the target supports move between
-           32 and 64 bit registers.  */
-        tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
 
@@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
     } else {
-        /* Note: we assume the target supports move between
-           32 and 64 bit registers.  */
-        tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
 
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a34f43..f721a5a 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
+/* size changing ops */
+DEF(ext_i32_i64, 1, 1, 0, IMPL64)
+DEF(extu_i32_i64, 1, 1, 0, IMPL64)
 DEF(trunc_shr_i64_i32, 1, 1, 1,
     IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
index 83472db..bbb54d4 100644
--- a/tcg/tci/tcg-target.c
+++ b/tcg/tci/tcg-target.c
@@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
 #if TCG_TARGET_HAS_ext32u_i64
     { INDEX_op_ext32u_i64, { R, R } },
 #endif
+    { INDEX_op_ext_i32_i64, { R, R } },
+    { INDEX_op_extu_i32_i64, { R, R } },
 #if TCG_TARGET_HAS_bswap16_i64
     { INDEX_op_bswap16_i64, { R, R } },
 #endif
@@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
     case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
     case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extu_i32_i64:
 #endif /* TCG_TARGET_REG_BITS == 64 */
     case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
     case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
diff --git a/tci.c b/tci.c
index 8444948..3d6d177 100644
--- a/tci.c
+++ b/tci.c
@@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
         case INDEX_op_ext32s_i64:
+#endif
+        case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32s(&tb_ptr);
             tci_write_reg64(t0, t1);
             break;
-#endif
 #if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
+#endif
+        case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(&tb_ptr);
             tci_write_reg64(t0, t1);
             break;
-#endif
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
             TODO();
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 10/12] tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (8 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 09/12] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 11/12] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 12/12] tcg: update README about size changing ops Aurelien Jarno
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8bfe7ff..8f33755 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -343,9 +343,11 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(ext16u):
         return (uint16_t)x;
 
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         return (int32_t)x;
 
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         return (uint32_t)x;
 
@@ -830,6 +832,15 @@ void tcg_optimize(TCGContext *s)
             mask = temps[args[1]].mask & mask;
             break;
 
+        case INDEX_op_ext_i32_i64:
+            if ((temps[args[1]].mask & 0x80000000) != 0) {
+                break;
+            }
+        case INDEX_op_extu_i32_i64:
+            /* We do not compute affected as it is a size changing op.  */
+            mask = (uint32_t)temps[args[1]].mask;
+            break;
+
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
@@ -1008,6 +1019,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16u):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
+        case INDEX_op_ext_i32_i64:
+        case INDEX_op_extu_i32_i64:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 11/12] tcg/optimize: do not remember garbage high bits for 32-bit ops
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (9 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 10/12] tcg/optimize: add optimizations for " Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  2015-07-29 16:34   ` Aurelien Jarno
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 12/12] tcg: update README about size changing ops Aurelien Jarno
  11 siblings, 1 reply; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Now that we have real size changing ops, we don't need to mark high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and do
simplifications when possible. The problem there is therefore not the
fact that those bits are not counted as garbage, but that a size
changing op is replaced by a move.

This patch is basically a revert of 24666baf, including the changes that
have been made since then.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 41 +++++++++++------------------------------
 1 file changed, 11 insertions(+), 30 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8f33755..166074e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -203,20 +203,12 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
 static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                              TCGArg dst, TCGArg val)
 {
-    TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
-
-    op->opc = new_op;
+    op->opc = op_to_movi(op->opc);
 
     reset_temp(dst);
     temps[dst].is_const = true;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -230,28 +222,21 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
-
-    op->opc = new_op;
+    op->opc = op_to_mov(op->opc);
 
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
 
     if (s->temps[src].type == s->temps[dst].type) {
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
-        temps[dst].is_const = temps[src].is_const;
-        temps[dst].val = temps[src].val;
     }
 
+    temps[dst].is_const = temps[src].is_const;
+    temps[dst].val = temps[src].val;
+    temps[dst].mask = temps[src].mask;
+
     args[0] = dst;
     args[1] = src;
 }
@@ -584,7 +569,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -937,17 +922,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results.  */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2 for-2.5 12/12] tcg: update README about size changing ops
  2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
                   ` (10 preceding siblings ...)
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 11/12] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
@ 2015-07-27 10:56 ` Aurelien Jarno
  11 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 10:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tcg/README b/tcg/README
index 61b3899..a22f251 100644
--- a/tcg/README
+++ b/tcg/README
@@ -466,13 +466,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A
 few specific operations must be implemented to allow it (see add2_i32,
 sub2_i32, brcond2_i32).
 
+On a 64 bit target, the values are transfered between 32 and 64-bit
+registers using the following ops:
+- trunc_shr_i64_i32
+- ext_i32_i64
+- extu_i32_i64
+
+They ensure that the values are correctly truncated or extended when
+moved from a 32-bit to a 64-bit register or vice-versa. Note that the
+trunc_shr_i64_i32 is an optional op. It is not necessary to implement
+it if all the following conditions are met:
+- 64-bit registers can hold 32-bit values
+- 32-bit values in a 64-bit register do not need to stay zero or
+  sign extended
+- all 32-bit TCG ops ignore the high part of 64-bit registers
+
 Floating point operations are not supported in this version. A
 previous incarnation of the code generator had full support of them,
 but it is better to concentrate on integer operations first.
 
-On a 64 bit target, no assumption is made in TCG about the storage of
-the 32 bit values in 64 bit registers.
-
 4.2) Constraints
 
 GCC like constraints are used to define the constraints of every
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately Aurelien Jarno
@ 2015-07-27 11:26   ` Paolo Bonzini
  2015-07-27 13:40     ` Aurelien Jarno
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2015-07-27 11:26 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Richard Henderson



On 27/07/2015 12:56, Aurelien Jarno wrote:
>          temps[dst].next_copy = temps[src].next_copy;
>          temps[dst].prev_copy = src;
>          temps[temps[dst].next_copy].prev_copy = dst;
>          temps[src].next_copy = dst;

This is:

    dst->next = src->next;
    dst->prev = src;
    dst->next->prev = dst;
    src->next = dst;

which seems weird.  I think it should be one of

    /* splice src after dst */
    dst->next->prev = src->prev;
    src->prev->next = dst->next;
    dst->next = src;
    src->prev = dst;

or

    /* splice src before dst */
    last = src->prev;
    dst->prev->next = src;
    src->prev = dst->prev;
    last->next = dst;
    dst->prev = last;

(written as pointer manipulations for clarity).

Maybe I'm missing the obvious, but if it's a problem it's better to fix
it before the links are used more pervasively.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately
  2015-07-27 11:26   ` Paolo Bonzini
@ 2015-07-27 13:40     ` Aurelien Jarno
  2015-07-27 13:44       ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-27 13:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Richard Henderson

On 2015-07-27 13:26, Paolo Bonzini wrote:
> 
> 
> On 27/07/2015 12:56, Aurelien Jarno wrote:
> >          temps[dst].next_copy = temps[src].next_copy;
> >          temps[dst].prev_copy = src;
> >          temps[temps[dst].next_copy].prev_copy = dst;
> >          temps[src].next_copy = dst;

Note that the patch doesn't change this part, it's already there in the
original code.

> This is:
> 
>     dst->next = src->next;
>     dst->prev = src;
>     dst->next->prev = dst;
>     src->next = dst;
> 
> which seems weird.  I think it should be one of
> 
>     /* splice src after dst */
>     dst->next->prev = src->prev;
>     src->prev->next = dst->next;
>     dst->next = src;
>     src->prev = dst;
> 
> or
> 
>     /* splice src before dst */
>     last = src->prev;
>     dst->prev->next = src;
>     src->prev = dst->prev;
>     last->next = dst;
>     dst->prev = last;
> 
> (written as pointer manipulations for clarity).
> 
> Maybe I'm missing the obvious, but if it's a problem it's better to fix
> it before the links are used more pervasively.

I think your code is the generic for inserting a circular linked list
into another. Here we just want to insert one element, and thus before
the insertion we have dst->next = dst->prev = dst.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately
  2015-07-27 13:40     ` Aurelien Jarno
@ 2015-07-27 13:44       ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2015-07-27 13:44 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson



On 27/07/2015 15:40, Aurelien Jarno wrote:
> >          temps[dst].next_copy = temps[src].next_copy;
> >          temps[dst].prev_copy = src;
> >          temps[temps[dst].next_copy].prev_copy = dst;
> >          temps[src].next_copy = dst;
>
> Note that the patch doesn't change this part, it's already there in the
> original code.

Understood.

> > This is:
> > 
> >     dst->next = src->next;
> >     dst->prev = src;
> >     dst->next->prev = dst;
> >     src->next = dst;
> 
> Here we just want to insert one element, and thus before
> the insertion we have dst->next = dst->prev = dst.

Ah, you're right.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for-2.5 11/12] tcg/optimize: do not remember garbage high bits for 32-bit ops
  2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 11/12] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
@ 2015-07-29 16:34   ` Aurelien Jarno
  0 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson

On 2015-07-27 12:56, Aurelien Jarno wrote:
> Now that we have real size changing ops, we don't need to mark high
> bits of the destination as garbage. The goal of the optimizer is to
> predict the value of the temps (and not of the registers) and do
> simplifications when possible. The problem there is therefore not the
> fact that those bits are not counted as garbage, but that a size
> changing op is replaced by a move.
> 
> This patch is basically a revert of 24666baf, including the changes that
> have been made since then.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 41 +++++++++++------------------------------
>  1 file changed, 11 insertions(+), 30 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 8f33755..166074e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -937,17 +922,13 @@ void tcg_optimize(TCGContext *s)
>              break;
>          }
>  
> -        /* 32-bit ops generate 32-bit results.  For the result is zero test
> -           below, we can ignore high bits, but for further optimizations we
> -           need to record that the high bits contain garbage.  */
> -        partmask = mask;
> +        /* 32-bit ops generate 32-bit results.  */
>          if (!(def->flags & TCG_OPF_64BIT)) {
> -            mask |= ~(tcg_target_ulong)0xffffffffu;
> -            partmask &= 0xffffffffu;
> +            mask &= 0xffffffffu;
>              affected &= 0xffffffffu;
>          }
>  
> -        if (partmask == 0) {
> +        if (mask == 0) {
>              assert(nb_oargs == 1);
>              tcg_opt_gen_movi(s, op, args, args[0], 0);
>              continue;

Answering to myself, this actually doesn't work as the current code
wrongly assumes that all ops writing a 64-bit temp will have the
TCG_OPF_64BIT flag set. This is wrong for at least call.

I still haven't decided what is the best way to fix that, either by
special casing these ops, or by actually looking at the temp type. I
guess performances will decide. In early version of this patchset, I
tried to access the temp type at other places, and it has some
performances impact.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-07-29 16:34 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-27 10:55 [Qemu-devel] [PATCH v2 for-2.5 00/12] tcg: improve optimizer Aurelien Jarno
2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 01/12] tcg: correctly mark dead inputs for mov with a constant Aurelien Jarno
2015-07-27 10:55 ` [Qemu-devel] [PATCH for-2.4 02/12] tcg: mark temps as mem_coherent = 0 " Aurelien Jarno
2015-07-27 10:55 ` [Qemu-devel] [PATCH v2 for-2.5 03/12] tcg/optimize: optimize temps tracking Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 04/12] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 05/12] tcg/optimize: track const/copy status separately Aurelien Jarno
2015-07-27 11:26   ` Paolo Bonzini
2015-07-27 13:40     ` Aurelien Jarno
2015-07-27 13:44       ` Paolo Bonzini
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 06/12] tcg/optimize: allow constant to have copies Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 07/12] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 08/12] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 09/12] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 10/12] tcg/optimize: add optimizations for " Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 11/12] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
2015-07-29 16:34   ` Aurelien Jarno
2015-07-27 10:56 ` [Qemu-devel] [PATCH v2 for-2.5 12/12] tcg: update README about size changing ops Aurelien Jarno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).