* [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation
@ 2012-09-21 19:43 Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 01/10] tcg/optimize: remove TCG_TEMP_ANY Aurelien Jarno
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

This patch series reworks the copy propagation in order to generate
better code. The first two patches are cleanups and bug fixes, the third
patch is the heart of the series, and the remaining ones are small
optimizations using the new copy propagation.

I have measured a decrease of the generated code size of about 4%, with
a gain in speed between 0 and 2% depending on the workload.

For better benefits in ARM emulation, it should be used with the setcond
patch series I sent a few days ago.

--
Changes v1 -> v2
 - fixed conflict in patch 3 following movcond introduction
 - movcond optimization added in patch 7
 - added patch 10

Aurelien Jarno (10):
  tcg/optimize: remove TCG_TEMP_ANY
  tcg/optimize: check types in copy propagation
  tcg/optimize: rework copy propagation
  tcg/optimize: do copy propagation for all operations
  tcg/optimize: optimize "op r, a, a => mov r, a"
  tcg/optimize: optimize "op r, a, a => movi r, 0"
  tcg/optimize: further optimize brcond/movcond/setcond
  tcg/optimize: prefer the "op a, a, b" form for commutative ops
  tcg: remove #ifdef #endif around TCGOpcode tests
  tcg/optimize: add constant folding for deposit

 tcg/optimize.c |  361 ++++++++++++++++++++++++++++++++++----------------------
 tcg/tcg.c      |   13 +-
 2 files changed, 224 insertions(+), 150 deletions(-)

-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 01/10] tcg/optimize: remove TCG_TEMP_ANY
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 02/10] tcg/optimize: check types in copy propagation Aurelien Jarno
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

TCG_TEMP_ANY means the same thing as TCG_TEMP_UNDEF, so use
the latter instead.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1be7631..308b7f9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -39,8 +39,7 @@ typedef enum {
     TCG_TEMP_UNDEF = 0,
     TCG_TEMP_CONST,
     TCG_TEMP_COPY,
-    TCG_TEMP_HAS_COPY,
-    TCG_TEMP_ANY
+    TCG_TEMP_HAS_COPY
 } tcg_temp_state;
 
 struct tcg_temp_info {
@@ -52,7 +51,7 @@ struct tcg_temp_info {
 
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
 
-/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
+/* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP was a representative of some
    class of equivalent temp's, a new representative should be chosen in this
    class. */
 static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
@@ -69,7 +68,7 @@ static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
         }
         for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
             if (new_base == (TCGArg)-1) {
-                temps[i].state = TCG_TEMP_ANY;
+                temps[i].state = TCG_TEMP_UNDEF;
             } else {
                 temps[i].val = new_base;
             }
@@ -81,9 +80,9 @@ static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
         temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
         new_base = temps[temp].val;
     }
-    temps[temp].state = TCG_TEMP_ANY;
+    temps[temp].state = TCG_TEMP_UNDEF;
     if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
-        temps[new_base].state = TCG_TEMP_ANY;
+        temps[new_base].state = TCG_TEMP_UNDEF;
     }
 }
 
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 02/10] tcg/optimize: check types in copy propagation
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 01/10] tcg/optimize: remove TCG_TEMP_ANY Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 03/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

The copy propagation code doesn't check the types of the temps. However,
TCG uses mov_i32 for the i64 to i32 conversion, and thus two temps of
different types are not equivalent.

With this patch tcg_opt_gen_mov() no longer considers two temps of
different types as copies.

So far it seems the optimization was not aggressive enough to trigger
this bug, but it will be triggered later in this series once the copy
propagation is improved.
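
For illustration only (not part of the patch): a narrowing mov keeps only
the low 32 bits of a 64-bit value, so a 32-bit "copy" of a 64-bit temp does
not hold the same value. A minimal standalone sketch:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t t64 = 0x123456789abcdef0ull; /* stands in for an i64 temp */
    uint32_t t32 = (uint32_t)t64;         /* what the i64 to i32 mov does */

    assert((uint64_t)t32 != t64);         /* the two are not interchangeable */
    return 0;
}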

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 308b7f9..da8dffe 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -106,12 +106,13 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
-                            int nb_temps, int nb_globals)
+static void tcg_opt_gen_mov(TCGContext *s, TCGArg *gen_args,
+                            TCGArg dst, TCGArg src)
 {
-        reset_temp(dst, nb_temps, nb_globals);
+        reset_temp(dst, s->nb_temps, s->nb_globals);
         assert(temps[src].state != TCG_TEMP_COPY);
-        if (src >= nb_globals) {
+        /* Only consider temps with the same type (width) as copies. */
+        if (src >= s->nb_globals && s->temps[dst].type == s->temps[src].type) {
             assert(temps[src].state != TCG_TEMP_CONST);
             if (temps[src].state != TCG_TEMP_HAS_COPY) {
                 temps[src].state = TCG_TEMP_HAS_COPY;
@@ -461,8 +462,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
-                    tcg_opt_gen_mov(gen_args, args[0], args[1],
-                                    nb_temps, nb_globals);
+                    tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
                     gen_args += 2;
                 }
                 args += 3;
@@ -499,8 +499,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
-                    tcg_opt_gen_mov(gen_args, args[0], args[1], nb_temps,
-                                    nb_globals);
+                    tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
                     gen_args += 2;
                 }
                 args += 3;
@@ -524,8 +523,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 break;
             }
             if (temps[args[1]].state != TCG_TEMP_CONST) {
-                tcg_opt_gen_mov(gen_args, args[0], args[1],
-                                nb_temps, nb_globals);
+                tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
                 gen_args += 2;
                 args += 2;
                 break;
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 03/10] tcg/optimize: rework copy propagation
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 01/10] tcg/optimize: remove TCG_TEMP_ANY Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 02/10] tcg/optimize: check types in copy propagation Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 04/10] tcg/optimize: do copy propagation for all operations Aurelien Jarno
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

The copy propagation pass tries to keep track of what is a copy of what
and what has a copy of what, and in addition it keeps a circular list of
all the copies. Unfortunately this doesn't fully work: a mov from a temp
which is in the "COPY" state changes it into the "HAS_COPY" state. Later,
when this temp is used again, it is considered as not having a copy and
thus no propagation is done.

This patch fixes that by removing the hierarchy between copies, keeping
only a single "COPY" state that means both "is a copy" and "has a copy".
The decision of which copy to use is deferred to the actual temp
replacement. At this stage there is no single best choice, only choices
that are better than others. To make the best choice, the operations
would have to be parsed in reverse to know whether a temp is going to be
used later or not; that is what the liveness analysis does. At this
stage it is known that globals are always live, that local temps are
dead at the end of the translation block, and that plain temps are dead
at the end of the basic block. This means that this stage should try to
replace temps by local temps or globals, and local temps by globals.
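
As an aside, a standalone model of the replacement preference described
above (illustration only; the arrays are stand-ins for the temps[]
bookkeeping, and the real logic is find_better_copy() in the diff below):

#include <stdbool.h>
#include <stddef.h>

/* Temps 0 .. nb_globals-1 are globals; is_local[] marks local temps;
   next_copy[] is the circular list linking all copies of a temp
   (next_copy[t] == t when t has no other copy). */
static size_t better_copy(const size_t *next_copy, const bool *is_local,
                          size_t nb_globals, size_t temp)
{
    size_t i;

    if (temp < nb_globals) {
        return temp;                 /* already a global: can't do better */
    }
    for (i = next_copy[temp]; i != temp; i = next_copy[i]) {
        if (i < nb_globals) {
            return i;                /* a global copy exists: prefer it */
        }
    }
    if (!is_local[temp]) {
        for (i = next_copy[temp]; i != temp; i = next_copy[i]) {
            if (is_local[i]) {
                return i;            /* otherwise prefer a local temp */
            }
        }
    }
    return temp;                     /* no better representative found */
}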

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |  167 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 92 insertions(+), 75 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index da8dffe..1904b39 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -39,7 +39,6 @@ typedef enum {
     TCG_TEMP_UNDEF = 0,
     TCG_TEMP_CONST,
     TCG_TEMP_COPY,
-    TCG_TEMP_HAS_COPY
 } tcg_temp_state;
 
 struct tcg_temp_info {
@@ -51,39 +50,19 @@ struct tcg_temp_info {
 
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
 
-/* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP was a representative of some
-   class of equivalent temp's, a new representative should be chosen in this
-   class. */
-static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
+/* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
+   the copy flag from the left temp.  */
+static void reset_temp(TCGArg temp)
 {
-    int i;
-    TCGArg new_base = (TCGArg)-1;
-    if (temps[temp].state == TCG_TEMP_HAS_COPY) {
-        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
-            if (i >= nb_globals) {
-                temps[i].state = TCG_TEMP_HAS_COPY;
-                new_base = i;
-                break;
-            }
-        }
-        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
-            if (new_base == (TCGArg)-1) {
-                temps[i].state = TCG_TEMP_UNDEF;
-            } else {
-                temps[i].val = new_base;
-            }
+    if (temps[temp].state == TCG_TEMP_COPY) {
+        if (temps[temp].prev_copy == temps[temp].next_copy) {
+            temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
+        } else {
+            temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+            temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
         }
-        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
-        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
-    } else if (temps[temp].state == TCG_TEMP_COPY) {
-        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
-        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
-        new_base = temps[temp].val;
     }
     temps[temp].state = TCG_TEMP_UNDEF;
-    if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
-        temps[new_base].state = TCG_TEMP_UNDEF;
-    }
 }
 
 static int op_bits(TCGOpcode op)
@@ -106,34 +85,83 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
+static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+{
+    TCGArg i;
+
+    /* If this is already a global, we can't do better. */
+    if (temp < s->nb_globals) {
+        return temp;
+    }
+
+    /* Search for a global first. */
+    for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
+        if (i < s->nb_globals) {
+            return i;
+        }
+    }
+
+    /* If it is a temp, search for a temp local. */
+    if (!s->temps[temp].temp_local) {
+        for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
+            if (s->temps[i].temp_local) {
+                return i;
+            }
+        }
+    }
+
+    /* Failure to find a better representation, return the same temp. */
+    return temp;
+}
+
+static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+{
+    TCGArg i;
+
+    if (arg1 == arg2) {
+        return true;
+    }
+
+    if (temps[arg1].state != TCG_TEMP_COPY
+        || temps[arg2].state != TCG_TEMP_COPY) {
+        return false;
+    }
+
+    for (i = temps[arg1].next_copy ; i != arg1 ; i = temps[i].next_copy) {
+        if (i == arg2) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void tcg_opt_gen_mov(TCGContext *s, TCGArg *gen_args,
                             TCGArg dst, TCGArg src)
 {
-        reset_temp(dst, s->nb_temps, s->nb_globals);
-        assert(temps[src].state != TCG_TEMP_COPY);
-        /* Only consider temps with the same type (width) as copies. */
-        if (src >= s->nb_globals && s->temps[dst].type == s->temps[src].type) {
-            assert(temps[src].state != TCG_TEMP_CONST);
-            if (temps[src].state != TCG_TEMP_HAS_COPY) {
-                temps[src].state = TCG_TEMP_HAS_COPY;
+        reset_temp(dst);
+        assert(temps[src].state != TCG_TEMP_CONST);
+
+        if (s->temps[src].type == s->temps[dst].type) {
+            if (temps[src].state != TCG_TEMP_COPY) {
+                temps[src].state = TCG_TEMP_COPY;
                 temps[src].next_copy = src;
                 temps[src].prev_copy = src;
             }
             temps[dst].state = TCG_TEMP_COPY;
-            temps[dst].val = src;
             temps[dst].next_copy = temps[src].next_copy;
             temps[dst].prev_copy = src;
             temps[temps[dst].next_copy].prev_copy = dst;
             temps[src].next_copy = dst;
         }
+
         gen_args[0] = dst;
         gen_args[1] = src;
 }
 
-static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
-                             int nb_temps, int nb_globals)
+static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val)
 {
-        reset_temp(dst, nb_temps, nb_globals);
+        reset_temp(dst);
         temps[dst].state = TCG_TEMP_CONST;
         temps[dst].val = val;
         gen_args[0] = dst;
@@ -324,7 +352,6 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
     tcg_abort();
 }
 
-
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -338,10 +365,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
-       If this temp is a copy of other ones then this equivalence class'
-       representative is kept in VALS' element.
-       If this temp is neither copy nor constant then corresponding VALS'
-       element is unused. */
+       If this temp is a copy of other ones then the other copies are
+       available through the doubly linked circular list. */
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
@@ -357,7 +382,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             assert(op != INDEX_op_call);
             for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
                 if (temps[args[i]].state == TCG_TEMP_COPY) {
-                    args[i] = temps[args[i]].val;
+                    args[i] = find_better_copy(s, args[i]);
                 }
             }
         }
@@ -429,7 +454,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             if (temps[args[1]].state == TCG_TEMP_CONST
                 && temps[args[1]].val == 0) {
                 gen_opc_buf[op_index] = op_to_movi(op);
-                tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+                tcg_opt_gen_movi(gen_args, args[0], 0);
                 args += 3;
                 gen_args += 2;
                 continue;
@@ -456,9 +481,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             if (temps[args[2]].state == TCG_TEMP_CONST
                 && temps[args[2]].val == 0) {
-                if ((temps[args[0]].state == TCG_TEMP_COPY
-                    && temps[args[0]].val == args[1])
-                    || args[0] == args[1]) {
+                if (temps_are_copies(args[0], args[1])) {
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
@@ -480,7 +503,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             if ((temps[args[2]].state == TCG_TEMP_CONST
                 && temps[args[2]].val == 0)) {
                 gen_opc_buf[op_index] = op_to_movi(op);
-                tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+                tcg_opt_gen_movi(gen_args, args[0], 0);
                 args += 3;
                 gen_args += 2;
                 continue;
@@ -495,7 +518,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
             if (args[1] == args[2]) {
-                if (args[1] == args[0]) {
+                if (temps_are_copies(args[0], args[1])) {
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
@@ -515,9 +538,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
            allocator where needed and possible.  Also detect copies. */
         switch (op) {
         CASE_OP_32_64(mov):
-            if ((temps[args[1]].state == TCG_TEMP_COPY
-                && temps[args[1]].val == args[0])
-                || args[0] == args[1]) {
+            if (temps_are_copies(args[0], args[1])) {
                 args += 2;
                 gen_opc_buf[op_index] = INDEX_op_nop;
                 break;
@@ -535,7 +556,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             args[1] = temps[args[1]].val;
             /* fallthrough */
         CASE_OP_32_64(movi):
-            tcg_opt_gen_movi(gen_args, args[0], args[1], nb_temps, nb_globals);
+            tcg_opt_gen_movi(gen_args, args[0], args[1]);
             gen_args += 2;
             args += 2;
             break;
@@ -550,9 +571,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             if (temps[args[1]].state == TCG_TEMP_CONST) {
                 gen_opc_buf[op_index] = op_to_movi(op);
                 tmp = do_constant_folding(op, temps[args[1]].val, 0);
-                tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+                tcg_opt_gen_movi(gen_args, args[0], tmp);
             } else {
-                reset_temp(args[0], nb_temps, nb_globals);
+                reset_temp(args[0]);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
             }
@@ -580,10 +601,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 gen_opc_buf[op_index] = op_to_movi(op);
                 tmp = do_constant_folding(op, temps[args[1]].val,
                                           temps[args[2]].val);
-                tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+                tcg_opt_gen_movi(gen_args, args[0], tmp);
                 gen_args += 2;
             } else {
-                reset_temp(args[0], nb_temps, nb_globals);
+                reset_temp(args[0]);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
                 gen_args[2] = args[2];
@@ -597,10 +618,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 gen_opc_buf[op_index] = op_to_movi(op);
                 tmp = do_constant_folding_cond(op, temps[args[1]].val,
                                                temps[args[2]].val, args[3]);
-                tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+                tcg_opt_gen_movi(gen_args, args[0], tmp);
                 gen_args += 2;
             } else {
-                reset_temp(args[0], nb_temps, nb_globals);
+                reset_temp(args[0]);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
                 gen_args[2] = args[2];
@@ -623,7 +644,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 }
             } else {
                 memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
-                reset_temp(args[0], nb_temps, nb_globals);
+                reset_temp(args[0]);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
                 gen_args[2] = args[2];
@@ -637,23 +658,19 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 && temps[args[2]].state == TCG_TEMP_CONST) {
                 tmp = do_constant_folding_cond(op, temps[args[1]].val,
                                                temps[args[2]].val, args[5]);
-                if (args[0] == args[4-tmp]
-                    || (temps[args[4-tmp]].state == TCG_TEMP_COPY
-                        && temps[args[4-tmp]].val == args[0])) {
+                if (temps_are_copies(args[0], args[4-tmp])) {
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else if (temps[args[4-tmp]].state == TCG_TEMP_CONST) {
                     gen_opc_buf[op_index] = op_to_movi(op);
-                    tcg_opt_gen_movi(gen_args, args[0], temps[args[4-tmp]].val,
-                                     nb_temps, nb_globals);
+                    tcg_opt_gen_movi(gen_args, args[0], temps[args[4-tmp]].val);
                     gen_args += 2;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
-                    tcg_opt_gen_mov(gen_args, args[0], args[4-tmp],
-                                    nb_temps, nb_globals);
+                    tcg_opt_gen_mov(s, gen_args, args[0], args[4-tmp]);
                     gen_args += 2;
                 }
             } else {
-                reset_temp(args[0], nb_temps, nb_globals);
+                reset_temp(args[0]);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
                 gen_args[2] = args[2];
@@ -668,11 +685,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
                 for (i = 0; i < nb_globals; i++) {
-                    reset_temp(i, nb_temps, nb_globals);
+                    reset_temp(i);
                 }
             }
             for (i = 0; i < (args[0] >> 16); i++) {
-                reset_temp(args[i + 1], nb_temps, nb_globals);
+                reset_temp(args[i + 1]);
             }
             i = nb_call_args + 3;
             while (i) {
@@ -691,7 +708,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
             } else {
                 for (i = 0; i < def->nb_oargs; i++) {
-                    reset_temp(args[i], nb_temps, nb_globals);
+                    reset_temp(args[i]);
                 }
             }
             for (i = 0; i < def->nb_args; i++) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 04/10] tcg/optimize: do copy propagation for all operations
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (2 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 03/10] tcg/optimize: rework copy propagation Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 05/10] tcg/optimize: optimize "op r, a, a => mov r, a" Aurelien Jarno
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

It is possible to do copy propagation for all operations, even the ones
that have side effects or clobber arguments (it only concerns the input
arguments). That said, the call operation should be handled differently
due to its variable number of arguments.
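
For reference, the argument layout that the call handling relies on, as a
small standalone sketch (the helper name and the plain integer type are
illustrative, not QEMU code):

/* args[0] of a call op packs the argument counts; the input temps that
   can be propagated follow the outputs. */
static void call_arg_counts(unsigned long arg0, int *nb_oargs, int *nb_iargs)
{
    *nb_oargs = arg0 >> 16;
    *nb_iargs = arg0 & 0xffff;
    /* the inputs are at indexes nb_oargs + 1 .. nb_oargs + nb_iargs */
}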

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1904b39..aeb2225 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -378,8 +378,15 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
         /* Do copy propagation */
-        if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {
-            assert(op != INDEX_op_call);
+        if (op == INDEX_op_call) {
+            int nb_oargs = args[0] >> 16;
+            int nb_iargs = args[0] & 0xffff;
+            for (i = nb_oargs + 1; i < nb_oargs + nb_iargs + 1; i++) {
+                if (temps[args[i]].state == TCG_TEMP_COPY) {
+                    args[i] = find_better_copy(s, args[i]);
+                }
+            }
+        } else {
             for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
                 if (temps[args[i]].state == TCG_TEMP_COPY) {
                     args[i] = find_better_copy(s, args[i]);
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 05/10] tcg/optimize: optimize "op r, a, a => mov r, a"
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (3 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 04/10] tcg/optimize: do copy propagation for all operations Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 06/10] tcg/optimize: optimize "op r, a, a => movi r, 0" Aurelien Jarno
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Now that we can easily detect all copies, we can optimize the
"op r, a, a => mov r, a" case a bit more.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index aeb2225..b9a7da9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -524,7 +524,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         switch (op) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (args[1] == args[2]) {
+            if (temps_are_copies(args[1], args[2])) {
                 if (temps_are_copies(args[0], args[1])) {
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 06/10] tcg/optimize: optimize "op r, a, a => movi r, 0"
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (4 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 05/10] tcg/optimize: optimize "op r, a, a => mov r, a" Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond Aurelien Jarno
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Now that it's possible to detect copies, we can optimize the
"op r, a, a => movi r, 0" case. This helps in the computation of
overflow flags when one of the two args is 0.
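
The identities being folded, as a trivial standalone check (illustration
only; b stands in for a temp known to be a copy of a):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 0x12345678;  /* arbitrary value */
    uint32_t b = a;           /* a copy of a */

    assert((a - b) == 0);     /* sub r, a, a  =>  movi r, 0 */
    assert((a ^ b) == 0);     /* xor r, a, a  =>  movi r, 0 */
    return 0;
}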

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b9a7da9..ceea644 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -540,6 +540,22 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
+        /* Simplify expression for "op r, a, a => movi r, 0" cases */
+        switch (op) {
+        CASE_OP_32_64(sub):
+        CASE_OP_32_64(xor):
+            if (temps_are_copies(args[1], args[2])) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tcg_opt_gen_movi(gen_args, args[0], 0);
+                gen_args += 2;
+                args += 3;
+                continue;
+            }
+            break;
+        default:
+            break;
+        }
+
         /* Propagate constants through copy operations and do constant
            folding.  Constants will be substituted to arguments by register
            allocator where needed and possible.  Also detect copies. */
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (5 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 06/10] tcg/optimize: optimize "op r, a, a => movi r, 0" Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 23:17   ` Richard Henderson
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 08/10] tcg/optimize: prefer the "op a, a, b" form for commutative ops Aurelien Jarno
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

When both arguments of brcond/movcond/setcond are the same, or when one
of the two values is a constant equal to zero, it's possible to do
further optimizations.
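
The identities used in the new cases, as a quick standalone check
(illustration only; x is an arbitrary value):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 0xdeadbeef;          /* arbitrary value */

    /* Both arguments the same (or copies of each other). */
    assert(!(x < x) && !(x != x));    /* LT, LTU, GT, GTU, NE are false */
    assert((x <= x) && (x == x));     /* LE, LEU, GE, GEU, EQ are true */

    /* Second argument a constant 0, unsigned comparisons only. */
    assert(!(x < (uint32_t)0));       /* LTU 0 is always false */
    assert(x >= (uint32_t)0);         /* GEU 0 is always true */
    return 0;
}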

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |  127 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 76 insertions(+), 51 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index ceea644..abe016a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -292,58 +292,88 @@ static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
     return res;
 }
 
+/* Return 2 if the condition can't be simplified, and the result
+   of the condition (0 or 1) if it can */
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
                                        TCGArg y, TCGCond c)
 {
-    switch (op_bits(op)) {
-    case 32:
+    if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
+        switch (op_bits(op)) {
+        case 32:
+            switch (c) {
+            case TCG_COND_EQ:
+                return (uint32_t)temps[x].val == (uint32_t)temps[y].val;
+            case TCG_COND_NE:
+                return (uint32_t)temps[x].val != (uint32_t)temps[y].val;
+            case TCG_COND_LT:
+                return (int32_t)temps[x].val < (int32_t)temps[y].val;
+            case TCG_COND_GE:
+                return (int32_t)temps[x].val >= (int32_t)temps[y].val;
+            case TCG_COND_LE:
+                return (int32_t)temps[x].val <= (int32_t)temps[y].val;
+            case TCG_COND_GT:
+                return (int32_t)temps[x].val > (int32_t)temps[y].val;
+            case TCG_COND_LTU:
+                return (uint32_t)temps[x].val < (uint32_t)temps[y].val;
+            case TCG_COND_GEU:
+                return (uint32_t)temps[x].val >= (uint32_t)temps[y].val;
+            case TCG_COND_LEU:
+                return (uint32_t)temps[x].val <= (uint32_t)temps[y].val;
+            case TCG_COND_GTU:
+                return (uint32_t)temps[x].val > (uint32_t)temps[y].val;
+            }
+            break;
+        case 64:
+            switch (c) {
+            case TCG_COND_EQ:
+                return (uint64_t)temps[x].val == (uint64_t)temps[y].val;
+            case TCG_COND_NE:
+                return (uint64_t)temps[x].val != (uint64_t)temps[y].val;
+            case TCG_COND_LT:
+                return (int64_t)temps[x].val < (int64_t)temps[y].val;
+            case TCG_COND_GE:
+                return (int64_t)temps[x].val >= (int64_t)temps[y].val;
+            case TCG_COND_LE:
+                return (int64_t)temps[x].val <= (int64_t)temps[y].val;
+            case TCG_COND_GT:
+                return (int64_t)temps[x].val > (int64_t)temps[y].val;
+            case TCG_COND_LTU:
+                return (uint64_t)temps[x].val < (uint64_t)temps[y].val;
+            case TCG_COND_GEU:
+                return (uint64_t)temps[x].val >= (uint64_t)temps[y].val;
+            case TCG_COND_LEU:
+                return (uint64_t)temps[x].val <= (uint64_t)temps[y].val;
+            case TCG_COND_GTU:
+                return (uint64_t)temps[x].val > (uint64_t)temps[y].val;
+            }
+            break;
+        }
+    } else if (temps_are_copies(x, y)) {
         switch (c) {
-        case TCG_COND_EQ:
-            return (uint32_t)x == (uint32_t)y;
-        case TCG_COND_NE:
-            return (uint32_t)x != (uint32_t)y;
-        case TCG_COND_LT:
-            return (int32_t)x < (int32_t)y;
-        case TCG_COND_GE:
-            return (int32_t)x >= (int32_t)y;
-        case TCG_COND_LE:
-            return (int32_t)x <= (int32_t)y;
         case TCG_COND_GT:
-            return (int32_t)x > (int32_t)y;
         case TCG_COND_LTU:
-            return (uint32_t)x < (uint32_t)y;
-        case TCG_COND_GEU:
-            return (uint32_t)x >= (uint32_t)y;
-        case TCG_COND_LEU:
-            return (uint32_t)x <= (uint32_t)y;
+        case TCG_COND_LT:
         case TCG_COND_GTU:
-            return (uint32_t)x > (uint32_t)y;
-        }
-        break;
-    case 64:
-        switch (c) {
-        case TCG_COND_EQ:
-            return (uint64_t)x == (uint64_t)y;
         case TCG_COND_NE:
-            return (uint64_t)x != (uint64_t)y;
-        case TCG_COND_LT:
-            return (int64_t)x < (int64_t)y;
+            return 0;
         case TCG_COND_GE:
-            return (int64_t)x >= (int64_t)y;
+        case TCG_COND_GEU:
         case TCG_COND_LE:
-            return (int64_t)x <= (int64_t)y;
-        case TCG_COND_GT:
-            return (int64_t)x > (int64_t)y;
+        case TCG_COND_LEU:
+        case TCG_COND_EQ:
+            return 1;
+        }
+    } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
+        switch (c) {
         case TCG_COND_LTU:
-            return (uint64_t)x < (uint64_t)y;
+            return 0;
         case TCG_COND_GEU:
-            return (uint64_t)x >= (uint64_t)y;
-        case TCG_COND_LEU:
-            return (uint64_t)x <= (uint64_t)y;
-        case TCG_COND_GTU:
-            return (uint64_t)x > (uint64_t)y;
+            return 1;
+        default:
+            return 2;
         }
-        break;
+    } else {
+        return 2;
     }
 
     fprintf(stderr,
@@ -636,11 +666,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             args += 3;
             break;
         CASE_OP_32_64(setcond):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            tmp = do_constant_folding_cond(op, args[1], args[2], args[3]);
+            if (tmp != 2) {
                 gen_opc_buf[op_index] = op_to_movi(op);
-                tmp = do_constant_folding_cond(op, temps[args[1]].val,
-                                               temps[args[2]].val, args[3]);
                 tcg_opt_gen_movi(gen_args, args[0], tmp);
                 gen_args += 2;
             } else {
@@ -654,10 +682,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             args += 4;
             break;
         CASE_OP_32_64(brcond):
-            if (temps[args[0]].state == TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST) {
-                if (do_constant_folding_cond(op, temps[args[0]].val,
-                                             temps[args[1]].val, args[2])) {
+            tmp = do_constant_folding_cond(op, args[0], args[1], args[2]);
+            if (tmp != 2) {
+                if (tmp) {
                     memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
                     gen_opc_buf[op_index] = INDEX_op_br;
                     gen_args[0] = args[3];
@@ -677,10 +704,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             args += 4;
             break;
         CASE_OP_32_64(movcond):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
-                tmp = do_constant_folding_cond(op, temps[args[1]].val,
-                                               temps[args[2]].val, args[5]);
+            tmp = do_constant_folding_cond(op, args[1], args[2], args[5]);
+            if (tmp != 2) {
                 if (temps_are_copies(args[0], args[4-tmp])) {
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else if (temps[args[4-tmp]].state == TCG_TEMP_CONST) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 08/10] tcg/optimize: prefer the "op a, a, b" form for commutative ops
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (6 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 09/10] tcg: remove #ifdef #endif around TCGOpcode tests Aurelien Jarno
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

The "op a, a, b" form is better handled on non-RISC host than the "op
a, b, a" form, so swap the arguments to this form when possible, and
when b is not a constant.

This reduces the number of generated instructions by a tiny bit.
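
A minimal sketch of the swap condition (illustrative only; is_const()
stands in for the TCG_TEMP_CONST check and the typedef for the real one):

#include <stdbool.h>

typedef unsigned long TCGArg;   /* stand-in for the real TCG typedef */

/* Put a constant in the second input; otherwise prefer the "op a, a, b"
   form when the destination matches the second input. */
static void swap_commutative_args(TCGArg args[3], bool (*is_const)(TCGArg))
{
    if (is_const(args[1])
        || (args[0] == args[2] && !is_const(args[2]))) {
        TCGArg tmp = args[1];
        args[1] = args[2];
        args[2] = tmp;
    }
}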

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index abe016a..c8ae50b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -434,7 +434,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(eqv):
         CASE_OP_32_64(nand):
         CASE_OP_32_64(nor):
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            /* Prefer the constant in second argument, and then the form
+               op a, a, b, which is better handled on non-RISC hosts. */
+            if (temps[args[1]].state == TCG_TEMP_CONST || (args[0] == args[2]
+                && temps[args[2]].state != TCG_TEMP_CONST)) {
                 tmp = args[1];
                 args[1] = args[2];
                 args[2] = tmp;
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 09/10] tcg: remove #ifdef #endif around TCGOpcode tests
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (7 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 08/10] tcg/optimize: prefer the "op a, a, b" form for commutative ops Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit Aurelien Jarno
  2012-09-21 23:23 ` [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Richard Henderson
  10 siblings, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Commit 25c4d9cc made all TCGOpcode enum values available, so we don't
need #ifdef/#endif around the ones that are only available on some
targets. This makes the code easier to read.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bb9c995..4afd035 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -937,11 +937,7 @@ void tcg_dump_ops(TCGContext *s)
                                                        args[nb_oargs + i]));
                 }
             }
-        } else if (c == INDEX_op_movi_i32 
-#if TCG_TARGET_REG_BITS == 64
-                   || c == INDEX_op_movi_i64
-#endif
-                   ) {
+        } else if (c == INDEX_op_movi_i32 || c == INDEX_op_movi_i64) {
             tcg_target_ulong val;
             TCGHelperInfo *th;
 
@@ -993,14 +989,11 @@ void tcg_dump_ops(TCGContext *s)
             case INDEX_op_brcond_i32:
             case INDEX_op_setcond_i32:
             case INDEX_op_movcond_i32:
-#if TCG_TARGET_REG_BITS == 32
             case INDEX_op_brcond2_i32:
             case INDEX_op_setcond2_i32:
-#else
             case INDEX_op_brcond_i64:
             case INDEX_op_setcond_i64:
             case INDEX_op_movcond_i64:
-#endif
                 if (args[k] < ARRAY_SIZE(cond_name) && cond_name[args[k]]) {
                     qemu_log(",%s", cond_name[args[k++]]);
                 } else {
@@ -2102,16 +2095,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
 #endif
         switch(opc) {
         case INDEX_op_mov_i32:
-#if TCG_TARGET_REG_BITS == 64
         case INDEX_op_mov_i64:
-#endif
             dead_args = s->op_dead_args[op_index];
             tcg_reg_alloc_mov(s, def, args, dead_args);
             break;
         case INDEX_op_movi_i32:
-#if TCG_TARGET_REG_BITS == 64
         case INDEX_op_movi_i64:
-#endif
             tcg_reg_alloc_movi(s, args);
             break;
         case INDEX_op_debug_insn_start:
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (8 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 09/10] tcg: remove #ifdef #endif around TCGOpcode tests Aurelien Jarno
@ 2012-09-21 19:43 ` Aurelien Jarno
  2012-09-21 23:22   ` Richard Henderson
  2012-09-21 23:23 ` [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Richard Henderson
  10 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-21 19:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c8ae50b..35532a1 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -668,6 +668,26 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             args += 3;
             break;
+        CASE_OP_32_64(deposit):
+            if (temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tmp = ((1ull << args[4]) - 1);
+                tmp = (temps[args[1]].val & ~(tmp << args[3]))
+                      | ((temps[args[2]].val & tmp) << args[3]);
+                tcg_opt_gen_movi(gen_args, args[0], tmp);
+                gen_args += 2;
+            } else {
+                reset_temp(args[0]);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args[3] = args[3];
+                gen_args[4] = args[4];
+                gen_args += 5;
+            }
+            args += 5;
+            break;
         CASE_OP_32_64(setcond):
             tmp = do_constant_folding_cond(op, args[1], args[2], args[3]);
             if (tmp != 2) {
-- 
1.7.10.4


* Re: [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond Aurelien Jarno
@ 2012-09-21 23:17   ` Richard Henderson
  2012-09-22  9:35     ` Aurelien Jarno
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2012-09-21 23:17 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
> +/* Return 2 if the condition can't be simplified, and the result
> +   of the condition (0 or 1) if it can */

Ok as-is.  But I'd bike-shed the "fail" return to -1.


r~


* Re: [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit Aurelien Jarno
@ 2012-09-21 23:22   ` Richard Henderson
  2012-09-22  9:41     ` Aurelien Jarno
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2012-09-21 23:22 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
> +                tmp = ((1ull << args[4]) - 1);
> +                tmp = (temps[args[1]].val & ~(tmp << args[3]))
> +                      | ((temps[args[2]].val & tmp) << args[3]);
> +                tcg_opt_gen_movi(gen_args, args[0], tmp);

We do have a deposit64 function in bitops.h, fwiw.

Though, really, bitops.h has some crappy functions in it that are
redundant with the better functions in host-utils.h...
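
(For reference only, a standalone equivalent of the quoted folding
expression, assuming 0 < len < 64; the names are illustrative and this is
not the bitops.h interface:)

#include <stdint.h>

static uint64_t fold_deposit(uint64_t val1, uint64_t val2,
                             unsigned pos, unsigned len)
{
    uint64_t mask = (1ull << len) - 1;   /* low "len" bits set */

    /* Clear the field in val1, then insert the low bits of val2 there. */
    return (val1 & ~(mask << pos)) | ((val2 & mask) << pos);
}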


r~


* Re: [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation
  2012-09-21 19:43 [Qemu-devel] [PATCH v2 00/10] tcg/optimize: rework copy propagation Aurelien Jarno
                   ` (9 preceding siblings ...)
  2012-09-21 19:43 ` [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit Aurelien Jarno
@ 2012-09-21 23:23 ` Richard Henderson
  10 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2012-09-21 23:23 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
> Changes v1 -> v2
>  - fixed conflict in patch 3 following movcond introduction
>  - movcond optimization added in patch 7
>  - added patch 10
> 
> Aurelien Jarno (10):
>   tcg/optimize: remove TCG_TEMP_ANY
>   tcg/optimize: check types in copy propagation
>   tcg/optimize: rework copy propagation
>   tcg/optimize: do copy propagation for all operations
>   tcg/optimize: optimize "op r, a, a => mov r, a"
>   tcg/optimize: optimize "op r, a, a => movi r, 0"
>   tcg/optimize: further optimize brcond/movcond/setcond
>   tcg/optimize: prefer the "op a, a, b" form for commutative ops
>   tcg: remove #ifdef #endif around TCGOpcode tests
>   tcg/optimize: add constant folding for deposit

All 10 now

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond
  2012-09-21 23:17   ` Richard Henderson
@ 2012-09-22  9:35     ` Aurelien Jarno
  2012-09-22 12:46       ` Richard Henderson
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-22  9:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Sep 21, 2012 at 04:17:08PM -0700, Richard Henderson wrote:
> On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
> > +/* Return 2 if the condition can't be simplified, and the result
> > +   of the condition (0 or 1) if it can */
> 
> Ok as-is.  But I'd bike-shed the "fail" return to -1.
> 

I have done it that way, as tmp is an unsigned value, and I didn't want
to add another one. But that can be changed.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit
  2012-09-21 23:22   ` Richard Henderson
@ 2012-09-22  9:41     ` Aurelien Jarno
  2012-09-22 12:45       ` Richard Henderson
  2012-09-22 14:41       ` Blue Swirl
  0 siblings, 2 replies; 19+ messages in thread
From: Aurelien Jarno @ 2012-09-22  9:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Sep 21, 2012 at 04:22:53PM -0700, Richard Henderson wrote:
> On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
> > +                tmp = ((1ull << args[4]) - 1);
> > +                tmp = (temps[args[1]].val & ~(tmp << args[3]))
> > +                      | ((temps[args[2]].val & tmp) << args[3]);
> > +                tcg_opt_gen_movi(gen_args, args[0], tmp);
> 
> We do have a deposit64 function in bitops.h, fwiw.
> 
> Though, really, bitops.h has some crappy functions in it that are
> redundant with the better functions in host-utils.h...
> 

Is there a reason that bitops.h is not used from tcg/*, while
host-utils.h is? Licensing issue maybe (bitops.h is LGPL, host-utils.h
is BSD like).


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit
  2012-09-22  9:41     ` Aurelien Jarno
@ 2012-09-22 12:45       ` Richard Henderson
  2012-09-22 14:41       ` Blue Swirl
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2012-09-22 12:45 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 2012-09-22 02:41, Aurelien Jarno wrote:
> On Fri, Sep 21, 2012 at 04:22:53PM -0700, Richard Henderson wrote:
>> On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
>>> +                tmp = ((1ull << args[4]) - 1);
>>> +                tmp = (temps[args[1]].val & ~(tmp << args[3]))
>>> +                      | ((temps[args[2]].val & tmp) << args[3]);
>>> +                tcg_opt_gen_movi(gen_args, args[0], tmp);
>>
>> We do have a deposit64 function in bitops.h, fwiw.
>>
>> Though, really, bitops.h has some crappy functions in it that are
>> redundant with the better functions in host-utils.h...
>>
> 
> Is there a reason that bitops.h is not used from tcg/*, while
> host-utils.h is? Licensing issue maybe (bitops.h is LGPL, host-utils.h
> is BSD like).

I assumed without looking that bitops.h is newer.


r~


* Re: [Qemu-devel] [PATCH v2 07/10] tcg/optimize: further optimize brcond/movcond/setcond
  2012-09-22  9:35     ` Aurelien Jarno
@ 2012-09-22 12:46       ` Richard Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2012-09-22 12:46 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 2012-09-22 02:35, Aurelien Jarno wrote:
> On Fri, Sep 21, 2012 at 04:17:08PM -0700, Richard Henderson wrote:
>> On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
>>> +/* Return 2 if the condition can't be simplified, and the result
>>> +   of the condition (0 or 1) if it can */
>>
>> Ok as-is.  But I'd bike-shed the "fail" return to -1.
>>
> 
> I have done it that way, as tmp is an unsigned value, and I didn't want
> to add another one. But that can be changed.
> 

Fair enough.  Green it is for the shed.


r~


* Re: [Qemu-devel] [PATCH v2 10/10] tcg/optimize: add constant folding for deposit
  2012-09-22  9:41     ` Aurelien Jarno
  2012-09-22 12:45       ` Richard Henderson
@ 2012-09-22 14:41       ` Blue Swirl
  1 sibling, 0 replies; 19+ messages in thread
From: Blue Swirl @ 2012-09-22 14:41 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson

On Sat, Sep 22, 2012 at 9:41 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Fri, Sep 21, 2012 at 04:22:53PM -0700, Richard Henderson wrote:
>> On 09/21/2012 12:43 PM, Aurelien Jarno wrote:
>> > +                tmp = ((1ull << args[4]) - 1);
>> > +                tmp = (temps[args[1]].val & ~(tmp << args[3]))
>> > +                      | ((temps[args[2]].val & tmp) << args[3]);
>> > +                tcg_opt_gen_movi(gen_args, args[0], tmp);
>>
>> We do have a deposit64 function in bitops.h, fwiw.
>>
>> Though, really, bitops.h has some crappy functions in it that are
>> redundant with the better functions in host-utils.h...

Better code should always replace bad code.

>>
>
> Is there a reason that bitops.h is not used from tcg/*, while
> host-utils.h is? Licensing issue maybe (bitops.h is LGPL, host-utils.h
> is BSD like).

No, it's just that bitops were added without converting all possible users.

>
>
> --
> Aurelien Jarno                          GPG: 1024D/F1BCDB73
> aurelien@aurel32.net                 http://www.aurel32.net
>

