qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
@ 2012-09-07 13:16 Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 1/9] tcg: improve profiler Aurelien Jarno
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

This patch series improves the TCG optimizer, based on patterns found
while executing various guests. Constant folding of brcond and setcond
is especially useful when these ops are used to guard against some
argument values (e.g. division by 0), as the check can then be
optimized away when this argument is a constant.

This brings around a 0.5% improvement on openssl-like benchmarks.
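As an illustration of why folding these ops pays off, here is a minimal sketch under stated assumptions: the Value type, the BranchFold enum and fold_brcond_eq() are hypothetical and much simpler than the real optimizer. When both operands of a guard such as "branch to the division-by-zero path if divisor == 0" are known constants, the branch is either always taken (it can become an unconditional br) or never taken (it can become a nop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an operand: a known constant or an unknown value. */
typedef struct {
    bool is_const;
    uint32_t val;
} Value;

/* Possible outcomes of folding "branch if a == b". */
typedef enum { BR_UNKNOWN, BR_ALWAYS, BR_NEVER } BranchFold;

/* Fold an equality branch when both operands are known constants. */
static BranchFold fold_brcond_eq(Value a, Value b)
{
    if (!a.is_const || !b.is_const) {
        return BR_UNKNOWN;           /* keep the conditional branch */
    }
    /* Taken: replace by an unconditional branch; not taken: drop it. */
    return a.val == b.val ? BR_ALWAYS : BR_NEVER;
}
```

For instance, a guard on a divisor that has been propagated as the constant 7 folds to BR_NEVER, so the exception path is no longer reached through this branch.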


Modifications between V1 and V2 following feedback I got:
 - In the first patch, account for the liveness analysis time and
   optimizing pass time separately
 - Fixed switch/break in patch 7 to correctly throw an error
 - Added patch 9 to make the code more readable
Other patches are unmodified.


Aurelien Jarno (9):
  tcg: improve profiler
  tcg/optimize: split expression simplification
  tcg/optimize: simplify or/xor r, a, 0 cases
  tcg/optimize: simplify and r, a, 0 cases
  tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
  tcg/optimize: swap brcond/setcond arguments when possible
  tcg/optimize: add constant folding for setcond
  tcg/optimize: add constant folding for brcond
  tcg/optimize: fix if/else/break coding style

 tcg/optimize.c |  179 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 tcg/tcg.c      |   12 +++-
 tcg/tcg.h      |    1 +
 3 files changed, 175 insertions(+), 17 deletions(-)

--
1.7.10.4


* [Qemu-devel] [PATCH v2 1/9] tcg: improve profiler
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 2/9] tcg/optimize: split expression simplification Aurelien Jarno
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

Now that there are two optimization passes (optimize.c, liveness),
there is no point in outputting statistics for the liveness part
only. Update the code to take both passes into account.
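The accounting pattern used in the patch (subtract the clock before a pass, add it back afterwards, then report each pass as a share of code generation time) can be sketched as follows. The Profile struct, account() and pass_percent() are hypothetical helpers, and the clock readings are passed in explicitly instead of calling profile_getclock().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-pass accumulators, in arbitrary clock ticks. */
typedef struct {
    int64_t opt_time;   /* optimizer pass (tcg_optimize) */
    int64_t la_time;    /* liveness analysis pass */
    int64_t code_time;  /* whole code generation */
} Profile;

/* Add the time elapsed between two clock readings to an accumulator,
 * the same way the patch does: subtract before, add after. */
static void account(int64_t *acc, int64_t start, int64_t end)
{
    assert(end >= start);   /* the clock is assumed to be monotonic */
    *acc -= start;
    *acc += end;
}

/* Share of code generation time spent in one pass, in percent.
 * A zero denominator is replaced by 1, as in tcg_dump_info(). */
static double pass_percent(int64_t pass_time, int64_t code_time)
{
    return (double)pass_time / (code_time ? code_time : 1) * 100.0;
}
```

Keeping opt_time and la_time in separate accumulators is what allows the "optim./code time" and "liveness/code time" lines to be reported independently.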

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   12 +++++++++++-
 tcg/tcg.h |    1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8386b70..a4e7f42 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2059,22 +2059,29 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
     }
 #endif
 
+#ifdef CONFIG_PROFILER
+    s->opt_time -= profile_getclock();
+#endif
+
 #ifdef USE_TCG_OPTIMIZATIONS
     gen_opparam_ptr =
         tcg_optimize(s, gen_opc_ptr, gen_opparam_buf, tcg_op_defs);
 #endif
 
 #ifdef CONFIG_PROFILER
+    s->opt_time += profile_getclock();
     s->la_time -= profile_getclock();
 #endif
+
     tcg_liveness_analysis(s);
+
 #ifdef CONFIG_PROFILER
     s->la_time += profile_getclock();
 #endif
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT))) {
-        qemu_log("OP after liveness analysis:\n");
+        qemu_log("OP after optimization and liveness analysis:\n");
         tcg_dump_ops(s);
         qemu_log("\n");
     }
@@ -2241,6 +2248,9 @@ void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf)
                 (double)s->interm_time / tot * 100.0);
     cpu_fprintf(f, "  gen_code time     %0.1f%%\n", 
                 (double)s->code_time / tot * 100.0);
+    cpu_fprintf(f, "optim./code time    %0.1f%%\n",
+                (double)s->opt_time / (s->code_time ? s->code_time : 1)
+                * 100.0);
     cpu_fprintf(f, "liveness/code time  %0.1f%%\n", 
                 (double)s->la_time / (s->code_time ? s->code_time : 1) * 100.0);
     cpu_fprintf(f, "cpu_restore count   %" PRId64 "\n",
diff --git a/tcg/tcg.h b/tcg/tcg.h
index d710694..7a72729 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -382,6 +382,7 @@ struct TCGContext {
     int64_t interm_time;
     int64_t code_time;
     int64_t la_time;
+    int64_t opt_time;
     int64_t restore_count;
     int64_t restore_time;
 #endif
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 2/9] tcg/optimize: split expression simplification
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 1/9] tcg: improve profiler Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 3/9] tcg/optimize: simplify or/xor r, a, 0 cases Aurelien Jarno
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

Split expression simplification into multiple parts so that a given op
can appear multiple times. This patch should not change the generated code.
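A simplified model of the resulting structure, shown once patches 3 and 4 have added or/xor and and to the first two groups. The Op and Rewrite enums and simplify() are hypothetical and cover only a subset of the ops; the b_is_zero flag stands in for the temps[] constant check and a == b for the args[1] == args[2] test in tcg_constant_folding(). Because each group is its own switch, and/or can be listed in several groups without a duplicate case label.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of opcodes, for illustration only. */
typedef enum { OP_ADD, OP_SUB, OP_OR, OP_XOR, OP_AND, OP_MUL } Op;

/* Possible rewrites of "op r, a, b". */
typedef enum { KEEP, MOV_R_A, MOVI_R_0 } Rewrite;

/* 'a' and 'b' are temp indices; 'b_is_zero' tells whether temp 'b'
 * is known to hold the constant 0. */
static Rewrite simplify(Op op, int a, int b, bool b_is_zero)
{
    /* "op r, a, 0 => mov r, a" */
    switch (op) {
    case OP_ADD: case OP_SUB: case OP_OR: case OP_XOR:
        if (b_is_zero) {
            return MOV_R_A;
        }
        break;
    default:
        break;
    }

    /* "op r, a, 0 => movi r, 0" */
    switch (op) {
    case OP_AND: case OP_MUL:
        if (b_is_zero) {
            return MOVI_R_0;
        }
        break;
    default:
        break;
    }

    /* "op r, a, a => mov r, a" */
    switch (op) {
    case OP_OR: case OP_AND:
        if (a == b) {
            return MOV_R_A;
        }
        break;
    default:
        break;
    }
    return KEEP;
}
```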

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 9c65474..63f970d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -322,7 +322,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
-        /* Simplify expression if possible. */
+        /* Simplify expression for "op r, a, 0 => mov r, a" cases */
         switch (op) {
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
@@ -352,6 +352,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 continue;
             }
             break;
+        default:
+            break;
+        }
+
+        /* Simplify expression for "op r, a, 0 => movi r, 0" cases */
+        switch (op) {
         CASE_OP_32_64(mul):
             if ((temps[args[2]].state == TCG_TEMP_CONST
                 && temps[args[2]].val == 0)) {
@@ -362,6 +368,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 continue;
             }
             break;
+        default:
+            break;
+        }
+
+        /* Simplify expression for "op r, a, a => mov r, a" cases */
+        switch (op) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
             if (args[1] == args[2]) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 3/9] tcg/optimize: simplify or/xor r, a, 0 cases
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 1/9] tcg: improve profiler Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 2/9] tcg/optimize: split expression simplification Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 4/9] tcg/optimize: simplify and " Aurelien Jarno
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

or/xor r, a, 0 is equivalent to a mov r, a.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 63f970d..0db849e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -331,6 +331,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
+        CASE_OP_32_64(or):
+        CASE_OP_32_64(xor):
             if (temps[args[1]].state == TCG_TEMP_CONST) {
                 /* Proceed with possible constant folding. */
                 break;
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 4/9] tcg/optimize: simplify and r, a, 0 cases
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (2 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 3/9] tcg/optimize: simplify or/xor r, a, 0 cases Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 5/9] tcg/optimize: simplify shift/rot r, 0, a => movi r, " Aurelien Jarno
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

and r, a, 0 is equivalent to a movi r, 0.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 0db849e..c12cb2b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -360,6 +360,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
 
         /* Simplify expression for "op r, a, 0 => movi r, 0" cases */
         switch (op) {
+        CASE_OP_32_64(and):
         CASE_OP_32_64(mul):
             if ((temps[args[2]].state == TCG_TEMP_CONST
                 && temps[args[2]].val == 0)) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 5/9] tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (3 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 4/9] tcg/optimize: simplify and " Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 6/9] tcg/optimize: swap brcond/setcond arguments when possible Aurelien Jarno
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

shift/rot r, 0, a is equivalent to movi r, 0.
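Note that the zero here is the value being shifted, not the shift count: shifting or rotating by a constant 0 is already covered by the "op r, a, 0 => mov r, a" group. A small standalone check of both facts on 32-bit values, using hypothetical rotl32()/rotr32() helpers written for this sketch (the count is masked and 0 is special-cased, because shifting a uint32_t by 32 is undefined in C):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit rotations. The count is masked and 0 is special-cased, since
 * "x >> 32" on a uint32_t is undefined behaviour in C. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* True if shifting or rotating the constant 0 by 'n' always yields 0,
 * which is what allows "shift/rot r, 0, a => movi r, 0". */
static bool shifts_of_zero_are_zero(unsigned n)
{
    assert(n < 32);   /* larger shift counts are undefined in C */
    return (UINT32_C(0) << n) == 0
        && (UINT32_C(0) >> n) == 0
        && ((int32_t)0 >> n) == 0    /* stands in for sar */
        && rotl32(0, n) == 0
        && rotr32(0, n) == 0;
}
```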

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c12cb2b..1698ba3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -322,6 +322,26 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
+        /* Simplify expressions for "shift/rot r, 0, a => movi r, 0" */
+        switch (op) {
+        CASE_OP_32_64(shl):
+        CASE_OP_32_64(shr):
+        CASE_OP_32_64(sar):
+        CASE_OP_32_64(rotl):
+        CASE_OP_32_64(rotr):
+            if (temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[1]].val == 0) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+                args += 3;
+                gen_args += 2;
+                continue;
+            }
+            break;
+        default:
+            break;
+        }
+
         /* Simplify expression for "op r, a, 0 => mov r, a" cases */
         switch (op) {
         CASE_OP_32_64(add):
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 6/9] tcg/optimize: swap brcond/setcond arguments when possible
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (4 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 5/9] tcg/optimize: simplify shift/rot r, 0, a => movi r, " Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 7/9] tcg/optimize: add constant folding for setcond Aurelien Jarno
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

brcond and setcond ops are not commutative, but it's easy to compute the
new condition after swapping the arguments. Try to always put the constant
argument in second position, as is done for commutative ops, to help
backends generate better code.
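The patch relies on tcg_swap_cond() for this. The underlying identity can be checked with a standalone sketch; the Cond enum, eval_cond() and swap_cond() below are local stand-ins written for illustration, not QEMU's definitions. Swapping the operands turns LT into GT and LE into GE (and likewise for the unsigned variants), while EQ and NE are symmetric and stay unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for TCG's condition codes (hypothetical names). */
typedef enum {
    COND_EQ, COND_NE, COND_LT, COND_GE, COND_LE, COND_GT,
    COND_LTU, COND_GEU, COND_LEU, COND_GTU
} Cond;

/* Evaluate "x cond y" on 32-bit operands. */
static bool eval_cond(Cond c, uint32_t x, uint32_t y)
{
    switch (c) {
    case COND_EQ:  return x == y;
    case COND_NE:  return x != y;
    case COND_LT:  return (int32_t)x < (int32_t)y;
    case COND_GE:  return (int32_t)x >= (int32_t)y;
    case COND_LE:  return (int32_t)x <= (int32_t)y;
    case COND_GT:  return (int32_t)x > (int32_t)y;
    case COND_LTU: return x < y;
    case COND_GEU: return x >= y;
    case COND_LEU: return x <= y;
    case COND_GTU: return x > y;
    }
    assert(!"unknown condition");
    return false;
}

/* Condition to use once the operands are swapped: "x < y" is "y > x". */
static Cond swap_cond(Cond c)
{
    switch (c) {
    case COND_LT:  return COND_GT;
    case COND_GT:  return COND_LT;
    case COND_LE:  return COND_GE;
    case COND_GE:  return COND_LE;
    case COND_LTU: return COND_GTU;
    case COND_GTU: return COND_LTU;
    case COND_LEU: return COND_GEU;
    case COND_GEU: return COND_LEU;
    default:       return c;   /* EQ and NE are symmetric */
    }
}
```

For example, a brcond comparing the constant 5 against t1 with LT becomes a brcond comparing t1 against 5 with GT, leaving the constant in second position.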

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1698ba3..7debc8a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -318,6 +318,24 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 args[2] = tmp;
             }
             break;
+        CASE_OP_32_64(brcond):
+            if (temps[args[0]].state == TCG_TEMP_CONST
+                && temps[args[1]].state != TCG_TEMP_CONST) {
+                tmp = args[0];
+                args[0] = args[1];
+                args[1] = tmp;
+                args[2] = tcg_swap_cond(args[2]);
+            }
+            break;
+        CASE_OP_32_64(setcond):
+            if (temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[2]].state != TCG_TEMP_CONST) {
+                tmp = args[1];
+                args[1] = args[2];
+                args[2] = tmp;
+                args[3] = tcg_swap_cond(args[3]);
+            }
+            break;
         default:
             break;
         }
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 7/9] tcg/optimize: add constant folding for setcond
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (5 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 6/9] tcg/optimize: swap brcond/setcond arguments when possible Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 8/9] tcg/optimize: add constant folding for brcond Aurelien Jarno
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 7debc8a..1cb1f36 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -267,6 +267,67 @@ static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
     return res;
 }
 
+static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
+                                       TCGArg y, TCGCond c)
+{
+    switch (op_bits(op)) {
+    case 32:
+        switch (c) {
+        case TCG_COND_EQ:
+            return (uint32_t)x == (uint32_t)y;
+        case TCG_COND_NE:
+            return (uint32_t)x != (uint32_t)y;
+        case TCG_COND_LT:
+            return (int32_t)x < (int32_t)y;
+        case TCG_COND_GE:
+            return (int32_t)x >= (int32_t)y;
+        case TCG_COND_LE:
+            return (int32_t)x <= (int32_t)y;
+        case TCG_COND_GT:
+            return (int32_t)x > (int32_t)y;
+        case TCG_COND_LTU:
+            return (uint32_t)x < (uint32_t)y;
+        case TCG_COND_GEU:
+            return (uint32_t)x >= (uint32_t)y;
+        case TCG_COND_LEU:
+            return (uint32_t)x <= (uint32_t)y;
+        case TCG_COND_GTU:
+            return (uint32_t)x > (uint32_t)y;
+        }
+        break;
+    case 64:
+        switch (c) {
+        case TCG_COND_EQ:
+            return (uint64_t)x == (uint64_t)y;
+        case TCG_COND_NE:
+            return (uint64_t)x != (uint64_t)y;
+        case TCG_COND_LT:
+            return (int64_t)x < (int64_t)y;
+        case TCG_COND_GE:
+            return (int64_t)x >= (int64_t)y;
+        case TCG_COND_LE:
+            return (int64_t)x <= (int64_t)y;
+        case TCG_COND_GT:
+            return (int64_t)x > (int64_t)y;
+        case TCG_COND_LTU:
+            return (uint64_t)x < (uint64_t)y;
+        case TCG_COND_GEU:
+            return (uint64_t)x >= (uint64_t)y;
+        case TCG_COND_LEU:
+            return (uint64_t)x <= (uint64_t)y;
+        case TCG_COND_GTU:
+            return (uint64_t)x > (uint64_t)y;
+        }
+        break;
+    }
+
+    fprintf(stderr,
+            "Unrecognized bitness %d or condition %d in "
+            "do_constant_folding_cond.\n", op_bits(op), c);
+    tcg_abort();
+}
+
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -522,6 +583,26 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 args += 3;
                 break;
             }
+        CASE_OP_32_64(setcond):
+            if (temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tmp = do_constant_folding_cond(op, temps[args[1]].val,
+                                               temps[args[2]].val, args[3]);
+                tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+                gen_args += 2;
+                args += 4;
+                break;
+            } else {
+                reset_temp(args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args[3] = args[3];
+                gen_args += 4;
+                args += 4;
+                break;
+            }
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 8/9] tcg/optimize: add constant folding for brcond
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (6 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 7/9] tcg/optimize: add constant folding for setcond Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 9/9] tcg/optimize: fix if/else/break coding style Aurelien Jarno
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1cb1f36..156e8d9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -603,6 +603,32 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 args += 4;
                 break;
             }
+        CASE_OP_32_64(brcond):
+            if (temps[args[0]].state == TCG_TEMP_CONST
+                && temps[args[1]].state == TCG_TEMP_CONST) {
+                if (do_constant_folding_cond(op, temps[args[0]].val,
+                                             temps[args[1]].val, args[2])) {
+                    memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
+                    gen_opc_buf[op_index] = INDEX_op_br;
+                    gen_args[0] = args[3];
+                    gen_args += 1;
+                    args += 4;
+                } else {
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                    args += 4;
+                }
+                break;
+            } else {
+                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
+                reset_temp(args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args[3] = args[3];
+                gen_args += 4;
+                args += 4;
+                break;
+            }
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
@@ -624,7 +650,6 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_set_label:
         case INDEX_op_jmp:
         case INDEX_op_br:
-        CASE_OP_32_64(brcond):
             memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
             for (i = 0; i < def->nb_args; i++) {
                 *gen_args = *args;
-- 
1.7.10.4


* [Qemu-devel] [PATCH v2 9/9] tcg/optimize: fix if/else/break coding style
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (7 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 8/9] tcg/optimize: add constant folding for brcond Aurelien Jarno
@ 2012-09-07 13:16 ` Aurelien Jarno
  2012-09-08  8:18 ` [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Blue Swirl
  2012-09-10 13:55 ` Richard Henderson
  10 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-07 13:16 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

optimize.c contains some cases where the break appears in both the
if and the else branches. Fix that by moving it after the if/else
block. Also move some common code there.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   34 +++++++++++-----------------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 156e8d9..fba0ed9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -441,15 +441,14 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 if ((temps[args[0]].state == TCG_TEMP_COPY
                     && temps[args[0]].val == args[1])
                     || args[0] == args[1]) {
-                    args += 3;
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
                     tcg_opt_gen_mov(s, gen_args, args[0], args[1],
                                     nb_temps, nb_globals);
                     gen_args += 2;
-                    args += 3;
                 }
+                args += 3;
                 continue;
             }
             break;
@@ -480,15 +479,14 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(and):
             if (args[1] == args[2]) {
                 if (args[1] == args[0]) {
-                    args += 3;
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
                     tcg_opt_gen_mov(s, gen_args, args[0], args[1], nb_temps,
                                     nb_globals);
                     gen_args += 2;
-                    args += 3;
                 }
+                args += 3;
                 continue;
             }
             break;
@@ -538,17 +536,14 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 gen_opc_buf[op_index] = op_to_movi(op);
                 tmp = do_constant_folding(op, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
-                gen_args += 2;
-                args += 2;
-                break;
             } else {
                 reset_temp(args[0], nb_temps, nb_globals);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
-                gen_args += 2;
-                args += 2;
-                break;
             }
+            gen_args += 2;
+            args += 2;
+            break;
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
@@ -572,17 +567,15 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                           temps[args[2]].val);
                 tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
                 gen_args += 2;
-                args += 3;
-                break;
             } else {
                 reset_temp(args[0], nb_temps, nb_globals);
                 gen_args[0] = args[0];
                 gen_args[1] = args[1];
                 gen_args[2] = args[2];
                 gen_args += 3;
-                args += 3;
-                break;
             }
+            args += 3;
+            break;
         CASE_OP_32_64(setcond):
             if (temps[args[1]].state == TCG_TEMP_CONST
                 && temps[args[2]].state == TCG_TEMP_CONST) {
@@ -591,8 +584,6 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                                temps[args[2]].val, args[3]);
                 tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
                 gen_args += 2;
-                args += 4;
-                break;
             } else {
                 reset_temp(args[0], nb_temps, nb_globals);
                 gen_args[0] = args[0];
@@ -600,9 +591,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 gen_args[2] = args[2];
                 gen_args[3] = args[3];
                 gen_args += 4;
-                args += 4;
-                break;
             }
+            args += 4;
+            break;
         CASE_OP_32_64(brcond):
             if (temps[args[0]].state == TCG_TEMP_CONST
                 && temps[args[1]].state == TCG_TEMP_CONST) {
@@ -612,12 +603,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                     gen_opc_buf[op_index] = INDEX_op_br;
                     gen_args[0] = args[3];
                     gen_args += 1;
-                    args += 4;
                 } else {
                     gen_opc_buf[op_index] = INDEX_op_nop;
-                    args += 4;
                 }
-                break;
             } else {
                 memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
                 reset_temp(args[0], nb_temps, nb_globals);
@@ -626,9 +614,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 gen_args[2] = args[2];
                 gen_args[3] = args[3];
                 gen_args += 4;
-                args += 4;
-                break;
             }
+            args += 4;
+            break;
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
-- 
1.7.10.4


* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (8 preceding siblings ...)
  2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 9/9] tcg/optimize: fix if/else/break coding style Aurelien Jarno
@ 2012-09-08  8:18 ` Blue Swirl
  2012-09-08  9:01   ` Aurelien Jarno
  2012-09-10 13:55 ` Richard Henderson
  10 siblings, 1 reply; 17+ messages in thread
From: Blue Swirl @ 2012-09-08  8:18 UTC
  To: Aurelien Jarno; +Cc: qemu-devel

On Fri, Sep 7, 2012 at 1:16 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> This patch series improves the TCG optimizer, based on patterns found
> while executing various guests. Constant folding of brcond and setcond
> is especially useful when these ops are used to guard against some
> argument values (e.g. division by 0), as the check can then be
> optimized away when this argument is a constant.
>
> This brings around a 0.5% improvement on openssl-like benchmarks.
>
>
> Modifications between V1 and V2 following feedback I got:
>  - In the first patch, account for the liveness analysis time and
>    optimizing pass time separately
>  - Fixed switch/break in patch 7 to correctly throw an error
>  - Added patch 9 to make the code more readable
> Other patches are unmodified.
>
>
> Aurelien Jarno (9):
>   tcg: improve profiler
>   tcg/optimize: split expression simplification
>   tcg/optimize: simplify or/xor r, a, 0 cases
>   tcg/optimize: simplify and r, a, 0 cases
>   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases

Aren't the above or/and/shift/rot simplifications (and also for
example OR with 0xfffffffff and XOR register by itself) already
handled by tcg/tcg-op.h?

>   tcg/optimize: swap brcond/setcond arguments when possible
>   tcg/optimize: add constant folding for setcond
>   tcg/optimize: add constant folding for brcond
>   tcg/optimize: fix if/else/break coding style

Otherwise a very nice series.

>
>  tcg/optimize.c |  179 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  tcg/tcg.c      |   12 +++-
>  tcg/tcg.h      |    1 +
>  3 files changed, 175 insertions(+), 17 deletions(-)
>
> --
> 1.7.10.4
>
>


* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-08  8:18 ` [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Blue Swirl
@ 2012-09-08  9:01   ` Aurelien Jarno
  2012-09-08  9:06     ` Blue Swirl
  0 siblings, 1 reply; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-08  9:01 UTC
  To: Blue Swirl; +Cc: qemu-devel

On Sat, Sep 08, 2012 at 08:18:50AM +0000, Blue Swirl wrote:
> On Fri, Sep 7, 2012 at 1:16 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > This patch series improves the TCG optimizer, based on patterns found
> > while executing various guests. Constant folding of brcond and setcond
> > is especially useful when these ops are used to guard against some
> > argument values (e.g. division by 0), as the check can then be
> > optimized away when this argument is a constant.
> >
> > This brings around a 0.5% improvement on openssl-like benchmarks.
> >
> >
> > Modifications between V1 and V2 following feedback I got:
> >  - In the first patch, account for the liveness analysis time and
> >    optimizing pass time separately
> >  - Fixed switch/break in patch 7 to correctly throw an error
> >  - Added patch 9 to make the code more readable
> > Other patches are unmodified.
> >
> >
> > Aurelien Jarno (9):
> >   tcg: improve profiler
> >   tcg/optimize: split expression simplification
> >   tcg/optimize: simplify or/xor r, a, 0 cases
> >   tcg/optimize: simplify and r, a, 0 cases
> >   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
> 
> Aren't the above or/and/shift/rot simplifications (and also for
> example OR with 0xfffffffff and XOR register by itself) already
> handled by tcg/tcg-op.h?

They are handled there when the values are known at decode time. This is
not the case when the values are propagated within the TB.

For example, this is optimized in tcg/tcg-op.h:
  ori t0, t1, 0

This is not optimized in tcg/tcg-op.h:
  movi t2, 0
  or t0, t1, t2
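To make the distinction concrete, here is a minimal sketch of the kind of forward propagation that catches the second case. The Insn layout, the opcodes and propagate() are hypothetical and far simpler than TCG's opcode stream and temps[] tracking; only the "or d, a, 0 => mov d, a" rewrite is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NB_TEMPS 8

/* Hypothetical mini IR with three opcodes. */
typedef enum { OP_MOVI, OP_MOV, OP_OR } Opc;

typedef struct {
    Opc opc;
    int dst;        /* destination temp */
    int src1;       /* first source temp (MOV, OR) */
    int src2;       /* second source temp (OR only) */
    uint32_t imm;   /* immediate (MOVI only) */
} Insn;

/* Forward pass over a straight-line block: track which temps hold known
 * constants and rewrite "or d, a, b" into "mov d, a" when b is known 0. */
static void propagate(Insn *insns, int n)
{
    bool is_const[NB_TEMPS] = { false };
    uint32_t val[NB_TEMPS] = { 0 };

    for (int i = 0; i < n; i++) {
        Insn *in = &insns[i];
        assert(in->dst >= 0 && in->dst < NB_TEMPS);

        switch (in->opc) {
        case OP_MOVI:
            is_const[in->dst] = true;
            val[in->dst] = in->imm;
            break;
        case OP_OR:
            if (is_const[in->src2] && val[in->src2] == 0) {
                in->opc = OP_MOV;           /* or d, a, 0 => mov d, a */
                is_const[in->dst] = is_const[in->src1];
                val[in->dst] = val[in->src1];
            } else {
                is_const[in->dst] = false;  /* result not tracked here */
            }
            break;
        case OP_MOV:
            is_const[in->dst] = is_const[in->src1];
            val[in->dst] = val[in->src1];
            break;
        }
    }
}
```

The code generator in tcg/tcg-op.h never sees the second sequence as an "or with 0", because the zero only becomes visible once the value of t2 is propagated; that is why the simplification also has to live in the optimizer.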

> >   tcg/optimize: swap brcond/setcond arguments when possible
> >   tcg/optimize: add constant folding for setcond
> >   tcg/optimize: add constant folding for brcond
> >   tcg/optimize: fix if/else/break coding style
> 
> Otherwise a very nice series.
> 
> >
> >  tcg/optimize.c |  179 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  tcg/tcg.c      |   12 +++-
> >  tcg/tcg.h      |    1 +
> >  3 files changed, 175 insertions(+), 17 deletions(-)
> >
> > --
> > 1.7.10.4
> >
> >
> 
> 
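
The brcond/setcond constant folding described in the cover letter can be illustrated with a small C sketch. It assumes a hypothetical operand representation (`struct operand`) and hypothetical helpers `fold_cond()`, `swap_cond()` and `simplify_cond()`. It is not the code from tcg/optimize.c. When both operands are constants, the comparison result is known at translation time, so a setcond becomes a movi of 0 or 1 and a brcond becomes either an unconditional branch or nothing. Otherwise, in the spirit of the "swap brcond/setcond arguments" patch, a lone constant is moved to the second position and the condition is mirrored, presumably so that a backend can use an immediate for that operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition codes: signed and unsigned comparisons. */
enum cond { C_EQ, C_NE, C_LT, C_GE, C_LE, C_GT, C_LTU, C_GEU, C_LEU, C_GTU };

/* Hypothetical operand: either an unknown temp or a known constant. */
struct operand {
    bool is_const;
    uint64_t val;           /* meaningful only when is_const */
};

enum fold_result { FOLD_NONE = -1, FOLD_FALSE = 0, FOLD_TRUE = 1 };

/* Evaluate "x cond y" for two constants. */
static bool fold_cond(enum cond c, uint64_t x, uint64_t y)
{
    switch (c) {
    case C_EQ:  return x == y;
    case C_NE:  return x != y;
    case C_LT:  return (int64_t)x < (int64_t)y;
    case C_GE:  return (int64_t)x >= (int64_t)y;
    case C_LE:  return (int64_t)x <= (int64_t)y;
    case C_GT:  return (int64_t)x > (int64_t)y;
    case C_LTU: return x < y;
    case C_GEU: return x >= y;
    case C_LEU: return x <= y;
    case C_GTU: return x > y;
    }
    assert(!"unknown condition");
    return false;
}

/* Condition to use once the operands are swapped (x < y <=> y > x). */
static enum cond swap_cond(enum cond c)
{
    switch (c) {
    case C_LT:  return C_GT;
    case C_GT:  return C_LT;
    case C_LE:  return C_GE;
    case C_GE:  return C_LE;
    case C_LTU: return C_GTU;
    case C_GTU: return C_LTU;
    case C_LEU: return C_GEU;
    case C_GEU: return C_LEU;
    default:    return c;           /* EQ and NE are symmetric */
    }
}

/*
 * setcond: FOLD_TRUE / FOLD_FALSE => movi dst, 1 / movi dst, 0
 * brcond:  FOLD_TRUE => unconditional br, FOLD_FALSE => drop the op
 * FOLD_NONE: keep the op, with a lone constant moved to operand b.
 */
static enum fold_result simplify_cond(struct operand *a, struct operand *b,
                                      enum cond *c)
{
    if (a->is_const && b->is_const) {
        return fold_cond(*c, a->val, b->val) ? FOLD_TRUE : FOLD_FALSE;
    }
    if (a->is_const) {              /* b is not constant here */
        struct operand tmp = *a;
        *a = *b;
        *b = tmp;
        *c = swap_cond(*c);
    }
    return FOLD_NONE;
}
```

For example, a guest that compares a divisor with 0 before dividing yields an "eq" comparison whose result is known as soon as the divisor is a known constant, so the guard disappears from the generated code.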

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-08  9:01   ` Aurelien Jarno
@ 2012-09-08  9:06     ` Blue Swirl
  2012-09-08  9:12       ` Aurelien Jarno
  0 siblings, 1 reply; 17+ messages in thread
From: Blue Swirl @ 2012-09-08  9:06 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On Sat, Sep 8, 2012 at 9:01 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Sat, Sep 08, 2012 at 08:18:50AM +0000, Blue Swirl wrote:
>> On Fri, Sep 7, 2012 at 1:16 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> > Aurelien Jarno (9):
>> >   tcg: improve profiler
>> >   tcg/optimize: split expression simplification
>> >   tcg/optimize: simplify or/xor r, a, 0 cases
>> >   tcg/optimize: simplify and r, a, 0 cases
>> >   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
>>
>> Aren't the above or/and/shift/rot simplifications (and also for
>> example OR with 0xffffffff and XOR of a register with itself) already
>> handled by tcg/tcg-op.h?
>
> They are handled there when the values are known at decode time. This is
> not the case when the values are propagated within the TB.
>
> For example, this is optimized in tcg/tcg-op.h:
>   ori t0, t1, 0
>
> This is not optimized in tcg/tcg-op.h:
>   movi t2, 0
>   or t0, t1, t2

I see. Does the optimizer pass then make the tcg/tcg-op.h optimization
redundant? Could we do the optimizations only in the optimizer?

>
>> >   tcg/optimize: swap brcond/setcond arguments when possible
>> >   tcg/optimize: add constant folding for setcond
>> >   tcg/optimize: add constant folding for brcond
>> >   tcg/optimize: fix if/else/break coding style
>>
>> Otherwise a very nice series.

* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-08  9:06     ` Blue Swirl
@ 2012-09-08  9:12       ` Aurelien Jarno
  2012-09-08  9:29         ` Blue Swirl
  0 siblings, 1 reply; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-08  9:12 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On Sat, Sep 08, 2012 at 09:06:52AM +0000, Blue Swirl wrote:
> On Sat, Sep 8, 2012 at 9:01 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On Sat, Sep 08, 2012 at 08:18:50AM +0000, Blue Swirl wrote:
> >> On Fri, Sep 7, 2012 at 1:16 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> >> > Aurelien Jarno (9):
> >> >   tcg: improve profiler
> >> >   tcg/optimize: split expression simplification
> >> >   tcg/optimize: simplify or/xor r, a, 0 cases
> >> >   tcg/optimize: simplify and r, a, 0 cases
> >> >   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
> >>
> >> Aren't the above or/and/shift/rot simplifications (and also for
> >> example OR with 0xffffffff and XOR of a register with itself) already
> >> handled by tcg/tcg-op.h?
> >
> > They are handled there when the values are known at decode time. This is
> > not the case when the values are propagated within the TB.
> >
> > For example, this is optimized in tcg/tcg-op.h:
> >   ori t0, t1, 0
> >
> > This is not optimized in tcg/tcg-op.h:
> >   movi t2, 0
> >   or t0, t1, t2
> 
> I see. Does the optimizer pass then make the tcg/tcg-op.h optimization
> redundant? Could we do the optimizations only in the optimizer?

Technically yes. In practice it's a good idea to keep simple
optimizations in tcg/tcg-op.h, as they cost less CPU time there than
when done later in the optimizer.
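
The cost argument can be seen in a sketch of a generation-time helper in the spirit of the tcg_gen_ori_* helpers in tcg/tcg-op.h. Because the immediate is known while the guest instruction is decoded, the trivial cases cost one comparison and never create extra ops for the optimizer to walk. The op buffer, `emit()` and `gen_ori()` below are hypothetical stand-ins, not QEMU's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical op stream standing in for the TCG op buffer. */
enum op { OP_MOVI, OP_MOV, OP_OR };

struct insn {
    enum op op;
    int dst, src1, src2;
    uint64_t imm;
};

#define MAX_OPS 64

static struct insn ops[MAX_OPS];
static int nb_ops;
static int next_temp = 100;     /* hypothetical temp allocator */

static void emit(enum op op, int dst, int src1, int src2, uint64_t imm)
{
    assert(nb_ops < MAX_OPS);
    ops[nb_ops++] = (struct insn){ op, dst, src1, src2, imm };
}

/* ret = arg1 | imm, simplified while the guest insn is being decoded. */
static void gen_ori(int ret, int arg1, uint64_t imm)
{
    if (imm == 0) {
        emit(OP_MOV, ret, arg1, 0, 0);              /* x | 0 == x */
    } else if (imm == UINT64_MAX) {
        emit(OP_MOVI, ret, 0, 0, UINT64_MAX);       /* x | ~0 == ~0 */
    } else {
        int t = next_temp++;                        /* general case: 2 ops */
        emit(OP_MOVI, t, 0, 0, imm);
        emit(OP_OR, ret, arg1, t, 0);
    }
}
```

Only the general case leaves a `movi`/`or` pair in the stream, and that is the shape a later pass has to rediscover when the constant arrives through a temp instead of an immediate.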

On the other hand, we can remove such optimizations done in some
TCG backends, as they won't see these kinds of ops anymore.

> >
> >> >   tcg/optimize: swap brcond/setcond arguments when possible
> >> >   tcg/optimize: add constant folding for setcond
> >> >   tcg/optimize: add constant folding for brcond
> >> >   tcg/optimize: fix if/else/break coding style
> >>
> >> Otherwise a very nice series.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-08  9:12       ` Aurelien Jarno
@ 2012-09-08  9:29         ` Blue Swirl
  2012-09-08  9:35           ` Aurelien Jarno
  0 siblings, 1 reply; 17+ messages in thread
From: Blue Swirl @ 2012-09-08  9:29 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On Sat, Sep 8, 2012 at 9:12 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Sat, Sep 08, 2012 at 09:06:52AM +0000, Blue Swirl wrote:
>> On Sat, Sep 8, 2012 at 9:01 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> > On Sat, Sep 08, 2012 at 08:18:50AM +0000, Blue Swirl wrote:
>> >> On Fri, Sep 7, 2012 at 1:16 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> >> > Aurelien Jarno (9):
>> >> >   tcg: improve profiler
>> >> >   tcg/optimize: split expression simplification
>> >> >   tcg/optimize: simplify or/xor r, a, 0 cases
>> >> >   tcg/optimize: simplify and r, a, 0 cases
>> >> >   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
>> >>
>> >> Aren't the above or/and/shift/rot simplifications (and also for
>> >> example OR with 0xffffffff and XOR of a register with itself) already
>> >> handled by tcg/tcg-op.h?
>> >
>> > They are handled there when the values are known at decode time. This is
>> > not the case when the values are propagated within the TB.
>> >
>> > For example, this is optimized in tcg/tcg-op.h:
>> >   ori t0, t1, 0
>> >
>> > This is not optimized in tcg/tcg-op.h:
>> >   movi t2, 0
>> >   or t0, t1, t2
>>
>> I see. Does the optimizer pass then make the tcg/tcg-op.h optimization
>> redundant? Could we do the optimizations only in the optimizer?
>
> Technically yes. In practice it's a good idea to keep simple
> optimizations in tcg/tcg-op.h, as they cost less CPU time there than
> when done later in the optimizer.

OK. Could further optimizations from tcg/tcg-op.h be rechecked in the
optimizer, for example the case OR reg, 0xffffffff -> movi reg, 0xffffffff?

>
> On the other hand, we can remove such optimizations done in some
> TCG backends, as they won't see these kinds of ops anymore.
>
>> >
>> >> >   tcg/optimize: swap brcond/setcond arguments when possible
>> >> >   tcg/optimize: add constant folding for setcond
>> >> >   tcg/optimize: add constant folding for brcond
>> >> >   tcg/optimize: fix if/else/break coding style
>> >>
>> >> Otherwise a very nice series.

* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-08  9:29         ` Blue Swirl
@ 2012-09-08  9:35           ` Aurelien Jarno
  0 siblings, 0 replies; 17+ messages in thread
From: Aurelien Jarno @ 2012-09-08  9:35 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On Sat, Sep 08, 2012 at 09:29:59AM +0000, Blue Swirl wrote:
> On Sat, Sep 8, 2012 at 9:12 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On Sat, Sep 08, 2012 at 09:06:52AM +0000, Blue Swirl wrote:
> >> On Sat, Sep 8, 2012 at 9:01 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> >> > On Sat, Sep 08, 2012 at 08:18:50AM +0000, Blue Swirl wrote:
> >> >> On Fri, Sep 7, 2012 at 1:16 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> >> >> > Aurelien Jarno (9):
> >> >> >   tcg: improve profiler
> >> >> >   tcg/optimize: split expression simplification
> >> >> >   tcg/optimize: simplify or/xor r, a, 0 cases
> >> >> >   tcg/optimize: simplify and r, a, 0 cases
> >> >> >   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
> >> >>
> >> >> Aren't the above or/and/shift/rot simplifications (and also for
> >> >> example OR with 0xffffffff and XOR of a register with itself) already
> >> >> handled by tcg/tcg-op.h?
> >> >
> >> > They are handled there when the values are known at decode time. This is
> >> > not the case when the values are propagated within the TB.
> >> >
> >> > For example, this is optimized in tcg/tcg-op.h:
> >> >   ori t0, t1, 0
> >> >
> >> > This is not optimized in tcg/tcg-op.h:
> >> >   movi t2, 0
> >> >   or t0, t1, t2
> >>
> >> I see. Does the optimizer pass then make the tcg/tcg-op.h optimization
> >> redundant? Could we do the optimizations only in the optimizer?
> >
> > Technically yes. In practice it's a good idea to keep simple
> > optimizations in tcg/tcg-op.h, as they cost less CPU time there than
> > when done later in the optimizer.
> 
> OK. Could further optimizations from tcg/tcg-op.h be rechecked in the
> optimizer, for example the case OR reg, 0xffffffff -> movi reg, 0xffffffff?
> 

Yes, this is something we can add. That said, I based this patch series on
instructions I found while running a few targets (arm, ppc, mips,
x86_64) and looking at qemu.log, and I haven't seen this one there.
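
Adding the rule suggested above, together with the XOR-of-a-register-with-itself case raised earlier in the thread, could look roughly like the following C sketch. The `simplify()` function, `struct insn`, `struct temp_info` and the `mask` parameter (the operand width, e.g. UINT32_MAX for 32-bit ops) are hypothetical and do not reflect the structure of tcg/optimize.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mini-IR, not QEMU's data structures. */
enum op { OP_MOVI, OP_MOV, OP_OR, OP_XOR };

struct insn {
    enum op op;
    int dst, src1, src2;
    uint64_t imm;           /* used by OP_MOVI only */
};

struct temp_info {
    bool is_const;
    uint64_t val;
};

/*
 * Extra rules for a TB-level pass:
 *   or  r, a, c  with c known to be all ones => movi r, all ones
 *   xor r, a, a                              => movi r, 0
 * 'mask' is the operand width (e.g. UINT32_MAX for 32-bit ops).
 * Only the second OR operand is checked; a real pass would also
 * handle the commuted form.
 */
static void simplify(struct insn *in, const struct temp_info *info,
                     uint64_t mask)
{
    switch (in->op) {
    case OP_OR:
        if (info[in->src2].is_const && (info[in->src2].val & mask) == mask) {
            in->op = OP_MOVI;
            in->imm = mask;
        }
        break;
    case OP_XOR:
        if (in->src1 == in->src2) {
            in->op = OP_MOVI;
            in->imm = 0;
        }
        break;
    default:
        break;
    }
}
```

The width mask matters: 0xffffffff is all ones for a 32-bit op but not for a 64-bit one, where the OR must be kept.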

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer
  2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
                   ` (9 preceding siblings ...)
  2012-09-08  8:18 ` [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Blue Swirl
@ 2012-09-10 13:55 ` Richard Henderson
  10 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2012-09-10 13:55 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On Fri, 2012-09-07 at 15:16 +0200, Aurelien Jarno wrote:
> Aurelien Jarno (9):
>   tcg: improve profiler
>   tcg/optimize: split expression simplification
>   tcg/optimize: simplify or/xor r, a, 0 cases
>   tcg/optimize: simplify and r, a, 0 cases
>   tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
>   tcg/optimize: swap brcond/setcond arguments when possible
>   tcg/optimize: add constant folding for setcond
>   tcg/optimize: add constant folding for brcond
>   tcg/optimize: fix if/else/break coding style

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

end of thread

Thread overview: 17+ messages
2012-09-07 13:16 [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 1/9] tcg: improve profiler Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 2/9] tcg/optimize: split expression simplification Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 3/9] tcg/optimize: simplify or/xor r, a, 0 cases Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 4/9] tcg/optimize: simplify and " Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 5/9] tcg/optimize: simplify shift/rot r, 0, a => movi r, " Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 6/9] tcg/optimize: swap brcond/setcond arguments when possible Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 7/9] tcg/optimize: add constant folding for setcond Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 8/9] tcg/optimize: add constant folding for brcond Aurelien Jarno
2012-09-07 13:16 ` [Qemu-devel] [PATCH v2 9/9] tcg/optimize: fix if/else/break coding style Aurelien Jarno
2012-09-08  8:18 ` [Qemu-devel] [PATCH v2 0/9] Improve TCG optimizer Blue Swirl
2012-09-08  9:01   ` Aurelien Jarno
2012-09-08  9:06     ` Blue Swirl
2012-09-08  9:12       ` Aurelien Jarno
2012-09-08  9:29         ` Blue Swirl
2012-09-08  9:35           ` Aurelien Jarno
2012-09-10 13:55 ` Richard Henderson
