qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
@ 2011-07-07 12:37 Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

This series implements some basic machine-independent optimizations.  They
simplify code and allow liveness analysis to do its work better.

Suppose we have the following ARM code:

 movw    r12, #0xb6db
 movt    r12, #0xdb6d

Before optimization, TCG produces:

 movi_i32 tmp8,$0xb6db
 mov_i32 r12,tmp8
 mov_i32 tmp8,r12
 ext16u_i32 tmp8,tmp8
 movi_i32 tmp9,$0xdb6d0000
 or_i32 tmp8,tmp8,tmp9
 mov_i32 r12,tmp8

After optimization, we have just this:

 movi_i32 r12,$0xdb6db6db
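The final constant follows from evaluating each op in the unoptimized listing, since every input is known at translation time. The sketch below only checks that arithmetic in plain C. It does not reproduce the optimizer, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: evaluate the unoptimized TCG sequence for
   "movw r12, #0xb6db; movt r12, #0xdb6d" step by step.  Every input is a
   constant, so the whole chain collapses to a single movi. */
static uint32_t fold_movw_movt(void)
{
    uint32_t tmp8 = 0xb6db;          /* movi_i32 tmp8,$0xb6db */
    uint32_t r12 = tmp8;             /* mov_i32 r12,tmp8 */
    tmp8 = r12;                      /* mov_i32 tmp8,r12 */
    tmp8 = (uint16_t)tmp8;           /* ext16u_i32 tmp8,tmp8 */
    uint32_t tmp9 = 0xdb6d0000u;     /* movi_i32 tmp9,$0xdb6d0000 */
    tmp8 = tmp8 | tmp9;              /* or_i32 tmp8,tmp8,tmp9 */
    r12 = tmp8;                      /* mov_i32 r12,tmp8 */
    return r12;
}
```

The call returns 0xdb6db6db, which matches the single movi_i32 left after optimization.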

Here are performance results for the SPEC CPU2000 integer tests in user-mode
emulation on an x86_64 host.  Each test was run 5 times on the reference data
set.  The tables below show the runtime in seconds of every run, the median,
and, for the optimized builds, the gain relative to the unoptimized median.

ARM guest without optimizations:
Test name       #1       #2       #3       #4       #5    Median
164.gzip    1408.891 1402.323 1407.623 1404.955 1405.396 1405.396
175.vpr     1245.31  1248.758 1247.936 1248.534 1247.534 1247.936
176.gcc      912.561  809.481  847.057  912.636  912.544  912.544
181.mcf      198.384  197.841  199.127  197.976  197.29   197.976
186.crafty  1545.881 1546.051 1546.002 1545.927 1545.945 1545.945
197.parser  3779.954 3779.878 3779.79  3779.94  3779.88  3779.88
252.eon     2563.168 2776.152 2776.395 2776.577 2776.202 2776.202
253.perlbmk 2591.781 2504.078 2507.07  2591.337 2463.401 2507.07
256.bzip2   1306.197 1304.639 1184.853 1305.141 1305.606 1305.141
300.twolf   2918.984 2918.926 2918.93  2918.97  2918.914 2918.93

ARM guest with optimizations:
Test name       #1       #2       #3       #4       #5    Median    Gain
164.gzip    1401.198 1376.337 1401.117 1401.23  1401.246 1401.198   0.30%
175.vpr     1247.964 1151.468 1247.76  1154.419 1242.017 1242.017   0.47%
176.gcc      896.882  918.546  918.297  851.465  918.39   918.297  -0.63%
181.mcf      198.19   197.399  198.421  198.663  198.312  198.312  -0.17%
186.crafty  1520.425 1520.362 1520.477 1520.445 1520.957 1520.445   1.65%
197.parser  3770.943 3770.927 3770.578 3771.048 3770.904 3770.927   0.24%
252.eon     2752.371 2752.111 2752.005 2752.214 2752.109 2752.111   0.87%
253.perlbmk 2577.462 2578.588 2493.567 2578.571 2578.318 2578.318  -2.84%
256.bzip2   1296.198 1271.128 1296.044 1296.321 1296.147 1296.147   0.69%
300.twolf   2888.984 2889.023 2889.225 2889.039 2889.05  2889.039   1.02%


x86_64 guest without optimizations:
Test name       #1       #2       #3       #4       #5    Median
164.gzip     857.654  857.646  857.678  798.119  857.675  857.654
175.vpr      959.265  959.207  959.185  959.461  959.332  959.265
176.gcc      625.722  637.257  646.638  646.614  646.56   646.56
181.mcf      221.666  220.194  220.079  219.868  221.5    220.194
186.crafty  1129.531 1129.739 1129.573 1129.588 1129.624 1129.588
197.parser  1809.517 1809.516 1809.386 1809.477 1809.427 1809.477
253.perlbmk 1774.944 1776.046 1769.865 1774.052 1775.236 1774.944
254.gap     1061.033 1061.158 1061.064 1061.047 1061.01  1061.047
255.vortex  1871.261 1914.144 1914.057 1914.086 1914.127 1914.086
256.bzip2    918.916 1011.828 1011.819 1012.11  1011.932 1011.828
300.twolf   1332.797 1330.56  1330.687 1330.917 1330.602 1330.687 

x86_64 guest with optimizations:
Test name       #1       #2       #3       #4       #5    Median    Gain
164.gzip     806.198  854.159  854.184  854.168  854.187  854.168   0.41%
175.vpr      955.905  950.86   955.876  876.397  955.957  955.876   0.35%
176.gcc      641.663  640.189  641.57   641.552  641.514  641.552   0.77%
181.mcf      217.619  218.627  218.699  217.977  216.955  217.977   1.01%
186.crafty  1123.909 1123.852 1123.917 1123.781 1123.805 1123.852   0.51%
197.parser  1813.94  1814.643 1815.286 1814.445 1813.72  1814.445  -0.27%
253.perlbmk 1791.536 1795.642 1793.0   1797.486 1791.401 1793.0    -1.02%
254.gap     1070.605 1070.216 1070.637 1070.168 1070.491 1070.491  -0.89%
255.vortex  1918.764 1918.573 1917.411 1918.287 1918.735 1918.573  -0.23%
256.bzip2   1017.179 1017.083 1017.283 1016.913 1017.189 1017.179  -0.53%
300.twolf   1321.072 1321.109 1321.019 1321.072 1321.004 1321.072   0.72%

254.gap and 255.vortex with an ARM guest, and 252.eon with an x86_64 guest,
do not work under QEMU for some unrelated reason, so they are missing from
the corresponding tables.
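The Median and Gain columns are consistent with the computation below. Gain is the relative reduction of the median runtime, so a positive value means the optimized build is faster. This is a reconstruction from the table values, not the script that produced them:

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of exactly five runs: sort a copy and take the middle element. */
static double median5(const double runs[5])
{
    double sorted[5];
    for (int i = 0; i < 5; i++) {
        sorted[i] = runs[i];
    }
    qsort(sorted, 5, sizeof(sorted[0]), cmp_double);
    return sorted[2];
}

/* Gain in percent relative to the unoptimized median. */
static double gain_percent(double median_without, double median_with)
{
    assert(median_without > 0.0);
    return (median_without - median_with) / median_without * 100.0;
}
```

For example, the ARM 164.gzip rows give medians 1405.396 and 1401.198, which is a gain of about 0.30%.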

Changes:
v1 -> v2
 - State and Vals arrays merged into an array of structures.
 - Added reference counting of a temp's copies. This helps to reset a temp's
   state faster in most cases.
 - Do not perform copy propagation through operations with the
   TCG_OPF_CALL_CLOBBER or TCG_OPF_SIDE_EFFECTS flag.
 - Split some expression simplifications into an independent switch.
 - Let the compiler handle signed shifts and sign/zero extensions in its
   implementation-defined way.

v2 -> v3
 - Elements of an equivalence class are placed in a doubly linked circular
   list, so it's easier to choose a new representative.
 - The CASE_OP_32_64 macro is used to reduce the number of #ifdefs. Checkpatch
   is not happy about this change, but I do not think spaces would be
   appropriate here.
 - Some constraints during copy propagation are relaxed.
 - Functions tcg_opt_gen_mov and tcg_opt_gen_movi are introduced to reduce code
   duplication.
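The circular doubly linked list of copies can be illustrated with the simplified model below. It is not the patch's reset_temp. The names copy_node, add_copy and remove_temp are hypothetical. The sketch also ignores the patch's rule that only a non-global temp may become the representative:

```c
#include <assert.h>

#define NB_TEMPS 8   /* hypothetical number of temps */

/* Each temp sits on a circular doubly linked ring with the other members of
   its equivalence class and records the class representative in 'rep'.
   A temp that is not a copy of anything forms a ring of one. */
struct copy_node {
    int prev, next;
    int rep;
};

static struct copy_node nodes[NB_TEMPS];

static void init_nodes(void)
{
    for (int i = 0; i < NB_TEMPS; i++) {
        nodes[i].prev = nodes[i].next = nodes[i].rep = i;
    }
}

/* Record "dst = src".  dst must currently be alone on its ring (the patch
   guarantees this by resetting dst first).  It joins src's class right
   after the representative. */
static void add_copy(int dst, int src)
{
    int rep = nodes[src].rep;
    nodes[dst].rep = rep;
    nodes[dst].prev = rep;
    nodes[dst].next = nodes[rep].next;
    nodes[nodes[rep].next].prev = dst;
    nodes[rep].next = dst;
}

/* Temp t is overwritten: unlink it from its ring.  If t was the
   representative, its successor on the ring takes over, and one walk
   around the ring updates every remaining member. */
static void remove_temp(int t)
{
    int next = nodes[t].next;
    if (next != t && nodes[t].rep == t) {
        for (int i = next; i != t; i = nodes[i].next) {
            nodes[i].rep = next;
        }
    }
    nodes[next].prev = nodes[t].prev;
    nodes[nodes[t].prev].next = next;
    nodes[t].prev = nodes[t].next = nodes[t].rep = t;
}
```

Because the ring is bidirectional, removing a non-representative is O(1). Only replacing the representative requires a walk over the class.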

Kirill Batuzov (6):
  Add TCG optimizations stub
  Add copy and constant propagation.
  Do constant folding for basic arithmetic operations.
  Do constant folding for boolean operations.
  Do constant folding for shift operations.
  Do constant folding for unary operations.

 Makefile.target |    2 +-
 tcg/optimize.c  |  568 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c       |    6 +
 tcg/tcg.h       |    3 +
 4 files changed, 578 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

-- 
1.7.4.1


* [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

Added file tcg/optimize.c to hold TCG optimizations. The function tcg_optimize
is called from tcg_gen_code_common and calls other functions that perform
specific optimizations. A stub for constant folding was added.
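tcg_optimize returns the new end of the argument buffer, which tcg_gen_code_common assigns to gen_opparam_ptr. An optimization pass can therefore rewrite the opcode and argument buffers in place. The sketch below shows that pattern with a hypothetical three-op instruction set. It also includes the "mov x, x" elimination that patch 2 adds: the opcode becomes a nop and its arguments are dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature op set.  In TCG the per-op argument count comes
   from tcg_op_defs[op].nb_args. */
enum { OP_NOP, OP_MOVI, OP_MOV, OP_ADD, NB_OPS };
static const int nb_args[NB_OPS] = { 0, 2, 2, 3 };

/* Walk the opcodes and copy each op's arguments down to 'out'.  Ops can
   only shrink here, so 'out' never overtakes 'in' and the copy is safe in
   place.  Returns the new end of the argument buffer. */
static long *compact_args(unsigned char *opc, size_t nb_ops, long *args)
{
    long *in = args, *out = args;
    for (size_t i = 0; i < nb_ops; i++) {
        if (opc[i] == OP_MOV && in[0] == in[1]) {
            /* "mov x, x" has no effect: make it a nop, drop its args */
            opc[i] = OP_NOP;
            in += 2;
            continue;
        }
        for (int k = 0; k < nb_args[opc[i]]; k++) {
            out[k] = in[k];
        }
        in += nb_args[opc[i]];
        out += nb_args[opc[i]];
    }
    return out;
}
```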

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 Makefile.target |    2 +-
 tcg/optimize.c  |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c       |    6 +++
 tcg/tcg.h       |    3 ++
 4 files changed, 107 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

diff --git a/Makefile.target b/Makefile.target
index 2e281a4..0b045ce 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -70,7 +70,7 @@ all: $(PROGS) stap
 #########################################################
 # cpu emulator library
 libobj-y = exec.o translate-all.o cpu-exec.o translate.o
-libobj-y += tcg/tcg.o
+libobj-y += tcg/tcg.o tcg/optimize.o
 libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
 libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
 libobj-y += op_helper.o helper.o
diff --git a/tcg/optimize.c b/tcg/optimize.c
new file mode 100644
index 0000000..c7c7da9
--- /dev/null
+++ b/tcg/optimize.c
@@ -0,0 +1,97 @@
+/*
+ * Optimizations for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2010 Samsung Electronics.
+ * Contributed by Kirill Batuzov <batuzovk@ispras.ru>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "qemu-common.h"
+#include "tcg-op.h"
+
+#if TCG_TARGET_REG_BITS == 64
+#define CASE_OP_32_64(x)                        \
+        glue(glue(case INDEX_op_, x), _i32):    \
+        glue(glue(case INDEX_op_, x), _i64)
+#else
+#define CASE_OP_32_64(x)                        \
+        glue(glue(case INDEX_op_, x), _i32)
+#endif
+
+static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
+                                    TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    const TCGOpDef *def;
+    TCGArg *gen_args;
+
+    nb_temps = s->nb_temps;
+    nb_globals = s->nb_globals;
+
+    nb_ops = tcg_opc_ptr - gen_opc_buf;
+    gen_args = args;
+    for (op_index = 0; op_index < nb_ops; op_index++) {
+        op = gen_opc_buf[op_index];
+        def = &tcg_op_defs[op];
+        switch (op) {
+        case INDEX_op_call:
+            i = (args[0] >> 16) + (args[0] & 0xffff) + 3;
+            while (i) {
+                *gen_args = *args;
+                args++;
+                gen_args++;
+                i--;
+            }
+            break;
+        case INDEX_op_set_label:
+        case INDEX_op_jmp:
+        case INDEX_op_br:
+        CASE_OP_32_64(brcond):
+            for (i = 0; i < def->nb_args; i++) {
+                *gen_args = *args;
+                args++;
+                gen_args++;
+            }
+            break;
+        default:
+            for (i = 0; i < def->nb_args; i++) {
+                gen_args[i] = args[i];
+            }
+            args += def->nb_args;
+            gen_args += def->nb_args;
+            break;
+        }
+    }
+
+    return gen_args;
+}
+
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr,
+        TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    TCGArg *res;
+    res = tcg_constant_folding(s, tcg_opc_ptr, args, tcg_op_defs);
+    return res;
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index fad92f9..6309dce 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -24,6 +24,7 @@
 
 /* define it to use liveness analysis (better code) */
 #define USE_LIVENESS_ANALYSIS
+#define USE_TCG_OPTIMIZATIONS
 
 #include "config.h"
 
@@ -2033,6 +2034,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
     }
 #endif
 
+#ifdef USE_TCG_OPTIMIZATIONS
+    gen_opparam_ptr =
+        tcg_optimize(s, gen_opc_ptr, gen_opparam_buf, tcg_op_defs);
+#endif
+
 #ifdef CONFIG_PROFILER
     s->la_time -= profile_getclock();
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 2b985ac..91a3cda 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -486,6 +486,9 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
 void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
                         int c, int right, int arith);
 
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr, TCGArg *args,
+                     TCGOpDef *tcg_op_def);
+
 /* only used for debugging purposes */
 void tcg_register_helper(void *func, const char *name);
 const char *tcg_helper_get_name(TCGContext *s, void *func);
-- 
1.7.4.1


* [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
  2011-08-03 19:00   ` Stefan Weil
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

Make tcg_constant_folding do copy and constant propagation. This is
preparatory work for the actual constant folding.
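The interplay of copy propagation, constant propagation through mov, and folding into movi can be shown on a tiny hypothetical IR. This is a simplified model, not the patch. It tracks the same three useful states (constant, copy, unknown), but invalidates copies with a linear scan where the patch uses the circular list. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical IR: every op writes 'dst'; MOV reads 'a', ADD reads 'a' and
   'b', MOVI uses the immediate 'imm'. */
enum op { OP_MOVI, OP_MOV, OP_ADD };
struct insn { enum op op; int dst, a, b; unsigned long imm; };

enum state { ST_ANY, ST_CONST, ST_COPY };
#define NB_TEMPS 8
static struct { enum state st; unsigned long val; int src; } temps[NB_TEMPS];

/* 'dst' is about to be overwritten: forget its value and demote every temp
   recorded as a copy of it. */
static void reset(int dst)
{
    for (int i = 0; i < NB_TEMPS; i++) {
        if (temps[i].st == ST_COPY && temps[i].src == dst) {
            temps[i].st = ST_ANY;
        }
    }
    temps[dst].st = ST_ANY;
}

static void propagate(struct insn *code, int n)
{
    for (int i = 0; i < NB_TEMPS; i++) {
        temps[i].st = ST_ANY;
    }
    for (int k = 0; k < n; k++) {
        struct insn *in = &code[k];
        /* Copy propagation: read inputs from the temp they were copied from. */
        if (in->op != OP_MOVI && temps[in->a].st == ST_COPY) {
            in->a = temps[in->a].src;
        }
        if (in->op == OP_ADD && temps[in->b].st == ST_COPY) {
            in->b = temps[in->b].src;
        }
        /* Constant propagation and folding: rewrite as movi when every
           input is a known constant. */
        if (in->op == OP_MOV && temps[in->a].st == ST_CONST) {
            in->op = OP_MOVI;
            in->imm = temps[in->a].val;
        } else if (in->op == OP_ADD && temps[in->a].st == ST_CONST
                   && temps[in->b].st == ST_CONST) {
            in->op = OP_MOVI;
            in->imm = temps[in->a].val + temps[in->b].val;
        }
        /* Record what is now known about the output. */
        reset(in->dst);
        if (in->op == OP_MOVI) {
            temps[in->dst].st = ST_CONST;
            temps[in->dst].val = in->imm;
        } else if (in->op == OP_MOV && in->a != in->dst) {
            temps[in->dst].st = ST_COPY;
            temps[in->dst].src = in->a;
        }
    }
}
```

For example, "movi t0,5; mov t1,t0; add t2,t1,t0" becomes "movi t0,5; movi t1,5; movi t2,10".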

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 180 insertions(+), 2 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c7c7da9..f8afe71 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,24 +40,196 @@
         glue(glue(case INDEX_op_, x), _i32)
 #endif
 
+typedef enum {
+    TCG_TEMP_UNDEF = 0,
+    TCG_TEMP_CONST,
+    TCG_TEMP_COPY,
+    TCG_TEMP_HAS_COPY,
+    TCG_TEMP_ANY
+} tcg_temp_state;
+
+struct tcg_temp_info {
+    tcg_temp_state state;
+    uint16_t prev_copy;
+    uint16_t next_copy;
+    tcg_target_ulong val;
+};
+
+static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+
+/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
+   class of equivalent temp's, a new representative should be chosen in this
+   class. */
+static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
+{
+    int i;
+    TCGArg new_base = (TCGArg)-1;
+    if (temps[temp].state == TCG_TEMP_HAS_COPY) {
+        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+            if (i >= nb_globals) {
+                temps[i].state = TCG_TEMP_HAS_COPY;
+                new_base = i;
+                break;
+            }
+        }
+        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+            if (new_base == (TCGArg)-1) {
+                temps[i].state = TCG_TEMP_ANY;
+            } else {
+                temps[i].val = new_base;
+            }
+        }
+        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+    } else if (temps[temp].state == TCG_TEMP_COPY) {
+        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+        new_base = temps[temp].val;
+    }
+    temps[temp].state = TCG_TEMP_ANY;
+    if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
+        temps[new_base].state = TCG_TEMP_ANY;
+    }
+}
+
+static int op_bits(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32:
+        return 32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+        return 64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+        tcg_abort();
+    }
+}
+
+static int op_to_movi(int op)
+{
+    switch (op_bits(op)) {
+    case 32:
+        return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case 64:
+        return INDEX_op_movi_i64;
+#endif
+    default:
+        fprintf(stderr, "op_to_movi: unexpected return value of "
+                "function op_bits.\n");
+        tcg_abort();
+    }
+}
+
+static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
+                            int nb_temps, int nb_globals)
+{
+        reset_temp(dst, nb_temps, nb_globals);
+        assert(temps[src].state != TCG_TEMP_COPY);
+        if (src >= nb_globals) {
+            assert(temps[src].state != TCG_TEMP_CONST);
+            if (temps[src].state != TCG_TEMP_HAS_COPY) {
+                temps[src].state = TCG_TEMP_HAS_COPY;
+                temps[src].next_copy = src;
+                temps[src].prev_copy = src;
+            }
+            temps[dst].state = TCG_TEMP_COPY;
+            temps[dst].val = src;
+            temps[dst].next_copy = temps[src].next_copy;
+            temps[dst].prev_copy = src;
+            temps[temps[dst].next_copy].prev_copy = dst;
+            temps[src].next_copy = dst;
+        }
+        gen_args[0] = dst;
+        gen_args[1] = src;
+}
+
+static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
+                             int nb_temps, int nb_globals)
+{
+        reset_temp(dst, nb_temps, nb_globals);
+        temps[dst].state = TCG_TEMP_CONST;
+        temps[dst].val = val;
+        gen_args[0] = dst;
+        gen_args[1] = val;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
 {
-    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    int i, nb_ops, op_index, op, nb_temps, nb_globals, nb_call_args;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    /* Array VALS has an element for each temp.
+       If this temp holds a constant then its value is kept in VALS' element.
+       If this temp is a copy of other ones then this equivalence class'
+       representative is kept in VALS' element.
+       If this temp is neither copy nor constant then corresponding VALS'
+       element is unused. */
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+    memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
 
     nb_ops = tcg_opc_ptr - gen_opc_buf;
     gen_args = args;
     for (op_index = 0; op_index < nb_ops; op_index++) {
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
+        /* Do copy propagation */
+        if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {
+            assert(op != INDEX_op_call);
+            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+                if (temps[args[i]].state == TCG_TEMP_COPY) {
+                    args[i] = temps[args[i]].val;
+                }
+            }
+        }
+
+        /* Propagate constants through copy operations and do constant
+           folding.  Constants will be substituted to arguments by register
+           allocator where needed and possible.  Also detect copies. */
         switch (op) {
+        CASE_OP_32_64(mov):
+            if ((temps[args[1]].state == TCG_TEMP_COPY
+                && temps[args[1]].val == args[0])
+                || args[0] == args[1]) {
+                args += 2;
+                gen_opc_buf[op_index] = INDEX_op_nop;
+                break;
+            }
+            if (temps[args[1]].state != TCG_TEMP_CONST) {
+                tcg_opt_gen_mov(gen_args, args[0], args[1],
+                                nb_temps, nb_globals);
+                gen_args += 2;
+                args += 2;
+                break;
+            }
+            /* Source argument is constant.  Rewrite the operation and
+               let movi case handle it. */
+            op = op_to_movi(op);
+            gen_opc_buf[op_index] = op;
+            args[1] = temps[args[1]].val;
+            /* fallthrough */
+        CASE_OP_32_64(movi):
+            tcg_opt_gen_movi(gen_args, args[0], args[1], nb_temps, nb_globals);
+            gen_args += 2;
+            args += 2;
+            break;
         case INDEX_op_call:
-            i = (args[0] >> 16) + (args[0] & 0xffff) + 3;
+            nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
+            if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
+                for (i = 0; i < nb_globals; i++) {
+                    reset_temp(i, nb_temps, nb_globals);
+                }
+            }
+            for (i = 0; i < (args[0] >> 16); i++) {
+                reset_temp(args[i + 1], nb_temps, nb_globals);
+            }
+            i = nb_call_args + 3;
             while (i) {
                 *gen_args = *args;
                 args++;
@@ -69,6 +241,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_jmp:
         case INDEX_op_br:
         CASE_OP_32_64(brcond):
+            memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
             for (i = 0; i < def->nb_args; i++) {
                 *gen_args = *args;
                 args++;
@@ -76,6 +249,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             break;
         default:
+            /* Default case: we do know nothing about operation so no
+               propagation is done.  We only trash output args.  */
+            for (i = 0; i < def->nb_oargs; i++) {
+                reset_temp(args[i], nb_temps, nb_globals);
+            }
             for (i = 0; i < def->nb_args; i++) {
                 gen_args[i] = args[i];
             }
-- 
1.7.4.1


* [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations.
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations Kirill Batuzov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

Perform actual constant folding for ADD, SUB and MUL operations.
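On a 64-bit host, do_constant_folding_2 computes in a full TCGArg. do_constant_folding then masks 32-bit results to their low 32 bits. The sketch below shows why that truncation matters, assuming TCGArg is 64 bits wide. The fold_op enum is a hypothetical stand-in for the INDEX_op_* values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for INDEX_op_{add,sub,mul}_i{32,64}. */
enum fold_op { ADD32, SUB32, MUL32, ADD64, SUB64, MUL64 };

/* Compute in a 64-bit unsigned word (unsigned arithmetic wraps, so this is
   well defined), then keep only the low 32 bits for 32-bit ops. */
static uint64_t fold(enum fold_op op, uint64_t x, uint64_t y)
{
    uint64_t res;
    switch (op) {
    case ADD32: case ADD64: res = x + y; break;
    case SUB32: case SUB64: res = x - y; break;
    case MUL32: case MUL64: res = x * y; break;
    default: assert(0); return 0;
    }
    if (op == ADD32 || op == SUB32 || op == MUL32) {
        res &= 0xffffffffu;
    }
    return res;
}
```

Without the mask, folding add_i32 of 0xffffffff and 1 would yield 0x100000000, a value no 32-bit register can hold.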

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f8afe71..42a1bda 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -96,9 +96,15 @@ static int op_bits(int op)
 {
     switch (op) {
     case INDEX_op_mov_i32:
+    case INDEX_op_add_i32:
+    case INDEX_op_sub_i32:
+    case INDEX_op_mul_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i64:
         return 64;
 #endif
     default:
@@ -156,6 +162,52 @@ static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
         gen_args[1] = val;
 }
 
+static int op_to_mov(int op)
+{
+    switch (op_bits(op)) {
+    case 32:
+        return INDEX_op_mov_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case 64:
+        return INDEX_op_mov_i64;
+#endif
+    default:
+        fprintf(stderr, "op_to_mov: unexpected return value of "
+                "function op_bits.\n");
+        tcg_abort();
+    }
+}
+
+static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
+{
+    switch (op) {
+    CASE_OP_32_64(add):
+        return x + y;
+
+    CASE_OP_32_64(sub):
+        return x - y;
+
+    CASE_OP_32_64(mul):
+        return x * y;
+
+    default:
+        fprintf(stderr,
+                "Unrecognized operation %d in do_constant_folding.\n", op);
+        tcg_abort();
+    }
+}
+
+static TCGArg do_constant_folding(int op, TCGArg x, TCGArg y)
+{
+    TCGArg res = do_constant_folding_2(op, x, y);
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 32) {
+        res &= 0xffffffff;
+    }
+#endif
+    return res;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -163,6 +215,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
     int i, nb_ops, op_index, op, nb_temps, nb_globals, nb_call_args;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    TCGArg tmp;
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
        If this temp is a copy of other ones then this equivalence class'
@@ -189,6 +242,57 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
         }
 
+        /* For commutative operations make constant second argument */
+        switch (op) {
+        CASE_OP_32_64(add):
+        CASE_OP_32_64(mul):
+            if (temps[args[1]].state == TCG_TEMP_CONST) {
+                tmp = args[1];
+                args[1] = args[2];
+                args[2] = tmp;
+            }
+            break;
+        default:
+            break;
+        }
+
+        /* Simplify expression if possible. */
+        switch (op) {
+        CASE_OP_32_64(add):
+        CASE_OP_32_64(sub):
+            if (temps[args[1]].state == TCG_TEMP_CONST) {
+                /* Proceed with possible constant folding. */
+                break;
+            }
+            if (temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[2]].val == 0) {
+                if ((temps[args[0]].state == TCG_TEMP_COPY
+                    && temps[args[0]].val == args[1])
+                    || args[0] == args[1]) {
+                    args += 3;
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                } else {
+                    gen_opc_buf[op_index] = op_to_mov(op);
+                    tcg_opt_gen_mov(gen_args, args[0], args[1],
+                                    nb_temps, nb_globals);
+                    gen_args += 2;
+                    args += 3;
+                }
+                continue;
+            }
+            break;
+        CASE_OP_32_64(mul):
+            if ((temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[2]].val == 0)) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+                args += 3;
+                gen_args += 2;
+                continue;
+            }
+            break;
+        }
+
         /* Propagate constants through copy operations and do constant
            folding.  Constants will be substituted to arguments by register
            allocator where needed and possible.  Also detect copies. */
@@ -219,6 +323,27 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        CASE_OP_32_64(add):
+        CASE_OP_32_64(sub):
+        CASE_OP_32_64(mul):
+            if (temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tmp = do_constant_folding(op, temps[args[1]].val,
+                                          temps[args[2]].val);
+                tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+                gen_args += 2;
+                args += 3;
+                break;
+            } else {
+                reset_temp(args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args += 3;
+                args += 3;
+                break;
+            }
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
-- 
1.7.4.1


* [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations.
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (2 preceding siblings ...)
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

Perform constant folding for AND, OR, XOR operations.
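The patch handles boolean ops in three steps. First, commutative ops are canonicalized so that a constant ends up in the second operand. Next, "x & x" and "x | x" become a mov. Finally, the op is folded when both operands are constant. The sketch below mirrors that order on a hypothetical operand model. Like the patch, it does not turn "x ^ x" into 0, even though that identity holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum bool_op { OP_AND, OP_OR, OP_XOR };

/* Hypothetical operand: a temp index, or a known 32-bit constant. */
struct operand { bool is_const; int temp; uint32_t val; };

enum outcome { FOLDED_TO_MOVI, BECAME_MOV, UNCHANGED };

static enum outcome simplify(enum bool_op op, struct operand *a,
                             struct operand *b, uint32_t *imm)
{
    /* All three ops are commutative: move a constant to the second slot. */
    if (a->is_const) {
        struct operand tmp = *a;
        *a = *b;
        *b = tmp;
    }
    /* x & x == x and x | x == x: the op is just a copy of x. */
    if (op != OP_XOR && !a->is_const && !b->is_const && a->temp == b->temp) {
        return BECAME_MOV;
    }
    /* Both operands known: compute the result at translation time. */
    if (a->is_const && b->is_const) {
        switch (op) {
        case OP_AND: *imm = a->val & b->val; break;
        case OP_OR:  *imm = a->val | b->val; break;
        case OP_XOR: *imm = a->val ^ b->val; break;
        }
        return FOLDED_TO_MOVI;
    }
    return UNCHANGED;
}
```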

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 42a1bda..c469952 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,12 +99,18 @@ static int op_bits(int op)
     case INDEX_op_add_i32:
     case INDEX_op_sub_i32:
     case INDEX_op_mul_i32:
+    case INDEX_op_and_i32:
+    case INDEX_op_or_i32:
+    case INDEX_op_xor_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
     case INDEX_op_add_i64:
     case INDEX_op_sub_i64:
     case INDEX_op_mul_i64:
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
         return 64;
 #endif
     default:
@@ -190,6 +196,15 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
     CASE_OP_32_64(mul):
         return x * y;
 
+    CASE_OP_32_64(and):
+        return x & y;
+
+    CASE_OP_32_64(or):
+        return x | y;
+
+    CASE_OP_32_64(xor):
+        return x ^ y;
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -246,6 +261,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         switch (op) {
         CASE_OP_32_64(add):
         CASE_OP_32_64(mul):
+        CASE_OP_32_64(and):
+        CASE_OP_32_64(or):
+        CASE_OP_32_64(xor):
             if (temps[args[1]].state == TCG_TEMP_CONST) {
                 tmp = args[1];
                 args[1] = args[2];
@@ -291,6 +309,22 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 continue;
             }
             break;
+        CASE_OP_32_64(or):
+        CASE_OP_32_64(and):
+            if (args[1] == args[2]) {
+                if (args[1] == args[0]) {
+                    args += 3;
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                } else {
+                    gen_opc_buf[op_index] = op_to_mov(op);
+                    tcg_opt_gen_mov(gen_args, args[0], args[1], nb_temps,
+                                    nb_globals);
+                    gen_args += 2;
+                    args += 3;
+                }
+                continue;
+            }
+            break;
         }
 
         /* Propagate constants through copy operations and do constant
@@ -326,6 +360,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
+        CASE_OP_32_64(or):
+        CASE_OP_32_64(and):
+        CASE_OP_32_64(xor):
             if (temps[args[1]].state == TCG_TEMP_CONST
                 && temps[args[2]].state == TCG_TEMP_CONST) {
                 gen_opc_buf[op_index] = op_to_movi(op);
-- 
1.7.4.1


* [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations.
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (3 preceding siblings ...)
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
  2011-07-30 12:25   ` Blue Swirl
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations Kirill Batuzov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

Perform constant folding for SHR, SHL, SAR, ROTR and ROTL operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c469952..a1bb287 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -102,6 +102,11 @@ static int op_bits(int op)
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shr_i32:
+    case INDEX_op_sar_i32:
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotr_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -111,6 +116,11 @@ static int op_bits(int op)
     case INDEX_op_and_i64:
     case INDEX_op_or_i64:
     case INDEX_op_xor_i64:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i64:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i64:
         return 64;
 #endif
     default:
@@ -205,6 +215,58 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
     CASE_OP_32_64(xor):
         return x ^ y;
 
+    case INDEX_op_shl_i32:
+        return (uint32_t)x << (uint32_t)y;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_shl_i64:
+        return (uint64_t)x << (uint64_t)y;
+#endif
+
+    case INDEX_op_shr_i32:
+        return (uint32_t)x >> (uint32_t)y;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_shr_i64:
+        return (uint64_t)x >> (uint64_t)y;
+#endif
+
+    case INDEX_op_sar_i32:
+        return (int32_t)x >> (int32_t)y;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sar_i64:
+        return (int64_t)x >> (int64_t)y;
+#endif
+
+    case INDEX_op_rotr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << (32 - y)) | (x >> y);
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotr_i64:
+        x = (x << (64 - y)) | (x >> y);
+        return x;
+#endif
+
+    case INDEX_op_rotl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << y) | (x >> (32 - y));
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotl_i64:
+        x = (x << y) | (x >> (64 - y));
+        return x;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -278,6 +340,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         switch (op) {
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
+        CASE_OP_32_64(shl):
+        CASE_OP_32_64(shr):
+        CASE_OP_32_64(sar):
+        CASE_OP_32_64(rotl):
+        CASE_OP_32_64(rotr):
             if (temps[args[1]].state == TCG_TEMP_CONST) {
                 /* Proceed with possible constant folding. */
                 break;
@@ -363,6 +430,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
         CASE_OP_32_64(xor):
+        CASE_OP_32_64(shl):
+        CASE_OP_32_64(shr):
+        CASE_OP_32_64(sar):
+        CASE_OP_32_64(rotl):
+        CASE_OP_32_64(rotr):
             if (temps[args[1]].state == TCG_TEMP_CONST
                 && temps[args[2]].state == TCG_TEMP_CONST) {
                 gen_opc_buf[op_index] = op_to_movi(op);
-- 
1.7.4.1
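
One detail to keep in mind with these folds: C leaves a shift by a count greater than or equal to the operand width undefined. With a rotate count of 0, the i64 cases above evaluate `x << 64` (rotr) or `x >> 64` (rotl), and the i32 cases do the same with 32 on a 32-bit host, where TCGArg is only 32 bits wide. A common way around this is to reduce the count modulo the width and compute the complementary shift as `-n & (width - 1)`. The sketch below shows that pattern on standalone 32- and 64-bit values. The function names are illustrative, and masking over-wide counts is an assumed policy, chosen on the assumption (per TCG's README) that the result of such shifts is unspecified, so any consistent choice is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shifts with the count reduced modulo the width, so an
   out-of-range count never reaches the C shift operator. */
static uint32_t shl32(uint32_t x, unsigned n) { return x << (n & 31); }
static uint32_t shr32(uint32_t x, unsigned n) { return x >> (n & 31); }

/* Right-shifting a negative signed value is implementation-defined in C;
   mainstream compilers replicate the sign bit. */
static int32_t sar32(int32_t x, unsigned n) { return x >> (n & 31); }

/* Rotates: (-n & 31) equals 32 - n for n in 1..31 and 0 for n == 0,
   so neither shift count can ever be 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}

static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << (-n & 31));
}

/* Same pattern for 64-bit operands, masking with 63. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> (-n & 63));
}

static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << (-n & 63));
}
```

Working on `uint32_t` for the i32 cases also keeps the folded result within 32 bits, instead of leaving stray high bits in a 64-bit TCGArg.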

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations.
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (4 preceding siblings ...)
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
  2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
  2011-07-30 10:52 ` Blue Swirl
  7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhur

Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a1bb287..a324e98 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -107,6 +107,11 @@ static int op_bits(int op)
     case INDEX_op_sar_i32:
     case INDEX_op_rotl_i32:
     case INDEX_op_rotr_i32:
+    case INDEX_op_not_i32:
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext16u_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -121,6 +126,13 @@ static int op_bits(int op)
     case INDEX_op_sar_i64:
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i64:
+    case INDEX_op_not_i64:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext8u_i64:
+    case INDEX_op_ext16u_i64:
+    case INDEX_op_ext32u_i64:
         return 64;
 #endif
     default:
@@ -267,6 +279,29 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
         return x;
 #endif
 
+    CASE_OP_32_64(not):
+        return ~x;
+
+    CASE_OP_32_64(ext8s):
+        return (int8_t)x;
+
+    CASE_OP_32_64(ext16s):
+        return (int16_t)x;
+
+    CASE_OP_32_64(ext8u):
+        return (uint8_t)x;
+
+    CASE_OP_32_64(ext16u):
+        return (uint16_t)x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_ext32s_i64:
+        return (int32_t)x;
+
+    case INDEX_op_ext32u_i64:
+        return (uint32_t)x;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -424,6 +459,30 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        CASE_OP_32_64(not):
+        CASE_OP_32_64(ext8s):
+        CASE_OP_32_64(ext16s):
+        CASE_OP_32_64(ext8u):
+        CASE_OP_32_64(ext16u):
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_ext32s_i64:
+        case INDEX_op_ext32u_i64:
+#endif
+            if (temps[args[1]].state == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                tmp = do_constant_folding(op, temps[args[1]].val, 0);
+                tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                reset_temp(args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            }
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (5 preceding siblings ...)
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations Kirill Batuzov
@ 2011-07-07 12:54 ` Peter Maydell
  2011-07-07 14:22   ` Kirill Batuzov
  2011-07-30 10:52 ` Blue Swirl
  7 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2011-07-07 12:54 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: qemu-devel, zhur

On 7 July 2011 13:37, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> ARM guests for 254.gap and 255.vortex and x86_64 guest for 252.eon do not
> work under QEMU for some unrelated reason.

If you can provide a binary and a command line for these I can have
a look at what's going on with the failing ARM guest binaries...

-- PMM

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
  2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
@ 2011-07-07 14:22   ` Kirill Batuzov
  0 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 14:22 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, zhur



On Thu, 7 Jul 2011, Peter Maydell wrote:

> On 7 July 2011 13:37, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> > ARM guests for 254.gap and 255.vortex and x86_64 guest for 252.eon do not
> > work under QEMU for some unrelated reason.
> 
> If you can provide a binary and a command line for these I can have
> a look at what's going on with the failing ARM guest binaries...
>

I've just checked more carefully: these tests fail the same way on
hardware. So it is some SPEC or compiler problem not related to QEMU at
all.

----
  Kirill

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
  2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (6 preceding siblings ...)
  2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
@ 2011-07-30 10:52 ` Blue Swirl
  7 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2011-07-30 10:52 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: qemu-devel, zhur

Thanks, applied all.

On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> This series implements some basic machine-independent optimizations.  They
> simplify code and allow liveness analysis to do its work better.
>
> Suppose we have the following ARM code:
>
>  movw    r12, #0xb6db
>  movt    r12, #0xdb6d
>
> In TCG before optimizations we'll have:
>
>  movi_i32 tmp8,$0xb6db
>  mov_i32 r12,tmp8
>  mov_i32 tmp8,r12
>  ext16u_i32 tmp8,tmp8
>  movi_i32 tmp9,$0xdb6d0000
>  or_i32 tmp8,tmp8,tmp9
>  mov_i32 r12,tmp8
>
> And after optimizations we'll have this:
>
>  movi_i32 r12,$0xdb6db6db
>
> Here are performance evaluation results on SPEC CPU2000 integer tests in
> user-mode emulation on an x86_64 host.  There were 5 runs of each test on
> the reference data set.  The tables below show runtime in seconds for all these
> runs.
>
> ARM guest without optimizations:
> Test name       #1       #2       #3       #4       #5    Median
> 164.gzip    1408.891 1402.323 1407.623 1404.955 1405.396 1405.396
> 175.vpr     1245.31  1248.758 1247.936 1248.534 1247.534 1247.936
> 176.gcc      912.561  809.481  847.057  912.636  912.544  912.544
> 181.mcf      198.384  197.841  199.127  197.976  197.29   197.976
> 186.crafty  1545.881 1546.051 1546.002 1545.927 1545.945 1545.945
> 197.parser  3779.954 3779.878 3779.79  3779.94  3779.88  3779.88
> 252.eon     2563.168 2776.152 2776.395 2776.577 2776.202 2776.202
> 253.perlbmk 2591.781 2504.078 2507.07  2591.337 2463.401 2507.07
> 256.bzip2   1306.197 1304.639 1184.853 1305.141 1305.606 1305.141
> 300.twolf   2918.984 2918.926 2918.93  2918.97  2918.914 2918.93
>
> ARM guest with optimizations:
> Test name       #1       #2       #3       #4       #5    Median    Gain
> 164.gzip    1401.198 1376.337 1401.117 1401.23  1401.246 1401.198   0.30%
> 175.vpr     1247.964 1151.468 1247.76  1154.419 1242.017 1242.017   0.47%
> 176.gcc      896.882  918.546  918.297  851.465  918.39   918.297  -0.63%
> 181.mcf      198.19   197.399  198.421  198.663  198.312  198.312  -0.17%
> 186.crafty  1520.425 1520.362 1520.477 1520.445 1520.957 1520.445   1.65%
> 197.parser  3770.943 3770.927 3770.578 3771.048 3770.904 3770.927   0.24%
> 252.eon     2752.371 2752.111 2752.005 2752.214 2752.109 2752.111   0.87%
> 253.perlbmk 2577.462 2578.588 2493.567 2578.571 2578.318 2578.318  -2.84%
> 256.bzip2   1296.198 1271.128 1296.044 1296.321 1296.147 1296.147   0.69%
> 300.twolf   2888.984 2889.023 2889.225 2889.039 2889.05  2889.039   1.02%
>
>
> x86_64 guest without optimizations:
> Test name       #1       #2       #3       #4       #5    Median
> 164.gzip     857.654  857.646  857.678  798.119  857.675  857.654
> 175.vpr      959.265  959.207  959.185  959.461  959.332  959.265
> 176.gcc      625.722  637.257  646.638  646.614  646.56   646.56
> 181.mcf      221.666  220.194  220.079  219.868  221.5    220.194
> 186.crafty  1129.531 1129.739 1129.573 1129.588 1129.624 1129.588
> 197.parser  1809.517 1809.516 1809.386 1809.477 1809.427 1809.477
> 253.perlbmk 1774.944 1776.046 1769.865 1774.052 1775.236 1774.944
> 254.gap     1061.033 1061.158 1061.064 1061.047 1061.01  1061.047
> 255.vortex  1871.261 1914.144 1914.057 1914.086 1914.127 1914.086
> 256.bzip2    918.916 1011.828 1011.819 1012.11  1011.932 1011.828
> 300.twolf   1332.797 1330.56  1330.687 1330.917 1330.602 1330.687
>
> x86_64 guest with optimizations:
> Test name       #1       #2       #3       #4       #5    Median    Gain
> 164.gzip     806.198  854.159  854.184  854.168  854.187  854.168   0.41%
> 175.vpr      955.905  950.86   955.876  876.397  955.957  955.876   1.82%
> 176.gcc      641.663  640.189  641.57   641.552  641.514  641.552   0.03%
> 181.mcf      217.619  218.627  218.699  217.977  216.955  217.977   1.18%
> 186.crafty  1123.909 1123.852 1123.917 1123.781 1123.805 1123.852   0.51%
> 197.parser  1813.94  1814.643 1815.286 1814.445 1813.72  1814.445  -0.27%
> 253.perlbmk 1791.536 1795.642 1793.0   1797.486 1791.401 1793.0    -1.02%
> 254.gap     1070.605 1070.216 1070.637 1070.168 1070.491 1070.491  -0.89%
> 255.vortex  1918.764 1918.573 1917.411 1918.287 1918.735 1918.573  -0.23%
> 256.bzip2   1017.179 1017.083 1017.283 1016.913 1017.189 1017.179  -0.53%
> 300.twolf   1321.072 1321.109 1321.019 1321.072 1321.004 1321.072   0.72%
>
> ARM guests for 254.gap and 255.vortex and x86_64 guest for 252.eon do not
> work under QEMU for some unrelated reason.
>
> Changes:
> v1 -> v2
>  - State and Vals arrays merged into an array of structures.
>  - Added reference counting of temps' copies. This helps to reset a temp's state
>   faster in most cases.
>  - Do not perform copy propagation through operations with TCG_OPF_CALL_CLOBBER or
>   TCG_OPF_SIDE_EFFECTS flags.
>  - Split some expression simplifications into an independent switch.
>  - Let the compiler handle signed shifts and sign/zero extensions in its
>   implementation-defined way.
>
> v2 -> v3
>  - Elements of an equivalence class are placed in a doubly linked circular list,
>   so it's easier to choose a new representative.
>  - The CASE_OP_32_64 macro is used to reduce the number of #ifdefs. Checkpatch is
>   not happy about this change, but I do not think spaces would be appropriate here.
>  - Some constraints during copy propagation are relaxed.
>  - Functions tcg_opt_gen_mov and tcg_opt_gen_movi are introduced to reduce code
>   duplication.
>
> Kirill Batuzov (6):
>  Add TCG optimizations stub
>  Add copy and constant propagation.
>  Do constant folding for basic arithmetic operations.
>  Do constant folding for boolean operations.
>  Do constant folding for shift operations.
>  Do constant folding for unary operations.
>
>  Makefile.target |    2 +-
>  tcg/optimize.c  |  568 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg.c       |    6 +
>  tcg/tcg.h       |    3 +
>  4 files changed, 578 insertions(+), 1 deletions(-)
>  create mode 100644 tcg/optimize.c
>
> --
> 1.7.4.1
>
>
>
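
As a worked check of the movw/movt example in the cover letter, the unoptimized op sequence can be replayed step by step in plain C. Since every input is a constant, the whole sequence collapses to a single value, which is what copy propagation plus constant folding achieve at translation time. The function name and its two parameters are hypothetical; each statement is annotated with the TCG op it stands for.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the unoptimized TCG ops for "movw r12, #lo; movt r12, #hi".
   All inputs are constants, so the result is a single constant, just as
   the optimizer reduces the sequence to one movi_i32. */
static uint32_t fold_movw_movt(uint32_t movw_imm, uint32_t movt_imm)
{
    uint32_t tmp8, tmp9, r12;

    tmp8 = movw_imm;             /* movi_i32 tmp8,$0xb6db      */
    r12 = tmp8;                  /* mov_i32 r12,tmp8           */
    tmp8 = r12;                  /* mov_i32 tmp8,r12           */
    tmp8 = (uint16_t)tmp8;       /* ext16u_i32 tmp8,tmp8       */
    tmp9 = movt_imm << 16;       /* movi_i32 tmp9,$0xdb6d0000  */
    tmp8 = tmp8 | tmp9;          /* or_i32 tmp8,tmp8,tmp9      */
    r12 = tmp8;                  /* mov_i32 r12,tmp8           */
    return r12;
}
```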

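In the tables above, the Median column is the middle value of the five sorted runs, and the Gain column matches the relative reduction of the median runtime for most rows; for 164.gzip on ARM, (1405.396 - 1401.198) / 1405.396 is about 0.30%. A small C sketch of both computations follows, assuming that definition of Gain. The helper names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of five runs; sorts a copy so the caller's data is untouched. */
static double median5(const double runs[5])
{
    double sorted[5];
    for (int i = 0; i < 5; i++) {
        sorted[i] = runs[i];
    }
    qsort(sorted, 5, sizeof sorted[0], cmp_double);
    return sorted[2];
}

/* Assumed definition of "Gain": percent reduction of the median runtime,
   positive when the optimized build is faster. */
static double gain_percent(double base_median, double opt_median)
{
    return (base_median - opt_median) / base_median * 100.0;
}
```

The median makes the comparison robust against single outlier runs, such as the 809.481 s run of 176.gcc.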
^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations.
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
@ 2011-07-30 12:25   ` Blue Swirl
  2011-07-30 19:13     ` Blue Swirl
  0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2011-07-30 12:25 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: qemu-devel, zhur

On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> Perform constant folding for SHR, SHL, SAR, ROTR and ROTL operations.

This patch broke the build on TCG targets (Sparc, MIPS) that don't implement
rotation ops; the next patch did likewise. I committed a fix.

> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> ---
>  tcg/optimize.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 72 insertions(+), 0 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c469952..a1bb287 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -102,6 +102,11 @@ static int op_bits(int op)
>     case INDEX_op_and_i32:
>     case INDEX_op_or_i32:
>     case INDEX_op_xor_i32:
> +    case INDEX_op_shl_i32:
> +    case INDEX_op_shr_i32:
> +    case INDEX_op_sar_i32:
> +    case INDEX_op_rotl_i32:
> +    case INDEX_op_rotr_i32:
>         return 32;
>  #if TCG_TARGET_REG_BITS == 64
>     case INDEX_op_mov_i64:
> @@ -111,6 +116,11 @@ static int op_bits(int op)
>     case INDEX_op_and_i64:
>     case INDEX_op_or_i64:
>     case INDEX_op_xor_i64:
> +    case INDEX_op_shl_i64:
> +    case INDEX_op_shr_i64:
> +    case INDEX_op_sar_i64:
> +    case INDEX_op_rotl_i64:
> +    case INDEX_op_rotr_i64:
>         return 64;
>  #endif
>     default:
> @@ -205,6 +215,58 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
>     CASE_OP_32_64(xor):
>         return x ^ y;
>
> +    case INDEX_op_shl_i32:
> +        return (uint32_t)x << (uint32_t)y;
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_shl_i64:
> +        return (uint64_t)x << (uint64_t)y;
> +#endif
> +
> +    case INDEX_op_shr_i32:
> +        return (uint32_t)x >> (uint32_t)y;
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_shr_i64:
> +        return (uint64_t)x >> (uint64_t)y;
> +#endif
> +
> +    case INDEX_op_sar_i32:
> +        return (int32_t)x >> (int32_t)y;
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_sar_i64:
> +        return (int64_t)x >> (int64_t)y;
> +#endif
> +
> +    case INDEX_op_rotr_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        x &= 0xffffffff;
> +        y &= 0xffffffff;
> +#endif
> +        x = (x << (32 - y)) | (x >> y);
> +        return x;
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_rotr_i64:
> +        x = (x << (64 - y)) | (x >> y);
> +        return x;
> +#endif
> +
> +    case INDEX_op_rotl_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        x &= 0xffffffff;
> +        y &= 0xffffffff;
> +#endif
> +        x = (x << y) | (x >> (32 - y));
> +        return x;
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_rotl_i64:
> +        x = (x << y) | (x >> (64 - y));
> +        return x;
> +#endif
> +
>     default:
>         fprintf(stderr,
>                 "Unrecognized operation %d in do_constant_folding.\n", op);
> @@ -278,6 +340,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>         switch (op) {
>         CASE_OP_32_64(add):
>         CASE_OP_32_64(sub):
> +        CASE_OP_32_64(shl):
> +        CASE_OP_32_64(shr):
> +        CASE_OP_32_64(sar):
> +        CASE_OP_32_64(rotl):
> +        CASE_OP_32_64(rotr):
>             if (temps[args[1]].state == TCG_TEMP_CONST) {
>                 /* Proceed with possible constant folding. */
>                 break;
> @@ -363,6 +430,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>         CASE_OP_32_64(or):
>         CASE_OP_32_64(and):
>         CASE_OP_32_64(xor):
> +        CASE_OP_32_64(shl):
> +        CASE_OP_32_64(shr):
> +        CASE_OP_32_64(sar):
> +        CASE_OP_32_64(rotl):
> +        CASE_OP_32_64(rotr):
>             if (temps[args[1]].state == TCG_TEMP_CONST
>                 && temps[args[2]].state == TCG_TEMP_CONST) {
>                 gen_opc_buf[op_index] = op_to_movi(op);
> --
> 1.7.4.1
>
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations.
  2011-07-30 12:25   ` Blue Swirl
@ 2011-07-30 19:13     ` Blue Swirl
  0 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2011-07-30 19:13 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: qemu-devel, zhur

On Sat, Jul 30, 2011 at 3:25 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuzovk@ispras.ru> wrote:
>> Perform constant folding for SHR, SHL, SAR, ROTR and ROTL operations.
>
> This patch broke the build on TCG targets (Sparc, MIPS) that don't implement
> rotation ops; the next patch did likewise. I committed a fix.

Unfortunately, my patch that fixed the Sparc build broke the i386 build, so I
committed another fix.

>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>> ---
>>  tcg/optimize.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 72 insertions(+), 0 deletions(-)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c469952..a1bb287 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -102,6 +102,11 @@ static int op_bits(int op)
>>     case INDEX_op_and_i32:
>>     case INDEX_op_or_i32:
>>     case INDEX_op_xor_i32:
>> +    case INDEX_op_shl_i32:
>> +    case INDEX_op_shr_i32:
>> +    case INDEX_op_sar_i32:
>> +    case INDEX_op_rotl_i32:
>> +    case INDEX_op_rotr_i32:
>>         return 32;
>>  #if TCG_TARGET_REG_BITS == 64
>>     case INDEX_op_mov_i64:
>> @@ -111,6 +116,11 @@ static int op_bits(int op)
>>     case INDEX_op_and_i64:
>>     case INDEX_op_or_i64:
>>     case INDEX_op_xor_i64:
>> +    case INDEX_op_shl_i64:
>> +    case INDEX_op_shr_i64:
>> +    case INDEX_op_sar_i64:
>> +    case INDEX_op_rotl_i64:
>> +    case INDEX_op_rotr_i64:
>>         return 64;
>>  #endif
>>     default:
>> @@ -205,6 +215,58 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
>>     CASE_OP_32_64(xor):
>>         return x ^ y;
>>
>> +    case INDEX_op_shl_i32:
>> +        return (uint32_t)x << (uint32_t)y;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> +    case INDEX_op_shl_i64:
>> +        return (uint64_t)x << (uint64_t)y;
>> +#endif
>> +
>> +    case INDEX_op_shr_i32:
>> +        return (uint32_t)x >> (uint32_t)y;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> +    case INDEX_op_shr_i64:
>> +        return (uint64_t)x >> (uint64_t)y;
>> +#endif
>> +
>> +    case INDEX_op_sar_i32:
>> +        return (int32_t)x >> (int32_t)y;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> +    case INDEX_op_sar_i64:
>> +        return (int64_t)x >> (int64_t)y;
>> +#endif
>> +
>> +    case INDEX_op_rotr_i32:
>> +#if TCG_TARGET_REG_BITS == 64
>> +        x &= 0xffffffff;
>> +        y &= 0xffffffff;
>> +#endif
>> +        x = (x << (32 - y)) | (x >> y);
>> +        return x;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> +    case INDEX_op_rotr_i64:
>> +        x = (x << (64 - y)) | (x >> y);
>> +        return x;
>> +#endif
>> +
>> +    case INDEX_op_rotl_i32:
>> +#if TCG_TARGET_REG_BITS == 64
>> +        x &= 0xffffffff;
>> +        y &= 0xffffffff;
>> +#endif
>> +        x = (x << y) | (x >> (32 - y));
>> +        return x;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> +    case INDEX_op_rotl_i64:
>> +        x = (x << y) | (x >> (64 - y));
>> +        return x;
>> +#endif
>> +
>>     default:
>>         fprintf(stderr,
>>                 "Unrecognized operation %d in do_constant_folding.\n", op);
>> @@ -278,6 +340,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>>         switch (op) {
>>         CASE_OP_32_64(add):
>>         CASE_OP_32_64(sub):
>> +        CASE_OP_32_64(shl):
>> +        CASE_OP_32_64(shr):
>> +        CASE_OP_32_64(sar):
>> +        CASE_OP_32_64(rotl):
>> +        CASE_OP_32_64(rotr):
>>             if (temps[args[1]].state == TCG_TEMP_CONST) {
>>                 /* Proceed with possible constant folding. */
>>                 break;
>> @@ -363,6 +430,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>>         CASE_OP_32_64(or):
>>         CASE_OP_32_64(and):
>>         CASE_OP_32_64(xor):
>> +        CASE_OP_32_64(shl):
>> +        CASE_OP_32_64(shr):
>> +        CASE_OP_32_64(sar):
>> +        CASE_OP_32_64(rotl):
>> +        CASE_OP_32_64(rotr):
>>             if (temps[args[1]].state == TCG_TEMP_CONST
>>                 && temps[args[2]].state == TCG_TEMP_CONST) {
>>                 gen_opc_buf[op_index] = op_to_movi(op);
>> --
>> 1.7.4.1
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
@ 2011-08-03 19:00   ` Stefan Weil
  2011-08-03 20:20     ` Blue Swirl
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Weil @ 2011-08-03 19:00 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: Blue Swirl, qemu-devel, zhur

On 07.07.2011 14:37, Kirill Batuzov wrote:
> Make tcg_constant_folding do copy and constant propagation. It is
> preparatory work before actual constant folding.
>
> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> ---
>   tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 files changed, 180 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c7c7da9..f8afe71 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
>    
...

This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
and Win32 hosts). Simply running qemu (BIOS only) terminates
with abort(). As the error is easy to reproduce, I don't provide
a stack trace here.

> +static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
> +                            int nb_temps, int nb_globals)
> +{
> +        reset_temp(dst, nb_temps, nb_globals);
> +        assert(temps[src].state != TCG_TEMP_COPY);
> +        if (src >= nb_globals) {
> +            assert(temps[src].state != TCG_TEMP_CONST);
> +            if (temps[src].state != TCG_TEMP_HAS_COPY) {
> +                temps[src].state = TCG_TEMP_HAS_COPY;
> +                temps[src].next_copy = src;
> +                temps[src].prev_copy = src;
> +            }
> +            temps[dst].state = TCG_TEMP_COPY;
> +            temps[dst].val = src;
> +            temps[dst].next_copy = temps[src].next_copy;
> +            temps[dst].prev_copy = src;
> +            temps[temps[dst].next_copy].prev_copy = dst;
> +            temps[src].next_copy = dst;
> +        }
> +        gen_args[0] = dst;
> +        gen_args[1] = src;
> +}
>    

QEMU with a modified tcg_opt_gen_mov() (without the if block) works.

Kind regards,
Stefan Weil
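
The pointer updates inside the `if` block of tcg_opt_gen_mov() splice dst into the circular doubly linked list of src's copies, directly after src; per the v2 -> v3 changelog, this list is what makes choosing a new representative easy. The sketch below isolates just that list logic so it can be exercised on its own. The fixed-size `temps` array, `NB_TEMPS`, `struct temp_links` and the helper names are assumptions for illustration; the real code additionally tracks each temp's state and value.

```c
#include <assert.h>

#define NB_TEMPS 8   /* hypothetical number of temps */

/* Each temp is linked into a circular doubly linked list of temps known
   to hold the same value; a temp in a class of its own points to itself. */
struct temp_links {
    unsigned next_copy;
    unsigned prev_copy;
};

static struct temp_links temps[NB_TEMPS];

static void init_temps(void)
{
    for (unsigned i = 0; i < NB_TEMPS; i++) {
        temps[i].next_copy = i;
        temps[i].prev_copy = i;
    }
}

/* Unlink t from its class in O(1), e.g. because t is being overwritten. */
static void remove_copy(unsigned t)
{
    temps[temps[t].prev_copy].next_copy = temps[t].next_copy;
    temps[temps[t].next_copy].prev_copy = temps[t].prev_copy;
    temps[t].next_copy = t;
    temps[t].prev_copy = t;
}

/* Record "mov dst, src": dst leaves its old class and is spliced in
   directly after src, with the same four updates as tcg_opt_gen_mov(). */
static void add_copy(unsigned dst, unsigned src)
{
    assert(dst < NB_TEMPS && src < NB_TEMPS && dst != src);
    remove_copy(dst);
    temps[dst].next_copy = temps[src].next_copy;
    temps[dst].prev_copy = src;
    temps[temps[dst].next_copy].prev_copy = dst;
    temps[src].next_copy = dst;
}

/* Number of temps in t's class, found by walking the ring once. */
static unsigned class_size(unsigned t)
{
    unsigned n = 1;
    for (unsigned i = temps[t].next_copy; i != t; i = temps[i].next_copy) {
        n++;
    }
    return n;
}
```

When a member is overwritten, remove_copy() unlinks it in constant time, and any remaining member of the ring (for example its old next_copy) can serve as the class representative.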

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-08-03 19:00   ` Stefan Weil
@ 2011-08-03 20:20     ` Blue Swirl
  2011-08-03 20:56       ` Stefan Weil
  0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2011-08-03 20:20 UTC (permalink / raw)
  To: Stefan Weil; +Cc: qemu-devel, zhur, Kirill Batuzov

On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> On 07.07.2011 14:37, Kirill Batuzov wrote:
>>
>> Make tcg_constant_folding do copy and constant propagation. It is
>> preparatory work before actual constant folding.
>>
>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>> ---
>>  tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 files changed, 180 insertions(+), 2 deletions(-)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c7c7da9..f8afe71 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>>
>
> ...
>
> This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
> and Win32 hosts). Simply running qemu (BIOS only) terminates
> with abort(). As the error is easy to reproduce, I don't provide
> a stack trace here.

I can't reproduce this; the i386/Linux and win32 versions of the i386,
Sparc32 and Sparc64 emulators work fine.

Maybe you have a stale build (bug in Makefile dependencies)?

>> +static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
>> +                            int nb_temps, int nb_globals)
>> +{
>> +        reset_temp(dst, nb_temps, nb_globals);
>> +        assert(temps[src].state != TCG_TEMP_COPY);
>> +        if (src >= nb_globals) {
>> +            assert(temps[src].state != TCG_TEMP_CONST);
>> +            if (temps[src].state != TCG_TEMP_HAS_COPY) {
>> +                temps[src].state = TCG_TEMP_HAS_COPY;
>> +                temps[src].next_copy = src;
>> +                temps[src].prev_copy = src;
>> +            }
>> +            temps[dst].state = TCG_TEMP_COPY;
>> +            temps[dst].val = src;
>> +            temps[dst].next_copy = temps[src].next_copy;
>> +            temps[dst].prev_copy = src;
>> +            temps[temps[dst].next_copy].prev_copy = dst;
>> +            temps[src].next_copy = dst;
>> +        }
>> +        gen_args[0] = dst;
>> +        gen_args[1] = src;
>> +}
>>
>
> QEMU with a modified tcg_opt_gen_mov() (without the if block) works.
>
> Kind regards,
> Stefan Weil
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-08-03 20:20     ` Blue Swirl
@ 2011-08-03 20:56       ` Stefan Weil
  2011-08-03 21:03         ` Stefan Weil
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Weil @ 2011-08-03 20:56 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel, zhur, Kirill Batuzov

On 03.08.2011 22:20, Blue Swirl wrote:
> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> On 07.07.2011 14:37, Kirill Batuzov wrote:
>>>
>>> Make tcg_constant_folding do copy and constant propagation. It is
>>> preparatory work before actual constant folding.
>>>
>>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>>> ---
>>>  tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 files changed, 180 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>> index c7c7da9..f8afe71 100644
>>> --- a/tcg/optimize.c
>>> +++ b/tcg/optimize.c
>>>
>>
>> ...
>>
>> This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
>> and Win32 hosts). Simply running qemu (BIOS only) terminates
>> with abort(). As the error is easy to reproduce, I don't provide
>> a stack trace here.
>
> I can't reproduce this; the i386/Linux and win32 versions of the i386,
> Sparc32 and Sparc64 emulators work fine.
>
> Maybe you have a stale build (bug in Makefile dependencies)?

Sorry, an important piece of information in my report was wrong or missing:
it's not qemu but qemu-system-x86_64 that fails to work.

I just tested it once more with a new build:

$ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
/qemu/tcg/tcg.c:1646: tcg fatal error
Aborted

Cheers,
Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-08-03 20:56       ` Stefan Weil
@ 2011-08-03 21:03         ` Stefan Weil
  2011-08-04 18:42           ` Blue Swirl
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Weil @ 2011-08-03 21:03 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Blue Swirl, qemu-devel, zhur, Kirill Batuzov

On 03.08.2011 22:56, Stefan Weil wrote:
> On 03.08.2011 22:20, Blue Swirl wrote:
>> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>>> On 07.07.2011 14:37, Kirill Batuzov wrote:
>>>>
>>>> Make tcg_constant_folding do copy and constant propagation. It is
>>>> preparatory work before actual constant folding.
>>>>
>>>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>>>> ---
>>>>  tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 files changed, 180 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>>> index c7c7da9..f8afe71 100644
>>>> --- a/tcg/optimize.c
>>>> +++ b/tcg/optimize.c
>>>>
>>>
>>> ...
>>>
>>> This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
>>> and w32 hosts). Simply running qemu (BIOS only) terminates
>>> with abort(). As the error is easy to reproduce, I am not
>>> providing a backtrace here.
>>
>> I can't reproduce this; the i386/Linux and win32 builds of the i386,
>> Sparc32 and Sparc64 emulators work fine.
>>
>> Maybe you have a stale build (bug in Makefile dependencies)?
>
> Sorry, some important information in my report was wrong or missing.
> It's not qemu but qemu-system-x86_64 that fails.
>
> I just tested it once more with a new build:
>
> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
> /qemu/tcg/tcg.c:1646: tcg fatal error
> Aborted
>
> Cheers,
> Stefan

qemu-system-mips64el fails with the same error, so the problem
occurs when emulating 64-bit targets on 32-bit hosts.

* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-08-03 21:03         ` Stefan Weil
@ 2011-08-04 18:42           ` Blue Swirl
  2011-08-04 19:24             ` Blue Swirl
  0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2011-08-04 18:42 UTC
  To: Stefan Weil; +Cc: qemu-devel, zhur, Kirill Batuzov

On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> On 03.08.2011 22:56, Stefan Weil wrote:
>>
>> On 03.08.2011 22:20, Blue Swirl wrote:
>>>
>>> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>>>>
>>>> On 07.07.2011 14:37, Kirill Batuzov wrote:
>>>>>
>>>>> Make tcg_constant_folding do copy and constant propagation. This is
>>>>> preparatory work before the actual constant folding.
>>>>>
>>>>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>>>>> ---
>>>>>  tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>  1 files changed, 180 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>>>> index c7c7da9..f8afe71 100644
>>>>> --- a/tcg/optimize.c
>>>>> +++ b/tcg/optimize.c
>>>>>
>>>>
>>>> ...
>>>>
>>>> This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
>>>> and w32 hosts). Simply running qemu (BIOS only) terminates
>>>> with abort(). As the error is easy to reproduce, I am not
>>>> providing a backtrace here.
>>>
>>> I can't reproduce this; the i386/Linux and win32 builds of the i386,
>>> Sparc32 and Sparc64 emulators work fine.
>>>
>>> Maybe you have a stale build (bug in Makefile dependencies)?
>>
>> Sorry, some important information in my report was wrong or missing.
>> It's not qemu but qemu-system-x86_64 that fails.
>>
>> I just tested it once more with a new build:
>>
>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>> /qemu/tcg/tcg.c:1646: tcg fatal error
>> Aborted

OK, now it is broken for me too.

>> Cheers,
>> Stefan
>
> qemu-system-mips64el fails with the same error, so the problem
> occurs when emulating 64-bit targets on 32-bit hosts.

Not always: Sparc64 still works fine.

* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
  2011-08-04 18:42           ` Blue Swirl
@ 2011-08-04 19:24             ` Blue Swirl
  0 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2011-08-04 19:24 UTC
  To: Stefan Weil; +Cc: qemu-devel, zhur, Kirill Batuzov

On Thu, Aug 4, 2011 at 6:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> On 03.08.2011 22:56, Stefan Weil wrote:
>>>
>>> On 03.08.2011 22:20, Blue Swirl wrote:
>>>>
>>>> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>>>>>
>>>>> On 07.07.2011 14:37, Kirill Batuzov wrote:
>>>>>>
>>>>>> Make tcg_constant_folding do copy and constant propagation. This is
>>>>>> preparatory work before the actual constant folding.
>>>>>>
>>>>>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>>>>>> ---
>>>>>>  tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>>  1 files changed, 180 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>>>>> index c7c7da9..f8afe71 100644
>>>>>> --- a/tcg/optimize.c
>>>>>> +++ b/tcg/optimize.c
>>>>>>
>>>>>
>>>>> ...
>>>>>
>>>>> This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
>>>>> and w32 hosts). Simply running qemu (BIOS only) terminates
>>>>> with abort(). As the error is easy to reproduce, I am not
>>>>> providing a backtrace here.
>>>>
>>>> I can't reproduce this; the i386/Linux and win32 builds of the i386,
>>>> Sparc32 and Sparc64 emulators work fine.
>>>>
>>>> Maybe you have a stale build (bug in Makefile dependencies)?
>>>
>>> Sorry, some important information in my report was wrong or missing.
>>> It's not qemu but qemu-system-x86_64 that fails.
>>>
>>> I just tested it once more with a new build:
>>>
>>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>>> /qemu/tcg/tcg.c:1646: tcg fatal error
>>> Aborted
>
> OK, now it is broken for me too.
>
>>> Cheers,
>>> Stefan
>>
>> qemu-system-mips64el fails with the same error, so the problem
>> occurs when emulating 64-bit targets on 32-bit hosts.
>
> Not always: Sparc64 still works fine.

x86_64 fails because 'mov_i32 cc_src_0,loc25' is incorrectly optimized
to 'mov_i32 cc_src_0,tmp6', even though tmp6 is dead after the brcond.
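To make the failure mode concrete, here is a minimal C sketch of a copy-propagation pass that keeps its copy table across a conditional branch. It uses a hypothetical toy IR (`struct op`, `resolve()`, `invalidate()` and `naive_copy_propagation()` are invented for illustration, not taken from tcg/optimize.c) and replays only the moves around brcond2_i32 from the trace below. It reproduces the symptom, rewriting `mov_i32 cc_src_0,loc25` into `mov_i32 cc_src_0,tmp6`; it makes no claim about where exactly the patch goes wrong.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy IR: just enough to replay the moves in the trace. */
enum op_kind { OP_MOV, OP_BRCOND, OP_LABEL };

struct op {
    enum op_kind kind;
    const char *dst; /* OP_MOV only */
    const char *src; /* OP_MOV only; may be rewritten by the pass */
};

#define MAX_COPIES 16

/* copy_dst[i] currently holds the same value as copy_src[i]. */
static const char *copy_dst[MAX_COPIES];
static const char *copy_src[MAX_COPIES];
static int n_copies;

/* Return the temp that t is known to copy, or t itself. */
static const char *resolve(const char *t)
{
    for (int i = 0; i < n_copies; i++) {
        if (strcmp(copy_dst[i], t) == 0) {
            return copy_src[i];
        }
    }
    return t;
}

/* t is about to be overwritten: forget every relation mentioning it. */
static void invalidate(const char *t)
{
    int i = 0;
    while (i < n_copies) {
        if (strcmp(copy_dst[i], t) == 0 || strcmp(copy_src[i], t) == 0) {
            n_copies--; /* move the last relation into slot i */
            copy_dst[i] = copy_dst[n_copies];
            copy_src[i] = copy_src[n_copies];
        } else {
            i++;
        }
    }
}

/* Mimics the observed symptom: the copy table survives brcond even
 * though plain tmp* temps are dead after it. */
static void naive_copy_propagation(struct op *ops, int n)
{
    n_copies = 0;
    for (int i = 0; i < n; i++) {
        if (ops[i].kind != OP_MOV) {
            continue; /* the flaw: block ends do not clear the table */
        }
        const char *src = resolve(ops[i].src);
        invalidate(ops[i].dst);
        ops[i].src = src; /* rewrite the use */
        if (strcmp(src, ops[i].dst) != 0) {
            assert(n_copies < MAX_COPIES);
            copy_dst[n_copies] = ops[i].dst;
            copy_src[n_copies] = src;
            n_copies++;
        }
    }
}

/* The moves around brcond2_i32 in the "shl %cl,%eax" trace. */
static struct op trace[] = {
    { OP_MOV, "loc23", "tmp0" },     { OP_MOV, "loc24", "tmp1" },
    { OP_MOV, "loc25", "tmp6" },     { OP_MOV, "loc26", "tmp7" },
    { OP_BRCOND, NULL, NULL },
    { OP_MOV, "cc_src_0", "loc25" }, { OP_MOV, "cc_src_1", "loc26" },
    { OP_MOV, "cc_dst_0", "loc23" }, { OP_MOV, "cc_dst_1", "loc24" },
    { OP_LABEL, NULL, NULL },
};

void print_ops(const struct op *ops, int n)
{
    for (int i = 0; i < n; i++) {
        if (ops[i].kind == OP_MOV) {
            printf(" mov_i32 %s,%s\n", ops[i].dst, ops[i].src);
        } else {
            printf(" %s\n", ops[i].kind == OP_BRCOND ? "brcond" : "set_label");
        }
    }
}
```

Calling `naive_copy_propagation(trace, 10)` followed by `print_ops(trace, 10)` from a `main` of your own prints the four cc_* moves with tmp6, tmp7, tmp0 and tmp1 as sources, the same rewrites that appear in the "after liveness analysis" dump.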

IN:
0x000000000ffeb90a:  shl    %cl,%eax

OP:
 ---- 0xffeb90a
 mov_i32 tmp2,rcx_0
 mov_i32 tmp3,rcx_1
 mov_i32 tmp0,rax_0
 mov_i32 tmp1,rax_1
 movi_i32 tmp20,$0x1f
 and_i32 tmp2,tmp2,tmp20
 movi_i32 tmp3,$0x0
 movi_i32 tmp21,$0xffffffff
 movi_i32 tmp22,$0xffffffff
 add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17

...tmp6 is assigned here...

 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rax_0,tmp0
 movi_i32 rax_1,$0x0
 mov_i32 loc23,tmp0
 mov_i32 loc24,tmp1
 mov_i32 loc25,tmp6

...tmp6 saved to loc25 to survive brcond...

 mov_i32 loc26,tmp7
 movi_i32 tmp21,$0x0
 movi_i32 tmp22,$0x0
 brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
 mov_i32 cc_src_0,loc25

...used here.

 mov_i32 cc_src_1,loc26
 mov_i32 cc_dst_0,loc23
 mov_i32 cc_dst_1,loc24
 movi_i32 cc_op,$0x24
 set_label $0x0
 movi_i32 tmp8,$0xffeb90c
 movi_i32 tmp9,$0x0
 st_i32 tmp8,env,$0x80
 st_i32 tmp9,env,$0x84
 movi_i32 tmp20,$debug
 call tmp20,$0x0,$0

OP after liveness analysis:
 ---- 0xffeb90a
 mov_i32 tmp2,rcx_0
 nopn $0x2,$0x2
 mov_i32 tmp0,rax_0
 mov_i32 tmp1,rax_1
 movi_i32 tmp20,$0x1f
 and_i32 tmp2,tmp2,tmp20
 movi_i32 tmp3,$0x0
 movi_i32 tmp21,$0xffffffff
 movi_i32 tmp22,$0xffffffff
 add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17

OK

 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rax_0,tmp0
 movi_i32 rax_1,$0x0
 mov_i32 loc23,tmp0
 mov_i32 loc24,tmp1
 mov_i32 loc25,tmp6

OK, though loc25 is unused after this; why is it not optimized away?

 mov_i32 loc26,tmp7
 movi_i32 tmp21,$0x0
 movi_i32 tmp22,$0x0
 brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
 mov_i32 cc_src_0,tmp6

Incorrect optimization.

 mov_i32 cc_src_1,tmp7
 mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 movi_i32 cc_op,$0x24
 set_label $0x0
 movi_i32 tmp8,$0xffeb90c
 movi_i32 tmp9,$0x0
 st_i32 tmp8,env,$0x80
 st_i32 tmp9,env,$0x84
 movi_i32 tmp20,$debug
 call tmp20,$0x0,$0
 end
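On the question above of why `mov_i32 loc25,tmp6` survives liveness analysis: a plausible explanation, assuming TCG's liveness pass treats plain temps as dead and globals and local temps as live at every basic-block end (our understanding of TCG, not verified against this tree), is that loc25 is conservatively kept live at the brcond2_i32. The minimal C sketch below models that rule with a backward pass over the post-optimization moves. The toy IR and the names `liveness()`, `is_live()` and `set_live()` are hypothetical, and temps are classified by their `tmp` prefix purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy IR with the same shape as the trace. */
enum op_kind { OP_MOV, OP_BRCOND, OP_LABEL };

struct op {
    enum op_kind kind;
    const char *dst; /* OP_MOV only */
    const char *src; /* OP_MOV only */
};

#define MAX_FACTS 32

/* Liveness facts recorded since the last block end (newest last). */
static const char *fact_name[MAX_FACTS];
static bool fact_live[MAX_FACTS];
static int n_facts;

/* Assumption: plain temps are recognisable by their "tmp" prefix. */
static bool is_normal_temp(const char *t)
{
    return strncmp(t, "tmp", 3) == 0;
}

static bool is_live(const char *t)
{
    for (int i = n_facts - 1; i >= 0; i--) {
        if (strcmp(fact_name[i], t) == 0) {
            return fact_live[i];
        }
    }
    /* Block-end default: plain temps dead, locals and globals live. */
    return !is_normal_temp(t);
}

static void set_live(const char *t, bool live)
{
    assert(n_facts < MAX_FACTS);
    fact_name[n_facts] = t;
    fact_live[n_facts] = live;
    n_facts++;
}

/* Backward pass: dead[i] is true when op i is a mov whose result is unused. */
static void liveness(const struct op *ops, int n, bool *dead)
{
    n_facts = 0; /* simplification: the end of the list acts as a block end */
    for (int i = n - 1; i >= 0; i--) {
        dead[i] = false;
        if (ops[i].kind != OP_MOV) {
            n_facts = 0; /* brcond and set_label end a basic block */
            continue;
        }
        if (!is_live(ops[i].dst)) {
            dead[i] = true; /* would be deleted, so its source is not read */
            continue;
        }
        set_live(ops[i].dst, false); /* overwritten here */
        set_live(ops[i].src, true);  /* read here */
    }
}

/* The moves around brcond2_i32 after the faulty copy propagation. */
static const struct op optimized[] = {
    { OP_MOV, "loc23", "tmp0" },    { OP_MOV, "loc24", "tmp1" },
    { OP_MOV, "loc25", "tmp6" },    { OP_MOV, "loc26", "tmp7" },
    { OP_BRCOND, NULL, NULL },
    { OP_MOV, "cc_src_0", "tmp6" }, { OP_MOV, "cc_src_1", "tmp7" },
    { OP_MOV, "cc_dst_0", "tmp0" }, { OP_MOV, "cc_dst_1", "tmp1" },
    { OP_LABEL, NULL, NULL },
};
```

In this model the mov into loc25 is kept because loc25 falls back to "live" at the brcond, while the same default treats tmp6 as dead there even though `mov_i32 cc_src_0,tmp6` still reads it afterwards, which is the inconsistency the optimization introduced.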

The corresponding translation code is in target-i386/translate.c:1456,
and it looks correct.

Maybe the optimizer should treat stack and memory temporaries
differently from register temporaries?
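The suggestion above could look roughly like this minimal C sketch, again on a hypothetical toy IR rather than QEMU's real data structures. The names `forget_normal_temps()` and `copy_propagation()` and the `tmp`-prefix test are illustrative assumptions; a real implementation would consult each temp's own attributes rather than its name. At a brcond, relations involving plain temps are dropped because those temps do not survive the block end; at a label, where control flow merges, every relation is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy IR: just enough to replay the moves in the trace. */
enum op_kind { OP_MOV, OP_BRCOND, OP_LABEL };

struct op {
    enum op_kind kind;
    const char *dst; /* OP_MOV only */
    const char *src; /* OP_MOV only; may be rewritten by the pass */
};

#define MAX_COPIES 16

/* copy_dst[i] currently holds the same value as copy_src[i]. */
static const char *copy_dst[MAX_COPIES];
static const char *copy_src[MAX_COPIES];
static int n_copies;

/* Assumption: plain (register) temps are recognisable by a "tmp" prefix. */
static bool is_normal_temp(const char *t)
{
    return strncmp(t, "tmp", 3) == 0;
}

/* Remove relation i by moving the last relation into its slot. */
static void drop(int i)
{
    n_copies--;
    copy_dst[i] = copy_dst[n_copies];
    copy_src[i] = copy_src[n_copies];
}

static const char *resolve(const char *t)
{
    for (int i = 0; i < n_copies; i++) {
        if (strcmp(copy_dst[i], t) == 0) {
            return copy_src[i];
        }
    }
    return t;
}

/* t is about to be overwritten: forget every relation mentioning it. */
static void invalidate(const char *t)
{
    int i = 0;
    while (i < n_copies) {
        if (strcmp(copy_dst[i], t) == 0 || strcmp(copy_src[i], t) == 0) {
            drop(i);
        } else {
            i++;
        }
    }
}

/* Plain temps die at a block end: forget relations that involve them. */
static void forget_normal_temps(void)
{
    int i = 0;
    while (i < n_copies) {
        if (is_normal_temp(copy_dst[i]) || is_normal_temp(copy_src[i])) {
            drop(i);
        } else {
            i++;
        }
    }
}

static void copy_propagation(struct op *ops, int n)
{
    n_copies = 0;
    for (int i = 0; i < n; i++) {
        switch (ops[i].kind) {
        case OP_BRCOND:
            forget_normal_temps(); /* locals and globals keep their relations */
            break;
        case OP_LABEL:
            n_copies = 0; /* control flow merges here: nothing is known */
            break;
        case OP_MOV: {
            const char *src = resolve(ops[i].src);
            invalidate(ops[i].dst);
            ops[i].src = src;
            if (strcmp(src, ops[i].dst) != 0) {
                assert(n_copies < MAX_COPIES);
                copy_dst[n_copies] = ops[i].dst;
                copy_src[n_copies] = src;
                n_copies++;
            }
            break;
        }
        }
    }
}

/* The moves around brcond2_i32 in the "shl %cl,%eax" trace. */
static struct op trace[] = {
    { OP_MOV, "loc23", "tmp0" },     { OP_MOV, "loc24", "tmp1" },
    { OP_MOV, "loc25", "tmp6" },     { OP_MOV, "loc26", "tmp7" },
    { OP_BRCOND, NULL, NULL },
    { OP_MOV, "cc_src_0", "loc25" }, { OP_MOV, "cc_src_1", "loc26" },
    { OP_MOV, "cc_dst_0", "loc23" }, { OP_MOV, "cc_dst_1", "loc24" },
    { OP_LABEL, NULL, NULL },
};
```

With this rule `mov_i32 cc_src_0,loc25` is left alone, because the relation loc25 = tmp6 is forgotten at the brcond. Clearing the whole table at every block end would also be safe and simpler; keeping relations between locals and globals across the fall-through edge is what distinguishes the suggested approach.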

Thread overview: 18+ messages
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
2011-08-03 19:00   ` Stefan Weil
2011-08-03 20:20     ` Blue Swirl
2011-08-03 20:56       ` Stefan Weil
2011-08-03 21:03         ` Stefan Weil
2011-08-04 18:42           ` Blue Swirl
2011-08-04 19:24             ` Blue Swirl
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
2011-07-30 12:25   ` Blue Swirl
2011-07-30 19:13     ` Blue Swirl
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations Kirill Batuzov
2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
2011-07-07 14:22   ` Kirill Batuzov
2011-07-30 10:52 ` Blue Swirl
