* [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
@ 2011-07-07 12:37 Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
` (7 more replies)
0 siblings, 8 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
This series implements some basic machine-independent optimizations. They
simplify code and allow liveness analysis to do its work better.
Suppose we have the following ARM code:
movw r12, #0xb6db
movt r12, #0xdb6d
In TCG before optimizations we'll have:
movi_i32 tmp8,$0xb6db
mov_i32 r12,tmp8
mov_i32 tmp8,r12
ext16u_i32 tmp8,tmp8
movi_i32 tmp9,$0xdb6d0000
or_i32 tmp8,tmp8,tmp9
mov_i32 r12,tmp8
And after optimizations we'll have this:
movi_i32 r12,$0xdb6db6db
Here are performance evaluation results on SPEC CPU2000 integer tests in
user-mode emulation on x86_64 host. There were 5 runs of each test on
reference data set. The tables below show runtime in seconds for all these
runs.
ARM guest without optimizations:
Test name #1 #2 #3 #4 #5 Median
164.gzip 1408.891 1402.323 1407.623 1404.955 1405.396 1405.396
175.vpr 1245.31 1248.758 1247.936 1248.534 1247.534 1247.936
176.gcc 912.561 809.481 847.057 912.636 912.544 912.544
181.mcf 198.384 197.841 199.127 197.976 197.29 197.976
186.crafty 1545.881 1546.051 1546.002 1545.927 1545.945 1545.945
197.parser 3779.954 3779.878 3779.79 3779.94 3779.88 3779.88
252.eon 2563.168 2776.152 2776.395 2776.577 2776.202 2776.202
253.perlbmk 2591.781 2504.078 2507.07 2591.337 2463.401 2507.07
256.bzip2 1306.197 1304.639 1184.853 1305.141 1305.606 1305.141
300.twolf 2918.984 2918.926 2918.93 2918.97 2918.914 2918.93
ARM guest with optimizations:
Test name #1 #2 #3 #4 #5 Median Gain
164.gzip 1401.198 1376.337 1401.117 1401.23 1401.246 1401.198 0.30%
175.vpr 1247.964 1151.468 1247.76 1154.419 1242.017 1242.017 0.47%
176.gcc 896.882 918.546 918.297 851.465 918.39 918.297 -0.63%
181.mcf 198.19 197.399 198.421 198.663 198.312 198.312 -0.17%
186.crafty 1520.425 1520.362 1520.477 1520.445 1520.957 1520.445 1.65%
197.parser 3770.943 3770.927 3770.578 3771.048 3770.904 3770.927 0.24%
252.eon 2752.371 2752.111 2752.005 2752.214 2752.109 2752.111 0.87%
253.perlbmk 2577.462 2578.588 2493.567 2578.571 2578.318 2578.318 -2.84%
256.bzip2 1296.198 1271.128 1296.044 1296.321 1296.147 1296.147 0.69%
300.twolf 2888.984 2889.023 2889.225 2889.039 2889.05 2889.039 1.02%
x86_64 guest without optimizations:
Test name #1 #2 #3 #4 #5 Median
164.gzip 857.654 857.646 857.678 798.119 857.675 857.654
175.vpr 959.265 959.207 959.185 959.461 959.332 959.265
176.gcc 625.722 637.257 646.638 646.614 646.56 646.56
181.mcf 221.666 220.194 220.079 219.868 221.5 220.194
186.crafty 1129.531 1129.739 1129.573 1129.588 1129.624 1129.588
197.parser 1809.517 1809.516 1809.386 1809.477 1809.427 1809.477
253.perlbmk 1774.944 1776.046 1769.865 1774.052 1775.236 1774.944
254.gap 1061.033 1061.158 1061.064 1061.047 1061.01 1061.047
255.vortex 1871.261 1914.144 1914.057 1914.086 1914.127 1914.086
256.bzip2 918.916 1011.828 1011.819 1012.11 1011.932 1011.828
300.twolf 1332.797 1330.56 1330.687 1330.917 1330.602 1330.687
x86_64 guest with optimizations:
Test name #1 #2 #3 #4 #5 Median Gain
164.gzip 806.198 854.159 854.184 854.168 854.187 854.168 0.41%
175.vpr 955.905 950.86 955.876 876.397 955.957 955.876 1.82%
176.gcc 641.663 640.189 641.57 641.552 641.514 641.552 0.03%
181.mcf 217.619 218.627 218.699 217.977 216.955 217.977 1.18%
186.crafty 1123.909 1123.852 1123.917 1123.781 1123.805 1123.852 0.51%
197.parser 1813.94 1814.643 1815.286 1814.445 1813.72 1814.445 -0.27%
253.perlbmk 1791.536 1795.642 1793.0 1797.486 1791.401 1793.0 -1.02%
254.gap 1070.605 1070.216 1070.637 1070.168 1070.491 1070.491 -0.89%
255.vortex 1918.764 1918.573 1917.411 1918.287 1918.735 1918.573 -0.23%
256.bzip2 1017.179 1017.083 1017.283 1016.913 1017.189 1017.179 -0.53%
300.twolf 1321.072 1321.109 1321.019 1321.072 1321.004 1321.072 0.72%
The ARM guests for 254.gap and 255.vortex and the x86_64 guest for 252.eon do
not work under QEMU for some unrelated reason.
Changes:
v1 -> v2
- State and Vals arrays merged to an array of structures.
- Added reference counting of a temp's copies. This helps to reset a temp's
state faster in most cases.
- Do not make copy propagation through operations with TCG_OPF_CALL_CLOBBER or
TCG_OPF_SIDE_EFFECTS flag.
- Split some expression simplifications into independent switch.
- Let the compiler handle signed shifts and sign/zero extensions in its
implementation-defined way.
v2 -> v3
- Elements of an equivalence class are placed in a doubly-linked circular list,
so it is easier to choose a new representative.
- The CASE_OP_32_64 macro is used to reduce the amount of #ifdefs. Checkpatch
is not happy about this change, but I do not think spaces would be
appropriate here.
- Some constraints during copy propagation are relaxed.
- Functions tcg_opt_gen_mov and tcg_opt_gen_movi are introduced to reduce code
duplication.
Kirill Batuzov (6):
Add TCG optimizations stub
Add copy and constant propagation.
Do constant folding for basic arithmetic operations.
Do constant folding for boolean operations.
Do constant folding for shift operations.
Do constant folding for unary operations.
Makefile.target | 2 +-
tcg/optimize.c | 568 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
tcg/tcg.c | 6 +
tcg/tcg.h | 3 +
4 files changed, 578 insertions(+), 1 deletions(-)
create mode 100644 tcg/optimize.c
--
1.7.4.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
Added the file tcg/optimize.c to hold TCG optimizations. The function
tcg_optimize is called from tcg_gen_code_common; it calls other functions that
perform specific optimizations. A stub for constant folding is added.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
Makefile.target | 2 +-
tcg/optimize.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
tcg/tcg.c | 6 +++
tcg/tcg.h | 3 ++
4 files changed, 107 insertions(+), 1 deletions(-)
create mode 100644 tcg/optimize.c
diff --git a/Makefile.target b/Makefile.target
index 2e281a4..0b045ce 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -70,7 +70,7 @@ all: $(PROGS) stap
#########################################################
# cpu emulator library
libobj-y = exec.o translate-all.o cpu-exec.o translate.o
-libobj-y += tcg/tcg.o
+libobj-y += tcg/tcg.o tcg/optimize.o
libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
libobj-y += op_helper.o helper.o
diff --git a/tcg/optimize.c b/tcg/optimize.c
new file mode 100644
index 0000000..c7c7da9
--- /dev/null
+++ b/tcg/optimize.c
@@ -0,0 +1,97 @@
+/*
+ * Optimizations for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2010 Samsung Electronics.
+ * Contributed by Kirill Batuzov <batuzovk@ispras.ru>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "qemu-common.h"
+#include "tcg-op.h"
+
+#if TCG_TARGET_REG_BITS == 64
+#define CASE_OP_32_64(x) \
+ glue(glue(case INDEX_op_, x), _i32): \
+ glue(glue(case INDEX_op_, x), _i64)
+#else
+#define CASE_OP_32_64(x) \
+ glue(glue(case INDEX_op_, x), _i32)
+#endif
+
+static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
+ TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+ int i, nb_ops, op_index, op, nb_temps, nb_globals;
+ const TCGOpDef *def;
+ TCGArg *gen_args;
+
+ nb_temps = s->nb_temps;
+ nb_globals = s->nb_globals;
+
+ nb_ops = tcg_opc_ptr - gen_opc_buf;
+ gen_args = args;
+ for (op_index = 0; op_index < nb_ops; op_index++) {
+ op = gen_opc_buf[op_index];
+ def = &tcg_op_defs[op];
+ switch (op) {
+ case INDEX_op_call:
+ i = (args[0] >> 16) + (args[0] & 0xffff) + 3;
+ while (i) {
+ *gen_args = *args;
+ args++;
+ gen_args++;
+ i--;
+ }
+ break;
+ case INDEX_op_set_label:
+ case INDEX_op_jmp:
+ case INDEX_op_br:
+ CASE_OP_32_64(brcond):
+ for (i = 0; i < def->nb_args; i++) {
+ *gen_args = *args;
+ args++;
+ gen_args++;
+ }
+ break;
+ default:
+ for (i = 0; i < def->nb_args; i++) {
+ gen_args[i] = args[i];
+ }
+ args += def->nb_args;
+ gen_args += def->nb_args;
+ break;
+ }
+ }
+
+ return gen_args;
+}
+
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr,
+ TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+ TCGArg *res;
+ res = tcg_constant_folding(s, tcg_opc_ptr, args, tcg_op_defs);
+ return res;
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index fad92f9..6309dce 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -24,6 +24,7 @@
/* define it to use liveness analysis (better code) */
#define USE_LIVENESS_ANALYSIS
+#define USE_TCG_OPTIMIZATIONS
#include "config.h"
@@ -2033,6 +2034,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
}
#endif
+#ifdef USE_TCG_OPTIMIZATIONS
+ gen_opparam_ptr =
+ tcg_optimize(s, gen_opc_ptr, gen_opparam_buf, tcg_op_defs);
+#endif
+
#ifdef CONFIG_PROFILER
s->la_time -= profile_getclock();
#endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 2b985ac..91a3cda 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -486,6 +486,9 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
int c, int right, int arith);
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr, TCGArg *args,
+ TCGOpDef *tcg_op_def);
+
/* only used for debugging purposes */
void tcg_register_helper(void *func, const char *name);
const char *tcg_helper_get_name(TCGContext *s, void *func);
--
1.7.4.1
* [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
2011-08-03 19:00 ` Stefan Weil
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
` (5 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
Make tcg_constant_folding do copy and constant propagation. This is
preparatory work before actual constant folding.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
tcg/optimize.c | 182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 180 insertions(+), 2 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index c7c7da9..f8afe71 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,24 +40,196 @@
glue(glue(case INDEX_op_, x), _i32)
#endif
+typedef enum {
+ TCG_TEMP_UNDEF = 0,
+ TCG_TEMP_CONST,
+ TCG_TEMP_COPY,
+ TCG_TEMP_HAS_COPY,
+ TCG_TEMP_ANY
+} tcg_temp_state;
+
+struct tcg_temp_info {
+ tcg_temp_state state;
+ uint16_t prev_copy;
+ uint16_t next_copy;
+ tcg_target_ulong val;
+};
+
+static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+
+/* Reset TEMP's state to TCG_TEMP_ANY. If TEMP was a representative of some
+ class of equivalent temps, a new representative should be chosen in this
+ class. */
+static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
+{
+ int i;
+ TCGArg new_base = (TCGArg)-1;
+ if (temps[temp].state == TCG_TEMP_HAS_COPY) {
+ for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+ if (i >= nb_globals) {
+ temps[i].state = TCG_TEMP_HAS_COPY;
+ new_base = i;
+ break;
+ }
+ }
+ for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+ if (new_base == (TCGArg)-1) {
+ temps[i].state = TCG_TEMP_ANY;
+ } else {
+ temps[i].val = new_base;
+ }
+ }
+ temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+ temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+ } else if (temps[temp].state == TCG_TEMP_COPY) {
+ temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+ temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+ new_base = temps[temp].val;
+ }
+ temps[temp].state = TCG_TEMP_ANY;
+ if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
+ temps[new_base].state = TCG_TEMP_ANY;
+ }
+}
+
+static int op_bits(int op)
+{
+ switch (op) {
+ case INDEX_op_mov_i32:
+ return 32;
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_mov_i64:
+ return 64;
+#endif
+ default:
+ fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+ tcg_abort();
+ }
+}
+
+static int op_to_movi(int op)
+{
+ switch (op_bits(op)) {
+ case 32:
+ return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+ case 64:
+ return INDEX_op_movi_i64;
+#endif
+ default:
+ fprintf(stderr, "op_to_movi: unexpected return value of "
+ "function op_bits.\n");
+ tcg_abort();
+ }
+}
+
+static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
+ int nb_temps, int nb_globals)
+{
+ reset_temp(dst, nb_temps, nb_globals);
+ assert(temps[src].state != TCG_TEMP_COPY);
+ if (src >= nb_globals) {
+ assert(temps[src].state != TCG_TEMP_CONST);
+ if (temps[src].state != TCG_TEMP_HAS_COPY) {
+ temps[src].state = TCG_TEMP_HAS_COPY;
+ temps[src].next_copy = src;
+ temps[src].prev_copy = src;
+ }
+ temps[dst].state = TCG_TEMP_COPY;
+ temps[dst].val = src;
+ temps[dst].next_copy = temps[src].next_copy;
+ temps[dst].prev_copy = src;
+ temps[temps[dst].next_copy].prev_copy = dst;
+ temps[src].next_copy = dst;
+ }
+ gen_args[0] = dst;
+ gen_args[1] = src;
+}
+
+static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
+ int nb_temps, int nb_globals)
+{
+ reset_temp(dst, nb_temps, nb_globals);
+ temps[dst].state = TCG_TEMP_CONST;
+ temps[dst].val = val;
+ gen_args[0] = dst;
+ gen_args[1] = val;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
TCGArg *args, TCGOpDef *tcg_op_defs)
{
- int i, nb_ops, op_index, op, nb_temps, nb_globals;
+ int i, nb_ops, op_index, op, nb_temps, nb_globals, nb_call_args;
const TCGOpDef *def;
TCGArg *gen_args;
+ /* Array VALS has an element for each temp.
+ If this temp holds a constant then its value is kept in VALS' element.
+ If this temp is a copy of another temp then the representative of
+ its equivalence class is kept in VALS' element.
+ If this temp is neither a copy nor a constant then the corresponding
+ VALS' element is unused. */
nb_temps = s->nb_temps;
nb_globals = s->nb_globals;
+ memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
nb_ops = tcg_opc_ptr - gen_opc_buf;
gen_args = args;
for (op_index = 0; op_index < nb_ops; op_index++) {
op = gen_opc_buf[op_index];
def = &tcg_op_defs[op];
+ /* Do copy propagation */
+ if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {
+ assert(op != INDEX_op_call);
+ for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+ if (temps[args[i]].state == TCG_TEMP_COPY) {
+ args[i] = temps[args[i]].val;
+ }
+ }
+ }
+
+ /* Propagate constants through copy operations and do constant
+ folding. Constants will be substituted into arguments by the
+ register allocator where needed and possible. Also detect copies. */
switch (op) {
+ CASE_OP_32_64(mov):
+ if ((temps[args[1]].state == TCG_TEMP_COPY
+ && temps[args[1]].val == args[0])
+ || args[0] == args[1]) {
+ args += 2;
+ gen_opc_buf[op_index] = INDEX_op_nop;
+ break;
+ }
+ if (temps[args[1]].state != TCG_TEMP_CONST) {
+ tcg_opt_gen_mov(gen_args, args[0], args[1],
+ nb_temps, nb_globals);
+ gen_args += 2;
+ args += 2;
+ break;
+ }
+ /* Source argument is constant. Rewrite the operation and
+ let movi case handle it. */
+ op = op_to_movi(op);
+ gen_opc_buf[op_index] = op;
+ args[1] = temps[args[1]].val;
+ /* fallthrough */
+ CASE_OP_32_64(movi):
+ tcg_opt_gen_movi(gen_args, args[0], args[1], nb_temps, nb_globals);
+ gen_args += 2;
+ args += 2;
+ break;
case INDEX_op_call:
- i = (args[0] >> 16) + (args[0] & 0xffff) + 3;
+ nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
+ if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
+ for (i = 0; i < nb_globals; i++) {
+ reset_temp(i, nb_temps, nb_globals);
+ }
+ }
+ for (i = 0; i < (args[0] >> 16); i++) {
+ reset_temp(args[i + 1], nb_temps, nb_globals);
+ }
+ i = nb_call_args + 3;
while (i) {
*gen_args = *args;
args++;
@@ -69,6 +241,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
case INDEX_op_jmp:
case INDEX_op_br:
CASE_OP_32_64(brcond):
+ memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
for (i = 0; i < def->nb_args; i++) {
*gen_args = *args;
args++;
@@ -76,6 +249,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
}
break;
default:
+ /* Default case: we know nothing about the operation, so no
+ propagation is done. We only trash output arguments. */
+ for (i = 0; i < def->nb_oargs; i++) {
+ reset_temp(args[i], nb_temps, nb_globals);
+ }
for (i = 0; i < def->nb_args; i++) {
gen_args[i] = args[i];
}
--
1.7.4.1
* [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations.
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations Kirill Batuzov
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
Perform actual constant folding for ADD, SUB and MUL operations.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
tcg/optimize.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 125 insertions(+), 0 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index f8afe71..42a1bda 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -96,9 +96,15 @@ static int op_bits(int op)
{
switch (op) {
case INDEX_op_mov_i32:
+ case INDEX_op_add_i32:
+ case INDEX_op_sub_i32:
+ case INDEX_op_mul_i32:
return 32;
#if TCG_TARGET_REG_BITS == 64
case INDEX_op_mov_i64:
+ case INDEX_op_add_i64:
+ case INDEX_op_sub_i64:
+ case INDEX_op_mul_i64:
return 64;
#endif
default:
@@ -156,6 +162,52 @@ static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
gen_args[1] = val;
}
+static int op_to_mov(int op)
+{
+ switch (op_bits(op)) {
+ case 32:
+ return INDEX_op_mov_i32;
+#if TCG_TARGET_REG_BITS == 64
+ case 64:
+ return INDEX_op_mov_i64;
+#endif
+ default:
+ fprintf(stderr, "op_to_mov: unexpected return value of "
+ "function op_bits.\n");
+ tcg_abort();
+ }
+}
+
+static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
+{
+ switch (op) {
+ CASE_OP_32_64(add):
+ return x + y;
+
+ CASE_OP_32_64(sub):
+ return x - y;
+
+ CASE_OP_32_64(mul):
+ return x * y;
+
+ default:
+ fprintf(stderr,
+ "Unrecognized operation %d in do_constant_folding.\n", op);
+ tcg_abort();
+ }
+}
+
+static TCGArg do_constant_folding(int op, TCGArg x, TCGArg y)
+{
+ TCGArg res = do_constant_folding_2(op, x, y);
+#if TCG_TARGET_REG_BITS == 64
+ if (op_bits(op) == 32) {
+ res &= 0xffffffff;
+ }
+#endif
+ return res;
+}
+
/* Propagate constants and copies, fold constant expressions. */
static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -163,6 +215,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
int i, nb_ops, op_index, op, nb_temps, nb_globals, nb_call_args;
const TCGOpDef *def;
TCGArg *gen_args;
+ TCGArg tmp;
/* Array VALS has an element for each temp.
If this temp holds a constant then its value is kept in VALS' element.
If this temp is a copy of other ones then this equivalence class'
@@ -189,6 +242,57 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
}
}
+ /* For commutative operations, make the constant the second argument. */
+ switch (op) {
+ CASE_OP_32_64(add):
+ CASE_OP_32_64(mul):
+ if (temps[args[1]].state == TCG_TEMP_CONST) {
+ tmp = args[1];
+ args[1] = args[2];
+ args[2] = tmp;
+ }
+ break;
+ default:
+ break;
+ }
+
+ /* Simplify expression if possible. */
+ switch (op) {
+ CASE_OP_32_64(add):
+ CASE_OP_32_64(sub):
+ if (temps[args[1]].state == TCG_TEMP_CONST) {
+ /* Proceed with possible constant folding. */
+ break;
+ }
+ if (temps[args[2]].state == TCG_TEMP_CONST
+ && temps[args[2]].val == 0) {
+ if ((temps[args[0]].state == TCG_TEMP_COPY
+ && temps[args[0]].val == args[1])
+ || args[0] == args[1]) {
+ args += 3;
+ gen_opc_buf[op_index] = INDEX_op_nop;
+ } else {
+ gen_opc_buf[op_index] = op_to_mov(op);
+ tcg_opt_gen_mov(gen_args, args[0], args[1],
+ nb_temps, nb_globals);
+ gen_args += 2;
+ args += 3;
+ }
+ continue;
+ }
+ break;
+ CASE_OP_32_64(mul):
+ if ((temps[args[2]].state == TCG_TEMP_CONST
+ && temps[args[2]].val == 0)) {
+ gen_opc_buf[op_index] = op_to_movi(op);
+ tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+ args += 3;
+ gen_args += 2;
+ continue;
+ }
+ break;
+ }
+
/* Propagate constants through copy operations and do constant
folding. Constants will be substituted to arguments by register
allocator where needed and possible. Also detect copies. */
@@ -219,6 +323,27 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
gen_args += 2;
args += 2;
break;
+ CASE_OP_32_64(add):
+ CASE_OP_32_64(sub):
+ CASE_OP_32_64(mul):
+ if (temps[args[1]].state == TCG_TEMP_CONST
+ && temps[args[2]].state == TCG_TEMP_CONST) {
+ gen_opc_buf[op_index] = op_to_movi(op);
+ tmp = do_constant_folding(op, temps[args[1]].val,
+ temps[args[2]].val);
+ tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+ gen_args += 2;
+ args += 3;
+ break;
+ } else {
+ reset_temp(args[0], nb_temps, nb_globals);
+ gen_args[0] = args[0];
+ gen_args[1] = args[1];
+ gen_args[2] = args[2];
+ gen_args += 3;
+ args += 3;
+ break;
+ }
case INDEX_op_call:
nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
--
1.7.4.1
* [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations.
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
` (2 preceding siblings ...)
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
` (3 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
Perform constant folding for AND, OR, XOR operations.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
tcg/optimize.c | 37 +++++++++++++++++++++++++++++++++++++
1 files changed, 37 insertions(+), 0 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 42a1bda..c469952 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,12 +99,18 @@ static int op_bits(int op)
case INDEX_op_add_i32:
case INDEX_op_sub_i32:
case INDEX_op_mul_i32:
+ case INDEX_op_and_i32:
+ case INDEX_op_or_i32:
+ case INDEX_op_xor_i32:
return 32;
#if TCG_TARGET_REG_BITS == 64
case INDEX_op_mov_i64:
case INDEX_op_add_i64:
case INDEX_op_sub_i64:
case INDEX_op_mul_i64:
+ case INDEX_op_and_i64:
+ case INDEX_op_or_i64:
+ case INDEX_op_xor_i64:
return 64;
#endif
default:
@@ -190,6 +196,15 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
CASE_OP_32_64(mul):
return x * y;
+ CASE_OP_32_64(and):
+ return x & y;
+
+ CASE_OP_32_64(or):
+ return x | y;
+
+ CASE_OP_32_64(xor):
+ return x ^ y;
+
default:
fprintf(stderr,
"Unrecognized operation %d in do_constant_folding.\n", op);
@@ -246,6 +261,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
switch (op) {
CASE_OP_32_64(add):
CASE_OP_32_64(mul):
+ CASE_OP_32_64(and):
+ CASE_OP_32_64(or):
+ CASE_OP_32_64(xor):
if (temps[args[1]].state == TCG_TEMP_CONST) {
tmp = args[1];
args[1] = args[2];
@@ -291,6 +309,22 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
continue;
}
break;
+ CASE_OP_32_64(or):
+ CASE_OP_32_64(and):
+ if (args[1] == args[2]) {
+ if (args[1] == args[0]) {
+ args += 3;
+ gen_opc_buf[op_index] = INDEX_op_nop;
+ } else {
+ gen_opc_buf[op_index] = op_to_mov(op);
+ tcg_opt_gen_mov(gen_args, args[0], args[1], nb_temps,
+ nb_globals);
+ gen_args += 2;
+ args += 3;
+ }
+ continue;
+ }
+ break;
}
/* Propagate constants through copy operations and do constant
@@ -326,6 +360,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
CASE_OP_32_64(add):
CASE_OP_32_64(sub):
CASE_OP_32_64(mul):
+ CASE_OP_32_64(or):
+ CASE_OP_32_64(and):
+ CASE_OP_32_64(xor):
if (temps[args[1]].state == TCG_TEMP_CONST
&& temps[args[2]].state == TCG_TEMP_CONST) {
gen_opc_buf[op_index] = op_to_movi(op);
--
1.7.4.1
* [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations.
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
` (3 preceding siblings ...)
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
2011-07-30 12:25 ` Blue Swirl
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations Kirill Batuzov
` (2 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
Perform constant folding for SHR, SHL, SAR, ROTR, ROTL operations.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
tcg/optimize.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 72 insertions(+), 0 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index c469952..a1bb287 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -102,6 +102,11 @@ static int op_bits(int op)
case INDEX_op_and_i32:
case INDEX_op_or_i32:
case INDEX_op_xor_i32:
+ case INDEX_op_shl_i32:
+ case INDEX_op_shr_i32:
+ case INDEX_op_sar_i32:
+ case INDEX_op_rotl_i32:
+ case INDEX_op_rotr_i32:
return 32;
#if TCG_TARGET_REG_BITS == 64
case INDEX_op_mov_i64:
@@ -111,6 +116,11 @@ static int op_bits(int op)
case INDEX_op_and_i64:
case INDEX_op_or_i64:
case INDEX_op_xor_i64:
+ case INDEX_op_shl_i64:
+ case INDEX_op_shr_i64:
+ case INDEX_op_sar_i64:
+ case INDEX_op_rotl_i64:
+ case INDEX_op_rotr_i64:
return 64;
#endif
default:
@@ -205,6 +215,58 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
CASE_OP_32_64(xor):
return x ^ y;
+ case INDEX_op_shl_i32:
+ return (uint32_t)x << (uint32_t)y;
+
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_shl_i64:
+ return (uint64_t)x << (uint64_t)y;
+#endif
+
+ case INDEX_op_shr_i32:
+ return (uint32_t)x >> (uint32_t)y;
+
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_shr_i64:
+ return (uint64_t)x >> (uint64_t)y;
+#endif
+
+ case INDEX_op_sar_i32:
+ return (int32_t)x >> (int32_t)y;
+
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_sar_i64:
+ return (int64_t)x >> (int64_t)y;
+#endif
+
+ case INDEX_op_rotr_i32:
+#if TCG_TARGET_REG_BITS == 64
+ x &= 0xffffffff;
+ y &= 0xffffffff;
+#endif
+ x = (x << (32 - y)) | (x >> y);
+ return x;
+
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_rotr_i64:
+ x = (x << (64 - y)) | (x >> y);
+ return x;
+#endif
+
+ case INDEX_op_rotl_i32:
+#if TCG_TARGET_REG_BITS == 64
+ x &= 0xffffffff;
+ y &= 0xffffffff;
+#endif
+ x = (x << y) | (x >> (32 - y));
+ return x;
+
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_rotl_i64:
+ x = (x << y) | (x >> (64 - y));
+ return x;
+#endif
+
default:
fprintf(stderr,
"Unrecognized operation %d in do_constant_folding.\n", op);
@@ -278,6 +340,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
switch (op) {
CASE_OP_32_64(add):
CASE_OP_32_64(sub):
+ CASE_OP_32_64(shl):
+ CASE_OP_32_64(shr):
+ CASE_OP_32_64(sar):
+ CASE_OP_32_64(rotl):
+ CASE_OP_32_64(rotr):
if (temps[args[1]].state == TCG_TEMP_CONST) {
/* Proceed with possible constant folding. */
break;
@@ -363,6 +430,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
CASE_OP_32_64(or):
CASE_OP_32_64(and):
CASE_OP_32_64(xor):
+ CASE_OP_32_64(shl):
+ CASE_OP_32_64(shr):
+ CASE_OP_32_64(sar):
+ CASE_OP_32_64(rotl):
+ CASE_OP_32_64(rotr):
if (temps[args[1]].state == TCG_TEMP_CONST
&& temps[args[2]].state == TCG_TEMP_CONST) {
gen_opc_buf[op_index] = op_to_movi(op);
--
1.7.4.1
* [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations.
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
` (4 preceding siblings ...)
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
@ 2011-07-07 12:37 ` Kirill Batuzov
2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
2011-07-30 10:52 ` Blue Swirl
7 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 12:37 UTC (permalink / raw)
To: qemu-devel; +Cc: zhur
Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
tcg/optimize.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 59 insertions(+), 0 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index a1bb287..a324e98 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -107,6 +107,11 @@ static int op_bits(int op)
case INDEX_op_sar_i32:
case INDEX_op_rotl_i32:
case INDEX_op_rotr_i32:
+ case INDEX_op_not_i32:
+ case INDEX_op_ext8s_i32:
+ case INDEX_op_ext16s_i32:
+ case INDEX_op_ext8u_i32:
+ case INDEX_op_ext16u_i32:
return 32;
#if TCG_TARGET_REG_BITS == 64
case INDEX_op_mov_i64:
@@ -121,6 +126,13 @@ static int op_bits(int op)
case INDEX_op_sar_i64:
case INDEX_op_rotl_i64:
case INDEX_op_rotr_i64:
+ case INDEX_op_not_i64:
+ case INDEX_op_ext8s_i64:
+ case INDEX_op_ext16s_i64:
+ case INDEX_op_ext32s_i64:
+ case INDEX_op_ext8u_i64:
+ case INDEX_op_ext16u_i64:
+ case INDEX_op_ext32u_i64:
return 64;
#endif
default:
@@ -267,6 +279,29 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
return x;
#endif
+ CASE_OP_32_64(not):
+ return ~x;
+
+ CASE_OP_32_64(ext8s):
+ return (int8_t)x;
+
+ CASE_OP_32_64(ext16s):
+ return (int16_t)x;
+
+ CASE_OP_32_64(ext8u):
+ return (uint8_t)x;
+
+ CASE_OP_32_64(ext16u):
+ return (uint16_t)x;
+
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_ext32s_i64:
+ return (int32_t)x;
+
+ case INDEX_op_ext32u_i64:
+ return (uint32_t)x;
+#endif
+
default:
fprintf(stderr,
"Unrecognized operation %d in do_constant_folding.\n", op);
@@ -424,6 +459,30 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
gen_args += 2;
args += 2;
break;
+ CASE_OP_32_64(not):
+ CASE_OP_32_64(ext8s):
+ CASE_OP_32_64(ext16s):
+ CASE_OP_32_64(ext8u):
+ CASE_OP_32_64(ext16u):
+#if TCG_TARGET_REG_BITS == 64
+ case INDEX_op_ext32s_i64:
+ case INDEX_op_ext32u_i64:
+#endif
+ if (temps[args[1]].state == TCG_TEMP_CONST) {
+ gen_opc_buf[op_index] = op_to_movi(op);
+ tmp = do_constant_folding(op, temps[args[1]].val, 0);
+ tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+ gen_args += 2;
+ args += 2;
+ break;
+ } else {
+ reset_temp(args[0], nb_temps, nb_globals);
+ gen_args[0] = args[0];
+ gen_args[1] = args[1];
+ gen_args += 2;
+ args += 2;
+ break;
+ }
CASE_OP_32_64(add):
CASE_OP_32_64(sub):
CASE_OP_32_64(mul):
--
1.7.4.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
` (5 preceding siblings ...)
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations Kirill Batuzov
@ 2011-07-07 12:54 ` Peter Maydell
2011-07-07 14:22 ` Kirill Batuzov
2011-07-30 10:52 ` Blue Swirl
7 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2011-07-07 12:54 UTC (permalink / raw)
To: Kirill Batuzov; +Cc: qemu-devel, zhur
On 7 July 2011 13:37, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> ARM guests for 254.gap and 255.vortex and the x86_64 guest for 252.eon do not
> work under QEMU for some unrelated reason.
If you can provide a binary and a command line for these I can have
a look at what's going on with the failing ARM guest binaries...
-- PMM
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
@ 2011-07-07 14:22 ` Kirill Batuzov
0 siblings, 0 replies; 18+ messages in thread
From: Kirill Batuzov @ 2011-07-07 14:22 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, zhur
On Thu, 7 Jul 2011, Peter Maydell wrote:
> On 7 July 2011 13:37, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> > ARM guests for 254.gap and 255.vortex and the x86_64 guest for 252.eon do not
> > work under QEMU for some unrelated reason.
>
> If you can provide a binary and a command line for these I can have
> a look at what's going on with the failing ARM guest binaries...
>
I've just checked more carefully: these tests fail the same way on
hardware. So it is some SPEC or compiler problem not related to QEMU at
all.
----
Kirill
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
` (6 preceding siblings ...)
2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
@ 2011-07-30 10:52 ` Blue Swirl
7 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2011-07-30 10:52 UTC (permalink / raw)
To: Kirill Batuzov; +Cc: qemu-devel, zhur
Thanks, applied all.
On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> This series implements some basic machine-independent optimizations. They
> simplify the code and allow liveness analysis to do its work better.
>
> Suppose we have the following ARM code:
>
> movw r12, #0xb6db
> movt r12, #0xdb6d
>
> In TCG before optimizations we'll have:
>
> movi_i32 tmp8,$0xb6db
> mov_i32 r12,tmp8
> mov_i32 tmp8,r12
> ext16u_i32 tmp8,tmp8
> movi_i32 tmp9,$0xdb6d0000
> or_i32 tmp8,tmp8,tmp9
> mov_i32 r12,tmp8
>
> And after optimizations we'll have this:
>
> movi_i32 r12,$0xdb6db6db
>
> Here are performance evaluation results on SPEC CPU2000 integer tests in
> user-mode emulation on x86_64 host. There were 5 runs of each test on
> reference data set. The tables below show runtime in seconds for all these
> runs.
>
> ARM guest without optimizations:
> Test name #1 #2 #3 #4 #5 Median
> 164.gzip 1408.891 1402.323 1407.623 1404.955 1405.396 1405.396
> 175.vpr 1245.31 1248.758 1247.936 1248.534 1247.534 1247.936
> 176.gcc 912.561 809.481 847.057 912.636 912.544 912.544
> 181.mcf 198.384 197.841 199.127 197.976 197.29 197.976
> 186.crafty 1545.881 1546.051 1546.002 1545.927 1545.945 1545.945
> 197.parser 3779.954 3779.878 3779.79 3779.94 3779.88 3779.88
> 252.eon 2563.168 2776.152 2776.395 2776.577 2776.202 2776.202
> 253.perlbmk 2591.781 2504.078 2507.07 2591.337 2463.401 2507.07
> 256.bzip2 1306.197 1304.639 1184.853 1305.141 1305.606 1305.141
> 300.twolf 2918.984 2918.926 2918.93 2918.97 2918.914 2918.93
>
> ARM guest with optimizations:
> Test name #1 #2 #3 #4 #5 Median Gain
> 164.gzip 1401.198 1376.337 1401.117 1401.23 1401.246 1401.198 0.30%
> 175.vpr 1247.964 1151.468 1247.76 1154.419 1242.017 1242.017 0.47%
> 176.gcc 896.882 918.546 918.297 851.465 918.39 918.297 -0.63%
> 181.mcf 198.19 197.399 198.421 198.663 198.312 198.312 -0.17%
> 186.crafty 1520.425 1520.362 1520.477 1520.445 1520.957 1520.445 1.65%
> 197.parser 3770.943 3770.927 3770.578 3771.048 3770.904 3770.927 0.24%
> 252.eon 2752.371 2752.111 2752.005 2752.214 2752.109 2752.111 0.87%
> 253.perlbmk 2577.462 2578.588 2493.567 2578.571 2578.318 2578.318 -2.84%
> 256.bzip2 1296.198 1271.128 1296.044 1296.321 1296.147 1296.147 0.69%
> 300.twolf 2888.984 2889.023 2889.225 2889.039 2889.05 2889.039 1.02%
>
>
> x86_64 guest without optimizations:
> Test name #1 #2 #3 #4 #5 Median
> 164.gzip 857.654 857.646 857.678 798.119 857.675 857.654
> 175.vpr 959.265 959.207 959.185 959.461 959.332 959.265
> 176.gcc 625.722 637.257 646.638 646.614 646.56 646.56
> 181.mcf 221.666 220.194 220.079 219.868 221.5 220.194
> 186.crafty 1129.531 1129.739 1129.573 1129.588 1129.624 1129.588
> 197.parser 1809.517 1809.516 1809.386 1809.477 1809.427 1809.477
> 253.perlbmk 1774.944 1776.046 1769.865 1774.052 1775.236 1774.944
> 254.gap 1061.033 1061.158 1061.064 1061.047 1061.01 1061.047
> 255.vortex 1871.261 1914.144 1914.057 1914.086 1914.127 1914.086
> 256.bzip2 918.916 1011.828 1011.819 1012.11 1011.932 1011.828
> 300.twolf 1332.797 1330.56 1330.687 1330.917 1330.602 1330.687
>
> x86_64 guest with optimizations:
> Test name #1 #2 #3 #4 #5 Median Gain
> 164.gzip 806.198 854.159 854.184 854.168 854.187 854.168 0.41%
> 175.vpr 955.905 950.86 955.876 876.397 955.957 955.876 1.82%
> 176.gcc 641.663 640.189 641.57 641.552 641.514 641.552 0.03%
> 181.mcf 217.619 218.627 218.699 217.977 216.955 217.977 1.18%
> 186.crafty 1123.909 1123.852 1123.917 1123.781 1123.805 1123.852 0.51%
> 197.parser 1813.94 1814.643 1815.286 1814.445 1813.72 1814.445 -0.27%
> 253.perlbmk 1791.536 1795.642 1793.0 1797.486 1791.401 1793.0 -1.02%
> 254.gap 1070.605 1070.216 1070.637 1070.168 1070.491 1070.491 -0.89%
> 255.vortex 1918.764 1918.573 1917.411 1918.287 1918.735 1918.573 -0.23%
> 256.bzip2 1017.179 1017.083 1017.283 1016.913 1017.189 1017.179 -0.53%
> 300.twolf 1321.072 1321.109 1321.019 1321.072 1321.004 1321.072 0.72%
>
> ARM guests for 254.gap and 255.vortex and the x86_64 guest for 252.eon do not
> work under QEMU for some unrelated reason.
>
> Changes:
> v1 -> v2
> - State and Vals arrays merged to an array of structures.
> - Added reference counting of a temp's copies. This helps to reset a temp's
> state faster in most cases.
> - Do not make copy propagation through operations with TCG_OPF_CALL_CLOBBER or
> TCG_OPF_SIDE_EFFECTS flag.
> - Split some expression simplifications into independent switch.
> - Let the compiler handle signed shifts and sign/zero extends in its
> implementation-defined way.
>
> v2 -> v3
> - Elements of an equivalence class are placed in a doubly-linked circular list
> so it is easier to choose a new representative.
> - The CASE_OP_32_64 macro is used to reduce the amount of ifdefs. Checkpatch is
> not happy about this change but I do not think spaces would be appropriate here.
> - Some constraints during copy propagation are relaxed.
> - Functions tcg_opt_gen_mov and tcg_opt_gen_movi are introduced to reduce code
> duplication.
>
> Kirill Batuzov (6):
> Add TCG optimizations stub
> Add copy and constant propagation.
> Do constant folding for basic arithmetic operations.
> Do constant folding for boolean operations.
> Do constant folding for shift operations.
> Do constant folding for unary operations.
>
> Makefile.target | 2 +-
> tcg/optimize.c | 568 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> tcg/tcg.c | 6 +
> tcg/tcg.h | 3 +
> 4 files changed, 578 insertions(+), 1 deletions(-)
> create mode 100644 tcg/optimize.c
>
> --
> 1.7.4.1
>
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations.
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
@ 2011-07-30 12:25 ` Blue Swirl
2011-07-30 19:13 ` Blue Swirl
0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2011-07-30 12:25 UTC (permalink / raw)
To: Kirill Batuzov; +Cc: qemu-devel, zhur
On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuzovk@ispras.ru> wrote:
> Perform constant folding for SHR, SHL, SAR, ROTR, ROTL operations.
This patch broke build on targets (Sparc, MIPS) which don't implement
rotation ops, the next patch likewise. I committed a fix.
> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> ---
> tcg/optimize.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 72 insertions(+), 0 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c469952..a1bb287 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -102,6 +102,11 @@ static int op_bits(int op)
> case INDEX_op_and_i32:
> case INDEX_op_or_i32:
> case INDEX_op_xor_i32:
> + case INDEX_op_shl_i32:
> + case INDEX_op_shr_i32:
> + case INDEX_op_sar_i32:
> + case INDEX_op_rotl_i32:
> + case INDEX_op_rotr_i32:
> return 32;
> #if TCG_TARGET_REG_BITS == 64
> case INDEX_op_mov_i64:
> @@ -111,6 +116,11 @@ static int op_bits(int op)
> case INDEX_op_and_i64:
> case INDEX_op_or_i64:
> case INDEX_op_xor_i64:
> + case INDEX_op_shl_i64:
> + case INDEX_op_shr_i64:
> + case INDEX_op_sar_i64:
> + case INDEX_op_rotl_i64:
> + case INDEX_op_rotr_i64:
> return 64;
> #endif
> default:
> @@ -205,6 +215,58 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
> CASE_OP_32_64(xor):
> return x ^ y;
>
> + case INDEX_op_shl_i32:
> + return (uint32_t)x << (uint32_t)y;
> +
> +#if TCG_TARGET_REG_BITS == 64
> + case INDEX_op_shl_i64:
> + return (uint64_t)x << (uint64_t)y;
> +#endif
> +
> + case INDEX_op_shr_i32:
> + return (uint32_t)x >> (uint32_t)y;
> +
> +#if TCG_TARGET_REG_BITS == 64
> + case INDEX_op_shr_i64:
> + return (uint64_t)x >> (uint64_t)y;
> +#endif
> +
> + case INDEX_op_sar_i32:
> + return (int32_t)x >> (int32_t)y;
> +
> +#if TCG_TARGET_REG_BITS == 64
> + case INDEX_op_sar_i64:
> + return (int64_t)x >> (int64_t)y;
> +#endif
> +
> + case INDEX_op_rotr_i32:
> +#if TCG_TARGET_REG_BITS == 64
> + x &= 0xffffffff;
> + y &= 0xffffffff;
> +#endif
> + x = (x << (32 - y)) | (x >> y);
> + return x;
> +
> +#if TCG_TARGET_REG_BITS == 64
> + case INDEX_op_rotr_i64:
> + x = (x << (64 - y)) | (x >> y);
> + return x;
> +#endif
> +
> + case INDEX_op_rotl_i32:
> +#if TCG_TARGET_REG_BITS == 64
> + x &= 0xffffffff;
> + y &= 0xffffffff;
> +#endif
> + x = (x << y) | (x >> (32 - y));
> + return x;
> +
> +#if TCG_TARGET_REG_BITS == 64
> + case INDEX_op_rotl_i64:
> + x = (x << y) | (x >> (64 - y));
> + return x;
> +#endif
> +
> default:
> fprintf(stderr,
> "Unrecognized operation %d in do_constant_folding.\n", op);
> @@ -278,6 +340,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
> switch (op) {
> CASE_OP_32_64(add):
> CASE_OP_32_64(sub):
> + CASE_OP_32_64(shl):
> + CASE_OP_32_64(shr):
> + CASE_OP_32_64(sar):
> + CASE_OP_32_64(rotl):
> + CASE_OP_32_64(rotr):
> if (temps[args[1]].state == TCG_TEMP_CONST) {
> /* Proceed with possible constant folding. */
> break;
> @@ -363,6 +430,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
> CASE_OP_32_64(or):
> CASE_OP_32_64(and):
> CASE_OP_32_64(xor):
> + CASE_OP_32_64(shl):
> + CASE_OP_32_64(shr):
> + CASE_OP_32_64(sar):
> + CASE_OP_32_64(rotl):
> + CASE_OP_32_64(rotr):
> if (temps[args[1]].state == TCG_TEMP_CONST
> && temps[args[2]].state == TCG_TEMP_CONST) {
> gen_opc_buf[op_index] = op_to_movi(op);
> --
> 1.7.4.1
>
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations.
2011-07-30 12:25 ` Blue Swirl
@ 2011-07-30 19:13 ` Blue Swirl
0 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2011-07-30 19:13 UTC (permalink / raw)
To: Kirill Batuzov; +Cc: qemu-devel, zhur
On Sat, Jul 30, 2011 at 3:25 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 3:37 PM, Kirill Batuzov <batuzovk@ispras.ru> wrote:
>> Perform constant folding for SHR, SHL, SAR, ROTR, ROTL operations.
>
> This patch broke build on targets (Sparc, MIPS) which don't implement
> rotation ops, the next patch likewise. I committed a fix.
Unfortunately my patch which fixed Sparc build broke i386 build, so I
committed another fix.
>> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
>> ---
>> tcg/optimize.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 72 insertions(+), 0 deletions(-)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c469952..a1bb287 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -102,6 +102,11 @@ static int op_bits(int op)
>> case INDEX_op_and_i32:
>> case INDEX_op_or_i32:
>> case INDEX_op_xor_i32:
>> + case INDEX_op_shl_i32:
>> + case INDEX_op_shr_i32:
>> + case INDEX_op_sar_i32:
>> + case INDEX_op_rotl_i32:
>> + case INDEX_op_rotr_i32:
>> return 32;
>> #if TCG_TARGET_REG_BITS == 64
>> case INDEX_op_mov_i64:
>> @@ -111,6 +116,11 @@ static int op_bits(int op)
>> case INDEX_op_and_i64:
>> case INDEX_op_or_i64:
>> case INDEX_op_xor_i64:
>> + case INDEX_op_shl_i64:
>> + case INDEX_op_shr_i64:
>> + case INDEX_op_sar_i64:
>> + case INDEX_op_rotl_i64:
>> + case INDEX_op_rotr_i64:
>> return 64;
>> #endif
>> default:
>> @@ -205,6 +215,58 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
>> CASE_OP_32_64(xor):
>> return x ^ y;
>>
>> + case INDEX_op_shl_i32:
>> + return (uint32_t)x << (uint32_t)y;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> + case INDEX_op_shl_i64:
>> + return (uint64_t)x << (uint64_t)y;
>> +#endif
>> +
>> + case INDEX_op_shr_i32:
>> + return (uint32_t)x >> (uint32_t)y;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> + case INDEX_op_shr_i64:
>> + return (uint64_t)x >> (uint64_t)y;
>> +#endif
>> +
>> + case INDEX_op_sar_i32:
>> + return (int32_t)x >> (int32_t)y;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> + case INDEX_op_sar_i64:
>> + return (int64_t)x >> (int64_t)y;
>> +#endif
>> +
>> + case INDEX_op_rotr_i32:
>> +#if TCG_TARGET_REG_BITS == 64
>> + x &= 0xffffffff;
>> + y &= 0xffffffff;
>> +#endif
>> + x = (x << (32 - y)) | (x >> y);
>> + return x;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> + case INDEX_op_rotr_i64:
>> + x = (x << (64 - y)) | (x >> y);
>> + return x;
>> +#endif
>> +
>> + case INDEX_op_rotl_i32:
>> +#if TCG_TARGET_REG_BITS == 64
>> + x &= 0xffffffff;
>> + y &= 0xffffffff;
>> +#endif
>> + x = (x << y) | (x >> (32 - y));
>> + return x;
>> +
>> +#if TCG_TARGET_REG_BITS == 64
>> + case INDEX_op_rotl_i64:
>> + x = (x << y) | (x >> (64 - y));
>> + return x;
>> +#endif
>> +
>> default:
>> fprintf(stderr,
>> "Unrecognized operation %d in do_constant_folding.\n", op);
>> @@ -278,6 +340,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>> switch (op) {
>> CASE_OP_32_64(add):
>> CASE_OP_32_64(sub):
>> + CASE_OP_32_64(shl):
>> + CASE_OP_32_64(shr):
>> + CASE_OP_32_64(sar):
>> + CASE_OP_32_64(rotl):
>> + CASE_OP_32_64(rotr):
>> if (temps[args[1]].state == TCG_TEMP_CONST) {
>> /* Proceed with possible constant folding. */
>> break;
>> @@ -363,6 +430,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>> CASE_OP_32_64(or):
>> CASE_OP_32_64(and):
>> CASE_OP_32_64(xor):
>> + CASE_OP_32_64(shl):
>> + CASE_OP_32_64(shr):
>> + CASE_OP_32_64(sar):
>> + CASE_OP_32_64(rotl):
>> + CASE_OP_32_64(rotr):
>> if (temps[args[1]].state == TCG_TEMP_CONST
>> && temps[args[2]].state == TCG_TEMP_CONST) {
>> gen_opc_buf[op_index] = op_to_movi(op);
>> --
>> 1.7.4.1
>>
>>
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
@ 2011-08-03 19:00 ` Stefan Weil
2011-08-03 20:20 ` Blue Swirl
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Weil @ 2011-08-03 19:00 UTC (permalink / raw)
To: Kirill Batuzov; +Cc: Blue Swirl, qemu-devel, zhur
Am 07.07.2011 14:37, schrieb Kirill Batuzov:
> Make tcg_constant_folding do copy and constant propagation. It is
> preparatory work before actual constant folding.
>
> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
> ---
> tcg/optimize.c | 182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 files changed, 180 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c7c7da9..f8afe71 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
>
...
This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
and w32 hosts). Simply running qemu (BIOS only) terminates
with abort(). As the error is easy to reproduce, I don't provide
a stack frame here.
> +static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
> + int nb_temps, int nb_globals)
> +{
> + reset_temp(dst, nb_temps, nb_globals);
> + assert(temps[src].state != TCG_TEMP_COPY);
> + if (src>= nb_globals) {
> + assert(temps[src].state != TCG_TEMP_CONST);
> + if (temps[src].state != TCG_TEMP_HAS_COPY) {
> + temps[src].state = TCG_TEMP_HAS_COPY;
> + temps[src].next_copy = src;
> + temps[src].prev_copy = src;
> + }
> + temps[dst].state = TCG_TEMP_COPY;
> + temps[dst].val = src;
> + temps[dst].next_copy = temps[src].next_copy;
> + temps[dst].prev_copy = src;
> + temps[temps[dst].next_copy].prev_copy = dst;
> + temps[src].next_copy = dst;
> + }
> + gen_args[0] = dst;
> + gen_args[1] = src;
> +}
>
QEMU with a modified tcg_opt_gen_mov() (without the if block) works.
Kind regards,
Stefan Weil
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-08-03 19:00 ` Stefan Weil
@ 2011-08-03 20:20 ` Blue Swirl
2011-08-03 20:56 ` Stefan Weil
0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2011-08-03 20:20 UTC (permalink / raw)
To: Stefan Weil; +Cc: qemu-devel, zhur, Kirill Batuzov
On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> Am 07.07.2011 14:37, schrieb Kirill Batuzov:
>>
>> Make tcg_constant_folding do copy and constant propagation. It is
>> preparatory work before actual constant folding.
>>
>> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
>> ---
>> tcg/optimize.c | 182
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 files changed, 180 insertions(+), 2 deletions(-)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c7c7da9..f8afe71 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>>
>
> ...
>
> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
> and w32 hosts). Simply running qemu (BIOS only) terminates
> with abort(). As the error is easy to reproduce, I don't provide
> a stack frame here.
I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
Sparc64 emulators work fine.
Maybe you have a stale build (bug in Makefile dependencies)?
>> +static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
>> + int nb_temps, int nb_globals)
>> +{
>> + reset_temp(dst, nb_temps, nb_globals);
>> + assert(temps[src].state != TCG_TEMP_COPY);
>> + if (src>= nb_globals) {
>> + assert(temps[src].state != TCG_TEMP_CONST);
>> + if (temps[src].state != TCG_TEMP_HAS_COPY) {
>> + temps[src].state = TCG_TEMP_HAS_COPY;
>> + temps[src].next_copy = src;
>> + temps[src].prev_copy = src;
>> + }
>> + temps[dst].state = TCG_TEMP_COPY;
>> + temps[dst].val = src;
>> + temps[dst].next_copy = temps[src].next_copy;
>> + temps[dst].prev_copy = src;
>> + temps[temps[dst].next_copy].prev_copy = dst;
>> + temps[src].next_copy = dst;
>> + }
>> + gen_args[0] = dst;
>> + gen_args[1] = src;
>> +}
>>
>
> QEMU with a modified tcg_opt_gen_mov() (without the if block) works.
>
> Kind regards,
> Stefan Weil
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-08-03 20:20 ` Blue Swirl
@ 2011-08-03 20:56 ` Stefan Weil
2011-08-03 21:03 ` Stefan Weil
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Weil @ 2011-08-03 20:56 UTC (permalink / raw)
To: Blue Swirl; +Cc: qemu-devel, zhur, Kirill Batuzov
Am 03.08.2011 22:20, schrieb Blue Swirl:
> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> Am 07.07.2011 14:37, schrieb Kirill Batuzov:
>>>
>>> Make tcg_constant_folding do copy and constant propagation. It is
>>> preparatory work before actual constant folding.
>>>
>>> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
>>> ---
>>> tcg/optimize.c | 182
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>> 1 files changed, 180 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>> index c7c7da9..f8afe71 100644
>>> --- a/tcg/optimize.c
>>> +++ b/tcg/optimize.c
>>>
>>
>> ...
>>
>> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
>> and w32 hosts). Simply running qemu (BIOS only) terminates
>> with abort(). As the error is easy to reproduce, I don't provide
>> a stack frame here.
>
> I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
> Sparc64 emulators work fine.
>
> Maybe you have a stale build (bug in Makefile dependencies)?
Sorry, an important piece of information was wrong/missing in my report.
It's not qemu, but qemu-system-x86_64 which fails to work.
I just tested it once more with a new build:
$ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
/qemu/tcg/tcg.c:1646: tcg fatal error
Abgebrochen
Cheers,
Stefan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-08-03 20:56 ` Stefan Weil
@ 2011-08-03 21:03 ` Stefan Weil
2011-08-04 18:42 ` Blue Swirl
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Weil @ 2011-08-03 21:03 UTC (permalink / raw)
To: Stefan Weil; +Cc: Blue Swirl, qemu-devel, zhur, Kirill Batuzov
Am 03.08.2011 22:56, schrieb Stefan Weil:
> Am 03.08.2011 22:20, schrieb Blue Swirl:
>> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de>
>> wrote:
>>> Am 07.07.2011 14:37, schrieb Kirill Batuzov:
>>>>
>>>> Make tcg_constant_folding do copy and constant propagation. It is
>>>> preparatory work before actual constant folding.
>>>>
>>>> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
>>>> ---
>>>> tcg/optimize.c | 182
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>> 1 files changed, 180 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>>> index c7c7da9..f8afe71 100644
>>>> --- a/tcg/optimize.c
>>>> +++ b/tcg/optimize.c
>>>>
>>>
>>> ...
>>>
>>> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
>>> and w32 hosts). Simply running qemu (BIOS only) terminates
>>> with abort(). As the error is easy to reproduce, I don't provide
>>> a stack frame here.
>>
>> I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
>> Sparc64 emulators work fine.
>>
>> Maybe you have a stale build (bug in Makefile dependencies)?
>
> Sorry, an important piece of information was wrong/missing in my report.
> It's not qemu, but qemu-system-x86_64 which fails to work.
>
> I just tested it once more with a new build:
>
> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
> /qemu/tcg/tcg.c:1646: tcg fatal error
> Abgebrochen
>
> Cheers,
> Stefan
qemu-system-mips64el fails with the same error, so the problem
occurs when running 64 bit emulations on 32 bit hosts.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-08-03 21:03 ` Stefan Weil
@ 2011-08-04 18:42 ` Blue Swirl
2011-08-04 19:24 ` Blue Swirl
0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2011-08-04 18:42 UTC (permalink / raw)
To: Stefan Weil; +Cc: qemu-devel, zhur, Kirill Batuzov
On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> Am 03.08.2011 22:56, schrieb Stefan Weil:
>>
>> Am 03.08.2011 22:20, schrieb Blue Swirl:
>>>
>>> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>>>>
>>>> Am 07.07.2011 14:37, schrieb Kirill Batuzov:
>>>>>
>>>>> Make tcg_constant_folding do copy and constant propagation. It is
>>>>> preparatory work before actual constant folding.
>>>>>
>>>>> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
>>>>> ---
>>>>> tcg/optimize.c | 182
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>> 1 files changed, 180 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>>>> index c7c7da9..f8afe71 100644
>>>>> --- a/tcg/optimize.c
>>>>> +++ b/tcg/optimize.c
>>>>>
>>>>
>>>> ...
>>>>
>>>> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
>>>> and w32 hosts). Simply running qemu (BIOS only) terminates
>>>> with abort(). As the error is easy to reproduce, I don't provide
>>>> a stack frame here.
>>>
>>> I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
>>> Sparc64 emulators work fine.
>>>
>>> Maybe you have a stale build (bug in Makefile dependencies)?
>>
>> Sorry, an important piece of information was wrong/missing in my report.
>> It's not qemu, but qemu-system-x86_64 which fails to work.
>>
>> I just tested it once more with a new build:
>>
>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>> /qemu/tcg/tcg.c:1646: tcg fatal error
>> Abgebrochen
OK, now that is broken also for me.
>> Cheers,
>> Stefan
>
> qemu-system-mips64el fails with the same error, so the problem
> occurs when running 64 bit emulations on 32 bit hosts.
Not always, Sparc64 still works fine.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation.
2011-08-04 18:42 ` Blue Swirl
@ 2011-08-04 19:24 ` Blue Swirl
0 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2011-08-04 19:24 UTC (permalink / raw)
To: Stefan Weil; +Cc: qemu-devel, zhur, Kirill Batuzov
On Thu, Aug 4, 2011 at 6:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> Am 03.08.2011 22:56, schrieb Stefan Weil:
>>>
>>> Am 03.08.2011 22:20, schrieb Blue Swirl:
>>>>
>>>> On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>>>>>
>>>>> Am 07.07.2011 14:37, schrieb Kirill Batuzov:
>>>>>>
>>>>>> Make tcg_constant_folding do copy and constant propagation. It is
>>>>>> preparatory work before actual constant folding.
>>>>>>
>>>>>> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
>>>>>> ---
>>>>>> tcg/optimize.c | 182
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>> 1 files changed, 180 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>>>>>> index c7c7da9..f8afe71 100644
>>>>>> --- a/tcg/optimize.c
>>>>>> +++ b/tcg/optimize.c
>>>>>>
>>>>>
>>>>> ...
>>>>>
>>>>> This patch breaks QEMU on 32 bit hosts (tested on 386 Linux
>>>>> and w32 hosts). Simply running qemu (BIOS only) terminates
>>>>> with abort(). As the error is easy to reproduce, I don't provide
>>>>> a stack frame here.
>>>>
>>>> I can't reproduce, i386/Linux and win32 versions of i386, Sparc32 and
>>>> Sparc64 emulators work fine.
>>>>
>>>> Maybe you have a stale build (bug in Makefile dependencies)?
>>>
>>> Sorry, an important information was wrong / missing in my report.
>>> It's not qemu, but qemu-system-x86_64 which fails to work.
>>>
>>> I just tested it once more with a new build:
>>>
>>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>>> /qemu/tcg/tcg.c:1646: tcg fatal error
>>> Abgebrochen
>
> OK, now that is broken also for me.
>
>>> Cheers,
>>> Stefan
>>
>> qemu-system-mips64el fails with the same error, so the problem
>> occurs when running 64 bit emulations on 32 bit hosts.
>
> Not always, Sparc64 still works fine.
x86_64 fails because 'mov_i32 cc_src_0,loc25' is incorrectly optimized
to 'mov_i32 cc_src_0,tmp6' where tmp6 is dead after brcond.
IN:
0x000000000ffeb90a: shl %cl,%eax
OP:
---- 0xffeb90a
mov_i32 tmp2,rcx_0
mov_i32 tmp3,rcx_1
mov_i32 tmp0,rax_0
mov_i32 tmp1,rax_1
movi_i32 tmp20,$0x1f
and_i32 tmp2,tmp2,tmp20
movi_i32 tmp3,$0x0
movi_i32 tmp21,$0xffffffff
movi_i32 tmp22,$0xffffffff
add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
movi_i32 tmp20,$0x80bd4e0
call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17
...tmp6 is assigned here...
movi_i32 tmp20,$0x80bd4e0
call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
mov_i32 rax_0,tmp0
movi_i32 rax_1,$0x0
mov_i32 loc23,tmp0
mov_i32 loc24,tmp1
mov_i32 loc25,tmp6
...tmp6 saved to loc25 to survive brcond...
mov_i32 loc26,tmp7
movi_i32 tmp21,$0x0
movi_i32 tmp22,$0x0
brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
mov_i32 cc_src_0,loc25
...used here.
mov_i32 cc_src_1,loc26
mov_i32 cc_dst_0,loc23
mov_i32 cc_dst_1,loc24
movi_i32 cc_op,$0x24
set_label $0x0
movi_i32 tmp8,$0xffeb90c
movi_i32 tmp9,$0x0
st_i32 tmp8,env,$0x80
st_i32 tmp9,env,$0x84
movi_i32 tmp20,$debug
call tmp20,$0x0,$0
OP after liveness analysis:
---- 0xffeb90a
mov_i32 tmp2,rcx_0
nopn $0x2,$0x2
mov_i32 tmp0,rax_0
mov_i32 tmp1,rax_1
movi_i32 tmp20,$0x1f
and_i32 tmp2,tmp2,tmp20
movi_i32 tmp3,$0x0
movi_i32 tmp21,$0xffffffff
movi_i32 tmp22,$0xffffffff
add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
movi_i32 tmp20,$0x80bd4e0
call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17
OK
movi_i32 tmp20,$0x80bd4e0
call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
mov_i32 rax_0,tmp0
movi_i32 rax_1,$0x0
mov_i32 loc23,tmp0
mov_i32 loc24,tmp1
mov_i32 loc25,tmp6
OK, though loc25 is unused after this; why is it not optimized away?
mov_i32 loc26,tmp7
movi_i32 tmp21,$0x0
movi_i32 tmp22,$0x0
brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
mov_i32 cc_src_0,tmp6
Incorrect optimization.
mov_i32 cc_src_1,tmp7
mov_i32 cc_dst_0,tmp0
mov_i32 cc_dst_1,tmp1
movi_i32 cc_op,$0x24
set_label $0x0
movi_i32 tmp8,$0xffeb90c
movi_i32 tmp9,$0x0
st_i32 tmp8,env,$0x80
st_i32 tmp9,env,$0x84
movi_i32 tmp20,$debug
call tmp20,$0x0,$0
end
The corresponding translation code is in target-i386/translate.c:1456,
it looks correct.
Maybe the optimizer should treat stack and memory temporaries
differently from register temporaries?
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2011-08-04 19:24 UTC | newest]
Thread overview: 18+ messages
-- links below jump to the message on this page --
2011-07-07 12:37 [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 1/6] Add TCG optimizations stub Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 2/6] Add copy and constant propagation Kirill Batuzov
2011-08-03 19:00 ` Stefan Weil
2011-08-03 20:20 ` Blue Swirl
2011-08-03 20:56 ` Stefan Weil
2011-08-03 21:03 ` Stefan Weil
2011-08-04 18:42 ` Blue Swirl
2011-08-04 19:24 ` Blue Swirl
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 4/6] Do constant folding for boolean operations Kirill Batuzov
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 5/6] Do constant folding for shift operations Kirill Batuzov
2011-07-30 12:25 ` Blue Swirl
2011-07-30 19:13 ` Blue Swirl
2011-07-07 12:37 ` [Qemu-devel] [PATCH v3 6/6] Do constant folding for unary operations Kirill Batuzov
2011-07-07 12:54 ` [Qemu-devel] [PATCH v3 0/6] Implement constant folding and copy propagation in TCG Peter Maydell
2011-07-07 14:22 ` Kirill Batuzov
2011-07-30 10:52 ` Blue Swirl