* [Qemu-devel] [PATCH 1/9] tcg/optimizer: remove TCG_TEMP_ANY
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:25 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 2/9] tcg/optimizer: check types in copy propagation Aurelien Jarno
` (8 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
TCG_TEMP_ANY has no meaning distinct from TCG_TEMP_UNDEF, so use the
latter instead.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 9da333c..ca81198 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -39,8 +39,7 @@ typedef enum {
TCG_TEMP_UNDEF = 0,
TCG_TEMP_CONST,
TCG_TEMP_COPY,
- TCG_TEMP_HAS_COPY,
- TCG_TEMP_ANY
+ TCG_TEMP_HAS_COPY
} tcg_temp_state;
struct tcg_temp_info {
@@ -52,7 +51,7 @@ struct tcg_temp_info {
static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-/* Reset TEMP's state to TCG_TEMP_ANY. If TEMP was a representative of some
+/* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP was a representative of some
class of equivalent temp's, a new representative should be chosen in this
class. */
static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
@@ -69,7 +68,7 @@ static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
}
for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
if (new_base == (TCGArg)-1) {
- temps[i].state = TCG_TEMP_ANY;
+ temps[i].state = TCG_TEMP_UNDEF;
} else {
temps[i].val = new_base;
}
@@ -81,9 +80,9 @@ static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
new_base = temps[temp].val;
}
- temps[temp].state = TCG_TEMP_ANY;
+ temps[temp].state = TCG_TEMP_UNDEF;
if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
- temps[new_base].state = TCG_TEMP_ANY;
+ temps[new_base].state = TCG_TEMP_UNDEF;
}
}
--
1.7.10.4
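A minimal standalone sketch (toy C, not the QEMU sources) of the state
tracking left after this patch: the pass resets temp information with
memset(temps, 0, ...) at basic block boundaries, which is why the single
catch-all state has to be the zero value.

    #include <assert.h>
    #include <string.h>

    typedef enum {
        TCG_TEMP_UNDEF = 0,   /* nothing known about the temp */
        TCG_TEMP_CONST,       /* temp holds a known constant */
        TCG_TEMP_COPY,        /* temp is a copy of another temp */
        TCG_TEMP_HAS_COPY     /* temp represents a copy class */
    } tcg_temp_state;

    struct tcg_temp_info {
        tcg_temp_state state;
        unsigned long val;
    };

    int main(void)
    {
        struct tcg_temp_info temps[4];

        temps[1].state = TCG_TEMP_CONST;
        /* Resetting everything, as the pass does at basic block
           boundaries, must leave each temp in the "no information"
           state -- hence TCG_TEMP_UNDEF, not ANY, must be zero. */
        memset(temps, 0, sizeof(temps));
        assert(temps[1].state == TCG_TEMP_UNDEF);
        return 0;
    }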
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 2/9] tcg/optimizer: check types in copy propagation
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
2012-09-19 20:00 ` [Qemu-devel] [PATCH 1/9] tcg/optimizer: remove TCG_TEMP_ANY Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:33 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 3/9] tcg/optimizer: rework copy propagation Aurelien Jarno
` (7 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The copy propagation code doesn't check the types of the temps. However
TCG uses mov_i32 for the i64-to-i32 conversion, and thus the two temps
are not equivalent.
With this patch tcg_opt_gen_mov() doesn't consider two temps with
different types as copies anymore.
So far it seems the optimization was not aggressive enough to trigger
this bug, but it will be triggered later in this series once the copy
propagation is improved.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index ca81198..13b5b15 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -106,12 +106,13 @@ static TCGOpcode op_to_movi(TCGOpcode op)
}
}
-static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
- int nb_temps, int nb_globals)
+static void tcg_opt_gen_mov(TCGContext *s, TCGArg *gen_args,
+ TCGArg dst, TCGArg src)
{
- reset_temp(dst, nb_temps, nb_globals);
+ reset_temp(dst, s->nb_temps, s->nb_globals);
assert(temps[src].state != TCG_TEMP_COPY);
- if (src >= nb_globals) {
+ /* Only consider temps with the same type (width) as copies. */
+ if (src >= s->nb_globals && s->temps[dst].type == s->temps[src].type) {
assert(temps[src].state != TCG_TEMP_CONST);
if (temps[src].state != TCG_TEMP_HAS_COPY) {
temps[src].state = TCG_TEMP_HAS_COPY;
@@ -440,8 +441,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
gen_opc_buf[op_index] = INDEX_op_nop;
} else {
gen_opc_buf[op_index] = op_to_mov(op);
- tcg_opt_gen_mov(gen_args, args[0], args[1],
- nb_temps, nb_globals);
+ tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
gen_args += 2;
}
args += 3;
@@ -478,8 +478,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
gen_opc_buf[op_index] = INDEX_op_nop;
} else {
gen_opc_buf[op_index] = op_to_mov(op);
- tcg_opt_gen_mov(gen_args, args[0], args[1], nb_temps,
- nb_globals);
+ tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
gen_args += 2;
}
args += 3;
@@ -503,8 +502,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
break;
}
if (temps[args[1]].state != TCG_TEMP_CONST) {
- tcg_opt_gen_mov(gen_args, args[0], args[1],
- nb_temps, nb_globals);
+ tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
gen_args += 2;
args += 2;
break;
--
1.7.10.4
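A toy model of the guard this patch adds (the names loosely follow the
diff, but this is an illustration rather than the QEMU code): a mov
between temps of different widths is still emitted, it is just never
recorded as a copy.

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { TYPE_I32, TYPE_I64 } temp_type;

    struct temp {
        temp_type type;
        bool is_copy;
        int copy_of;
    };

    static void opt_gen_mov(struct temp *t, int dst, int src)
    {
        t[dst].is_copy = false;
        /* Only consider temps with the same type (width) as copies. */
        if (t[dst].type == t[src].type) {
            t[dst].is_copy = true;
            t[dst].copy_of = src;
        }
    }

    int main(void)
    {
        struct temp t[2] = { { TYPE_I64, false, 0 },
                             { TYPE_I32, false, 0 } };

        opt_gen_mov(t, 1, 0);   /* i64 -> i32: a truncation, not a copy */
        printf("recorded as copy: %s\n", t[1].is_copy ? "yes" : "no");
        return 0;
    }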
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/9] tcg/optimizer: check types in copy propagation
2012-09-19 20:00 ` [Qemu-devel] [PATCH 2/9] tcg/optimizer: check types in copy propagation Aurelien Jarno
@ 2012-09-19 21:33 ` Richard Henderson
2012-09-20 5:54 ` Aurelien Jarno
0 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2012-09-19 21:33 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> The copy propagation code doesn't check the types of the temps. However
> TCG uses mov_i32 for the i64-to-i32 conversion, and thus the two temps
> are not equivalent.
>
> With this patch tcg_opt_gen_mov() doesn't consider two temps with
> different types as copies anymore.
>
> So far it seems the optimization was not aggressive enough to trigger
> this bug, but it will be triggered later in this series once the copy
> propagation is improved.
Exactly where/how does this manifest as problematic?
We do this mov_i32 trick from i64->i32 when we're truncating.
Given that we're not (yet) targeting mips64 and having to
maintain 32-bit sign-extended quantities, I can't see how
that would matter.
We do the i32->i64 trick immediately before a proper extension.
In either case I can't see how plain copy propagation should
matter until some other operation is involved. So, do we have
some other data propagation bug that is being masked here?
r~
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 2/9] tcg/optimizer: check types in copy propagation
2012-09-19 21:33 ` Richard Henderson
@ 2012-09-20 5:54 ` Aurelien Jarno
2012-09-20 14:00 ` Richard Henderson
0 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-20 5:54 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Wed, Sep 19, 2012 at 02:33:46PM -0700, Richard Henderson wrote:
> On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> > The copy propagation code doesn't check the types of the temps. However
> > TCG uses mov_i32 for the i64-to-i32 conversion, and thus the two temps
> > are not equivalent.
> >
> > With this patch tcg_opt_gen_mov() doesn't consider two temps with
> > different types as copies anymore.
> >
> > So far it seems the optimization was not aggressive enough to trigger
> > this bug, but it will be triggered later in this series once the copy
> > propagation is improved.
>
> Exactly where/how does this manifest as problematic?
The problem arises when a 64-bit value is moved into a 32-bit temp and
the 64-bit value is reused later. If the copy propagation considers them
identical, the 32-bit value might be used instead of the 64-bit one.
This happens for example with the umull instruction on ARM:
| OP:
| ---- 0xf67e5ea0
| mov_i32 tmp5,r3
| mov_i32 tmp6,r8
| ext32u_i64 tmp7,tmp5
| ext32u_i64 tmp8,tmp6
| mul_i64 tmp7,tmp7,tmp8
tmp7 is a 64-bit value
| mov_i32 tmp6,tmp7
Now transferred to a 32-bit temp
| mov_i32 r1,tmp6
and a 32-bit register.
| movi_i64 tmp8,$0x20
| shr_i64 tmp7,tmp7,tmp8
| mov_i32 tmp6,tmp7
| mov_i32 r3,tmp6
| goto_tb $0x1
| movi_i32 pc,$0xf67e5ea4
| exit_tb $0x7f16948b0299
|
| OP after optimization and liveness analysis:
| ---- 0xf67e5ea0
| nopn $0x2,$0x2
| nopn $0x2,$0x2
| ext32u_i64 tmp7,r3
| ext32u_i64 tmp8,r8
| mul_i64 tmp7,tmp7,tmp8
| mov_i32 tmp6,tmp7
| mov_i32 r1,tmp6
| movi_i64 tmp8,$0x20
| shr_i64 tmp7,r1,tmp8
Here, tmp7 has been replaced by r1. However r1 only contains the low
32-bit part of tmp7, so the shift wrongly produces 0.
| mov_i32 tmp6,tmp7
| mov_i32 r3,tmp6
| goto_tb $0x1
| movi_i32 pc,$0xf67e5ea4
| exit_tb $0x7f16948b0299
| end
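Reduced to plain C (an illustration, not QEMU code), the failure mode is
the following:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t tmp7 = 0x123456789ULL;        /* 64-bit mul result */
        uint32_t r1 = (uint32_t)tmp7;          /* mov_i32: truncation */

        uint64_t hi_ok  = tmp7 >> 32;          /* correct: 0x1 */
        uint64_t hi_bad = (uint64_t)r1 >> 32;  /* after bad propagation */

        assert(hi_ok == 0x1);
        assert(hi_bad == 0);                   /* high part is lost */
        return 0;
    }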
> We do this mov_i32 trick from i64->i32 when we're truncating.
> Given that we're not (yet) targeting mips64 and having to
> maintain 32-bit sign-extended quantities, I can't see how
> that would matter.
It does matter on architectures that use different instructions to work
on the 32-bit part of a register. Actually, in the case above with a
right shift, shr_i32 and shr_i64 are always implemented differently, so
that no bits from the high 32-bit part are shifted into the low part.
> We do the i32->i64 trick immediately before a proper extension.
>
> In either case I can't see how plain copy propagation should
> matter until some other operation is involved. So, do we have
> some other data propagation bug that is being masked here?
I think it's a bug for the copy propagation to consider these registers
equivalent. If on x86-64 you replace all accesses to rax by eax, your
code will be smaller, but I am not sure it is going to work correctly.
The only latent bug I can see there is that mov_i32 is used both to
move data between 32-bit temps and to move data between a 64-bit temp
and a 32-bit temp.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 3/9] tcg/optimizer: rework copy propagation
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
2012-09-19 20:00 ` [Qemu-devel] [PATCH 1/9] tcg/optimizer: remove TCG_TEMP_ANY Aurelien Jarno
2012-09-19 20:00 ` [Qemu-devel] [PATCH 2/9] tcg/optimizer: check types in copy propagation Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:41 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 4/9] tcg/optimize: do copy propagation for all operations Aurelien Jarno
` (6 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The copy propagation pass tries to keep track of what is a copy of what
and what has a copy of what, and in addition it keeps a circular list of
all the copies. Unfortunately this doesn't fully work: a mov from a temp
which has the state "COPY" changes it into the state "HAS_COPY". Later,
when this temp is used again, it is considered as not having a copy, and
thus no propagation is done.
This patch fixes that by removing the hierarchy between copies, keeping
only a single "COPY" state that means both "is a copy" and "has a copy".
The decision of which copy to use is deferred to the actual temp
replacement. At this stage there is no single best choice, only choices
that are better than others. Making the best choice would require
parsing the operations in reverse to know whether a temp is going to be
used later or not; that is what the liveness analysis does. At this
stage it is known that globals are always live, that local temps die at
the end of the translation block, and that plain temps die at the end of
the basic block. This means that this stage should try to replace temps
by local temps or globals, and local temps by globals.
It also brings the advantage of knowing whether a temp is a copy of
another, which can improve the various optimizations.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 155 ++++++++++++++++++++++++++++++++------------------------
1 file changed, 88 insertions(+), 67 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 13b5b15..244eb02 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -39,7 +39,6 @@ typedef enum {
TCG_TEMP_UNDEF = 0,
TCG_TEMP_CONST,
TCG_TEMP_COPY,
- TCG_TEMP_HAS_COPY
} tcg_temp_state;
struct tcg_temp_info {
@@ -51,39 +50,19 @@ struct tcg_temp_info {
static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-/* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP was a representative of some
- class of equivalent temp's, a new representative should be chosen in this
- class. */
-static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
+/* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
+ the copy flag from the remaining temp. */
+static void reset_temp(TCGArg temp)
{
- int i;
- TCGArg new_base = (TCGArg)-1;
- if (temps[temp].state == TCG_TEMP_HAS_COPY) {
- for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
- if (i >= nb_globals) {
- temps[i].state = TCG_TEMP_HAS_COPY;
- new_base = i;
- break;
- }
- }
- for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
- if (new_base == (TCGArg)-1) {
- temps[i].state = TCG_TEMP_UNDEF;
- } else {
- temps[i].val = new_base;
- }
+ if (temps[temp].state == TCG_TEMP_COPY) {
+ if (temps[temp].prev_copy == temps[temp].next_copy) {
+ temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
+ } else {
+ temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+ temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
}
- temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
- temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
- } else if (temps[temp].state == TCG_TEMP_COPY) {
- temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
- temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
- new_base = temps[temp].val;
}
temps[temp].state = TCG_TEMP_UNDEF;
- if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
- temps[new_base].state = TCG_TEMP_UNDEF;
- }
}
static int op_bits(TCGOpcode op)
@@ -106,34 +85,83 @@ static TCGOpcode op_to_movi(TCGOpcode op)
}
}
+static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+{
+ TCGArg i;
+
+ /* If this is already a global, we can't do better. */
+ if (temp < s->nb_globals) {
+ return temp;
+ }
+
+ /* Search for a global first. */
+ for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
+ if (i < s->nb_globals) {
+ return i;
+ }
+ }
+
+ /* If it is a temp, search for a temp local. */
+ if (!s->temps[temp].temp_local) {
+ for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
+ if (s->temps[i].temp_local) {
+ return i;
+ }
+ }
+ }
+
+ /* Failure to find a better representation, return the same temp. */
+ return temp;
+}
+
+static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+{
+ TCGArg i;
+
+ if (arg1 == arg2) {
+ return true;
+ }
+
+ if (temps[arg1].state != TCG_TEMP_COPY
+ || temps[arg2].state != TCG_TEMP_COPY) {
+ return false;
+ }
+
+ for (i = temps[arg1].next_copy ; i != arg1 ; i = temps[i].next_copy) {
+ if (i == arg2) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
static void tcg_opt_gen_mov(TCGContext *s, TCGArg *gen_args,
TCGArg dst, TCGArg src)
{
- reset_temp(dst, s->nb_temps, s->nb_globals);
- assert(temps[src].state != TCG_TEMP_COPY);
- /* Only consider temps with the same type (width) as copies. */
- if (src >= s->nb_globals && s->temps[dst].type == s->temps[src].type) {
- assert(temps[src].state != TCG_TEMP_CONST);
- if (temps[src].state != TCG_TEMP_HAS_COPY) {
- temps[src].state = TCG_TEMP_HAS_COPY;
+ reset_temp(dst);
+ assert(temps[src].state != TCG_TEMP_CONST);
+
+ if (s->temps[src].type == s->temps[dst].type) {
+ if (temps[src].state != TCG_TEMP_COPY) {
+ temps[src].state = TCG_TEMP_COPY;
temps[src].next_copy = src;
temps[src].prev_copy = src;
}
temps[dst].state = TCG_TEMP_COPY;
- temps[dst].val = src;
temps[dst].next_copy = temps[src].next_copy;
temps[dst].prev_copy = src;
temps[temps[dst].next_copy].prev_copy = dst;
temps[src].next_copy = dst;
}
+
gen_args[0] = dst;
gen_args[1] = src;
}
-static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
- int nb_temps, int nb_globals)
+static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val)
{
- reset_temp(dst, nb_temps, nb_globals);
+ reset_temp(dst);
temps[dst].state = TCG_TEMP_CONST;
temps[dst].val = val;
gen_args[0] = dst;
@@ -324,7 +352,6 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
tcg_abort();
}
-
/* Propagate constants and copies, fold constant expressions. */
static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -336,10 +363,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
TCGArg tmp;
/* Array VALS has an element for each temp.
If this temp holds a constant then its value is kept in VALS' element.
- If this temp is a copy of other ones then this equivalence class'
- representative is kept in VALS' element.
- If this temp is neither copy nor constant then corresponding VALS'
- element is unused. */
+ If this temp is a copy of other ones then the other copies are
+ available through the doubly linked circular list. */
nb_temps = s->nb_temps;
nb_globals = s->nb_globals;
@@ -355,7 +380,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
assert(op != INDEX_op_call);
for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
if (temps[args[i]].state == TCG_TEMP_COPY) {
- args[i] = temps[args[i]].val;
+ args[i] = find_better_copy(s, args[i]);
}
}
}
@@ -408,7 +433,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
if (temps[args[1]].state == TCG_TEMP_CONST
&& temps[args[1]].val == 0) {
gen_opc_buf[op_index] = op_to_movi(op);
- tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+ tcg_opt_gen_movi(gen_args, args[0], 0);
args += 3;
gen_args += 2;
continue;
@@ -435,9 +460,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
}
if (temps[args[2]].state == TCG_TEMP_CONST
&& temps[args[2]].val == 0) {
- if ((temps[args[0]].state == TCG_TEMP_COPY
- && temps[args[0]].val == args[1])
- || args[0] == args[1]) {
+ if (temps_are_copies(args[0], args[1])) {
gen_opc_buf[op_index] = INDEX_op_nop;
} else {
gen_opc_buf[op_index] = op_to_mov(op);
@@ -459,7 +482,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
if ((temps[args[2]].state == TCG_TEMP_CONST
&& temps[args[2]].val == 0)) {
gen_opc_buf[op_index] = op_to_movi(op);
- tcg_opt_gen_movi(gen_args, args[0], 0, nb_temps, nb_globals);
+ tcg_opt_gen_movi(gen_args, args[0], 0);
args += 3;
gen_args += 2;
continue;
@@ -474,7 +497,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
CASE_OP_32_64(or):
CASE_OP_32_64(and):
if (args[1] == args[2]) {
- if (args[1] == args[0]) {
+ if (temps_are_copies(args[0], args[1])) {
gen_opc_buf[op_index] = INDEX_op_nop;
} else {
gen_opc_buf[op_index] = op_to_mov(op);
@@ -494,9 +517,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
allocator where needed and possible. Also detect copies. */
switch (op) {
CASE_OP_32_64(mov):
- if ((temps[args[1]].state == TCG_TEMP_COPY
- && temps[args[1]].val == args[0])
- || args[0] == args[1]) {
+ if (temps_are_copies(args[0], args[1])) {
args += 2;
gen_opc_buf[op_index] = INDEX_op_nop;
break;
@@ -514,7 +535,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
args[1] = temps[args[1]].val;
/* fallthrough */
CASE_OP_32_64(movi):
- tcg_opt_gen_movi(gen_args, args[0], args[1], nb_temps, nb_globals);
+ tcg_opt_gen_movi(gen_args, args[0], args[1]);
gen_args += 2;
args += 2;
break;
@@ -529,9 +550,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
if (temps[args[1]].state == TCG_TEMP_CONST) {
gen_opc_buf[op_index] = op_to_movi(op);
tmp = do_constant_folding(op, temps[args[1]].val, 0);
- tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+ tcg_opt_gen_movi(gen_args, args[0], tmp);
} else {
- reset_temp(args[0], nb_temps, nb_globals);
+ reset_temp(args[0]);
gen_args[0] = args[0];
gen_args[1] = args[1];
}
@@ -559,10 +580,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
gen_opc_buf[op_index] = op_to_movi(op);
tmp = do_constant_folding(op, temps[args[1]].val,
temps[args[2]].val);
- tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+ tcg_opt_gen_movi(gen_args, args[0], tmp);
gen_args += 2;
} else {
- reset_temp(args[0], nb_temps, nb_globals);
+ reset_temp(args[0]);
gen_args[0] = args[0];
gen_args[1] = args[1];
gen_args[2] = args[2];
@@ -576,10 +597,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
gen_opc_buf[op_index] = op_to_movi(op);
tmp = do_constant_folding_cond(op, temps[args[1]].val,
temps[args[2]].val, args[3]);
- tcg_opt_gen_movi(gen_args, args[0], tmp, nb_temps, nb_globals);
+ tcg_opt_gen_movi(gen_args, args[0], tmp);
gen_args += 2;
} else {
- reset_temp(args[0], nb_temps, nb_globals);
+ reset_temp(args[0]);
gen_args[0] = args[0];
gen_args[1] = args[1];
gen_args[2] = args[2];
@@ -602,7 +623,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
}
} else {
memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
- reset_temp(args[0], nb_temps, nb_globals);
+ reset_temp(args[0]);
gen_args[0] = args[0];
gen_args[1] = args[1];
gen_args[2] = args[2];
@@ -615,11 +636,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
for (i = 0; i < nb_globals; i++) {
- reset_temp(i, nb_temps, nb_globals);
+ reset_temp(i);
}
}
for (i = 0; i < (args[0] >> 16); i++) {
- reset_temp(args[i + 1], nb_temps, nb_globals);
+ reset_temp(args[i + 1]);
}
i = nb_call_args + 3;
while (i) {
@@ -638,7 +659,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
} else {
for (i = 0; i < def->nb_oargs; i++) {
- reset_temp(args[i], nb_temps, nb_globals);
+ reset_temp(args[i]);
}
}
for (i = 0; i < def->nb_args; i++) {
--
1.7.10.4
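A standalone model of the circular doubly-linked copy list (toy C, not
the QEMU code): every member of a copy class can reach all the others,
which is why no HAS_COPY representative is needed anymore.

    #include <assert.h>

    #define NB_TEMPS 8

    static int next_copy[NB_TEMPS], prev_copy[NB_TEMPS];

    /* Splice dst into the ring right after src, as tcg_opt_gen_mov
       does in the diff. */
    static void add_copy(int dst, int src)
    {
        next_copy[dst] = next_copy[src];
        prev_copy[dst] = src;
        prev_copy[next_copy[dst]] = dst;
        next_copy[src] = dst;
    }

    /* temps_are_copies: walk the ring starting at a, looking for b. */
    static int are_copies(int a, int b)
    {
        int i;

        if (a == b) {
            return 1;
        }
        for (i = next_copy[a]; i != a; i = next_copy[i]) {
            if (i == b) {
                return 1;
            }
        }
        return 0;
    }

    int main(void)
    {
        int i;

        for (i = 0; i < NB_TEMPS; i++) {
            next_copy[i] = prev_copy[i] = i;   /* each temp starts alone */
        }
        add_copy(1, 0);                        /* mov t1, t0 */
        add_copy(2, 0);                        /* mov t2, t0 */
        assert(are_copies(1, 2));              /* t1 and t2 share a ring */
        assert(!are_copies(1, 3));             /* t3 was never copied */
        return 0;
    }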
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 3/9] tcg/optimizer: rework copy propagation
2012-09-19 20:00 ` [Qemu-devel] [PATCH 3/9] tcg/optimizer: rework copy propagation Aurelien Jarno
@ 2012-09-19 21:41 ` Richard Henderson
0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-09-19 21:41 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> The copy propagation pass tries to keep track of what is a copy of what
> and what has a copy of what, and in addition it keeps a circular list
> of all the copies. Unfortunately this doesn't fully work: a mov from a
> temp which has the state "COPY" changes it into the state "HAS_COPY".
> Later, when this temp is used again, it is considered as not having a
> copy, and thus no propagation is done.
>
> This patch fixes that by removing the hierarchy between copies, keeping
> only a single "COPY" state that means both "is a copy" and "has a copy".
> The decision of which copy to use is deferred to the actual temp
> replacement. At this stage there is no single best choice, only choices
> that are better than others. Making the best choice would require
> parsing the operations in reverse to know whether a temp is going to be
> used later or not; that is what the liveness analysis does. At this
> stage it is known that globals are always live, that local temps die at
> the end of the translation block, and that plain temps die at the end
> of the basic block. This means that this stage should try to replace
> temps by local temps or globals, and local temps by globals.
>
> It also brings the advantage of knowing whether a temp is a copy of
> another, which can improve the various optimizations.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 4/9] tcg/optimize: do copy propagation for all operations
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (2 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 3/9] tcg/optimizer: rework copy propagation Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:43 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 5/9] tcg/optimize: optimize "op r, a, a => mov r, a" Aurelien Jarno
` (5 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
It is possible to do copy propagation for all operations, even the ones
that have side effects or clobber arguments (copy propagation only
concerns the input arguments). That said, the call operation has to be
handled differently because of its variable number of arguments.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 244eb02..a58de3b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -376,8 +376,15 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
op = gen_opc_buf[op_index];
def = &tcg_op_defs[op];
/* Do copy propagation */
- if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {
- assert(op != INDEX_op_call);
+ if (op == INDEX_op_call) {
+ int nb_oargs = args[0] >> 16;
+ int nb_iargs = args[0] & 0xffff;
+ for (i = nb_oargs + 1; i < nb_oargs + nb_iargs + 1; i++) {
+ if (temps[args[i]].state == TCG_TEMP_COPY) {
+ args[i] = find_better_copy(s, args[i]);
+ }
+ }
+ } else {
for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
if (temps[args[i]].state == TCG_TEMP_COPY) {
args[i] = find_better_copy(s, args[i]);
--
1.7.10.4
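A sketch of the call-argument layout the new code relies on (simplified
toy C; the packing of args[0] matches the diff, the rest is
illustrative): the output count lives in the high 16 bits, the input
count in the low 16 bits, and only the input slots are rewritten.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uintptr_t args0 = (2u << 16) | 3u;   /* 2 outputs, 3 inputs */
        int nb_oargs = args0 >> 16;
        int nb_iargs = args0 & 0xffff;
        int i;

        /* Inputs occupy args[nb_oargs + 1] .. args[nb_oargs + nb_iargs];
           each one may be replaced by a better copy. */
        for (i = nb_oargs + 1; i < nb_oargs + nb_iargs + 1; i++) {
            printf("args[%d] is an input, eligible for propagation\n", i);
        }
        return 0;
    }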
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 5/9] tcg/optimize: optimize "op r, a, a => mov r, a"
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (3 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 4/9] tcg/optimize: do copy propagation for all operations Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:43 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 6/9] tcg/optimize: optimize "op r, a, a => movi r, 0" Aurelien Jarno
` (4 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Now that we can easily detect all copies, we can optimize the
"op r, a, a => mov r, a" case a bit more.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index a58de3b..cfee690 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -503,7 +503,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
switch (op) {
CASE_OP_32_64(or):
CASE_OP_32_64(and):
- if (args[1] == args[2]) {
+ if (temps_are_copies(args[1], args[2])) {
if (temps_are_copies(args[0], args[1])) {
gen_opc_buf[op_index] = INDEX_op_nop;
} else {
--
1.7.10.4
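The algebraic fact behind the patch, checked in plain C (an
illustration, not QEMU code): and/or are idempotent, so once args[1] and
args[2] are known to hold the same value, the operation reduces to a
mov.

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t a = 0xdeadbeef;

        assert((a & a) == a);   /* and r, a, a  =>  mov r, a */
        assert((a | a) == a);   /* or  r, a, a  =>  mov r, a */
        return 0;
    }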
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] tcg/optimize: optimize "op r, a, a => mov r, a"
2012-09-19 20:00 ` [Qemu-devel] [PATCH 5/9] tcg/optimize: optimize "op r, a, a => mov r, a" Aurelien Jarno
@ 2012-09-19 21:43 ` Richard Henderson
0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-09-19 21:43 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> Now that we can easily detect all copies, we can optimize the
> "op r, a, a => mov r, a" case a bit more.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 6/9] tcg/optimize: optimize "op r, a, a => movi r, 0"
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (4 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 5/9] tcg/optimize: optimize "op r, a, a => mov r, a" Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:46 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 7/9] tcg/optimize: further optimize brcond/setcond Aurelien Jarno
` (3 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Now that it's possible to detect copies, we can optimize the
"op r, a, a => movi r, 0" case a bit more. This helps the computation of
overflow flags when one of the two args is 0.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index cfee690..f8423e0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -519,6 +519,23 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
break;
}
+ /* Simplify expression for "op r, a, a => movi r, 0" cases */
+ switch (op) {
+ CASE_OP_32_64(andc):
+ CASE_OP_32_64(sub):
+ CASE_OP_32_64(xor):
+ if (temps_are_copies(args[1], args[2])) {
+ gen_opc_buf[op_index] = op_to_movi(op);
+ tcg_opt_gen_movi(gen_args, args[0], 0);
+ gen_args += 2;
+ args += 3;
+ continue;
+ }
+ break;
+ default:
+ break;
+ }
+
/* Propagate constants through copy operations and do constant
folding. Constants will be substituted to arguments by register
allocator where needed and possible. Also detect copies. */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 6/9] tcg/optimize: optimize "op r, a, a => movi r, 0"
2012-09-19 20:00 ` [Qemu-devel] [PATCH 6/9] tcg/optimize: optimize "op r, a, a => movi r, 0" Aurelien Jarno
@ 2012-09-19 21:46 ` Richard Henderson
0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-09-19 21:46 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> + /* Simplify expression for "op r, a, a => movi r, 0" cases */
> + switch (op) {
> + CASE_OP_32_64(andc):
> + CASE_OP_32_64(sub):
> + CASE_OP_32_64(xor):
For completeness, nand, orc, eqv r, a, a -> -1.
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
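Checked in plain C (an illustration, not QEMU code), the identities look
like this; note that of the ops mentioned in the review, orc and eqv
with equal inputs fold to -1, while nand with equal inputs gives ~a, a
not rather than a constant.

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t a = 0x12345678;

        assert((a - a) == 0);     /* sub  r, a, a => movi r, 0 */
        assert((a ^ a) == 0);     /* xor  r, a, a => movi r, 0 */
        assert((a & ~a) == 0);    /* andc r, a, a => movi r, 0 */

        /* orc r, a, a => movi r, -1 */
        assert((uint32_t)(a | ~a) == UINT32_MAX);
        /* eqv r, a, a => movi r, -1 */
        assert((uint32_t)~(a ^ a) == UINT32_MAX);
        /* nand r, a, a is ~(a & a) == ~a: not a constant. */
        assert((uint32_t)~(a & a) == (uint32_t)~a);
        return 0;
    }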
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 7/9] tcg/optimize: further optimize brcond/setcond
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (5 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 6/9] tcg/optimize: optimize "op r, a, a => movi r, 0" Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:48 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 8/9] tcg/optimize: prefer the "op a, a, b" form for commutative ops Aurelien Jarno
` (2 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
When both arguments of brcond/setcond are the same or when one of
the two values is a constant equal to zero, it's possible to do
further optimizations. This helps the computation of carry flags
when one of the two args is 0.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 121 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 74 insertions(+), 47 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index f8423e0..6e2a37d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -292,58 +292,88 @@ static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
return res;
}
+/* Return 2 if the condition can't be simplified, and the result
+ of the condition (0 or 1) if it can */
static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
TCGArg y, TCGCond c)
{
- switch (op_bits(op)) {
- case 32:
+ if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
+ switch (op_bits(op)) {
+ case 32:
+ switch (c) {
+ case TCG_COND_EQ:
+ return (uint32_t)temps[x].val == (uint32_t)temps[y].val;
+ case TCG_COND_NE:
+ return (uint32_t)temps[x].val != (uint32_t)temps[y].val;
+ case TCG_COND_LT:
+ return (int32_t)temps[x].val < (int32_t)temps[y].val;
+ case TCG_COND_GE:
+ return (int32_t)temps[x].val >= (int32_t)temps[y].val;
+ case TCG_COND_LE:
+ return (int32_t)temps[x].val <= (int32_t)temps[y].val;
+ case TCG_COND_GT:
+ return (int32_t)temps[x].val > (int32_t)temps[y].val;
+ case TCG_COND_LTU:
+ return (uint32_t)temps[x].val < (uint32_t)temps[y].val;
+ case TCG_COND_GEU:
+ return (uint32_t)temps[x].val >= (uint32_t)temps[y].val;
+ case TCG_COND_LEU:
+ return (uint32_t)temps[x].val <= (uint32_t)temps[y].val;
+ case TCG_COND_GTU:
+ return (uint32_t)temps[x].val > (uint32_t)temps[y].val;
+ }
+ break;
+ case 64:
+ switch (c) {
+ case TCG_COND_EQ:
+ return (uint64_t)temps[x].val == (uint64_t)temps[y].val;
+ case TCG_COND_NE:
+ return (uint64_t)temps[x].val != (uint64_t)temps[y].val;
+ case TCG_COND_LT:
+ return (int64_t)temps[x].val < (int64_t)temps[y].val;
+ case TCG_COND_GE:
+ return (int64_t)temps[x].val >= (int64_t)temps[y].val;
+ case TCG_COND_LE:
+ return (int64_t)temps[x].val <= (int64_t)temps[y].val;
+ case TCG_COND_GT:
+ return (int64_t)temps[x].val > (int64_t)temps[y].val;
+ case TCG_COND_LTU:
+ return (uint64_t)temps[x].val < (uint64_t)temps[y].val;
+ case TCG_COND_GEU:
+ return (uint64_t)temps[x].val >= (uint64_t)temps[y].val;
+ case TCG_COND_LEU:
+ return (uint64_t)temps[x].val <= (uint64_t)temps[y].val;
+ case TCG_COND_GTU:
+ return (uint64_t)temps[x].val > (uint64_t)temps[y].val;
+ }
+ break;
+ }
+ } else if (temps_are_copies(x, y)) {
switch (c) {
- case TCG_COND_EQ:
- return (uint32_t)x == (uint32_t)y;
- case TCG_COND_NE:
- return (uint32_t)x != (uint32_t)y;
- case TCG_COND_LT:
- return (int32_t)x < (int32_t)y;
- case TCG_COND_GE:
- return (int32_t)x >= (int32_t)y;
- case TCG_COND_LE:
- return (int32_t)x <= (int32_t)y;
case TCG_COND_GT:
- return (int32_t)x > (int32_t)y;
case TCG_COND_LTU:
- return (uint32_t)x < (uint32_t)y;
- case TCG_COND_GEU:
- return (uint32_t)x >= (uint32_t)y;
- case TCG_COND_LEU:
- return (uint32_t)x <= (uint32_t)y;
+ case TCG_COND_LT:
case TCG_COND_GTU:
- return (uint32_t)x > (uint32_t)y;
- }
- break;
- case 64:
- switch (c) {
- case TCG_COND_EQ:
- return (uint64_t)x == (uint64_t)y;
case TCG_COND_NE:
- return (uint64_t)x != (uint64_t)y;
- case TCG_COND_LT:
- return (int64_t)x < (int64_t)y;
+ return 0;
case TCG_COND_GE:
- return (int64_t)x >= (int64_t)y;
+ case TCG_COND_GEU:
case TCG_COND_LE:
- return (int64_t)x <= (int64_t)y;
- case TCG_COND_GT:
- return (int64_t)x > (int64_t)y;
+ case TCG_COND_LEU:
+ case TCG_COND_EQ:
+ return 1;
+ }
+ } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
+ switch (c) {
case TCG_COND_LTU:
- return (uint64_t)x < (uint64_t)y;
+ return 0;
case TCG_COND_GEU:
- return (uint64_t)x >= (uint64_t)y;
- case TCG_COND_LEU:
- return (uint64_t)x <= (uint64_t)y;
- case TCG_COND_GTU:
- return (uint64_t)x > (uint64_t)y;
+ return 1;
+ default:
+ return 2;
}
- break;
+ } else {
+ return 2;
}
fprintf(stderr,
@@ -616,11 +646,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
args += 3;
break;
CASE_OP_32_64(setcond):
- if (temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST) {
+ tmp = do_constant_folding_cond(op, args[1], args[2], args[3]);
+ if (tmp != 2) {
gen_opc_buf[op_index] = op_to_movi(op);
- tmp = do_constant_folding_cond(op, temps[args[1]].val,
- temps[args[2]].val, args[3]);
tcg_opt_gen_movi(gen_args, args[0], tmp);
gen_args += 2;
} else {
@@ -634,10 +662,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
args += 4;
break;
CASE_OP_32_64(brcond):
- if (temps[args[0]].state == TCG_TEMP_CONST
- && temps[args[1]].state == TCG_TEMP_CONST) {
- if (do_constant_folding_cond(op, temps[args[0]].val,
- temps[args[1]].val, args[2])) {
+ tmp = do_constant_folding_cond(op, args[0], args[1], args[2]);
+ if (tmp != 2) {
+ if (tmp) {
memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
gen_opc_buf[op_index] = INDEX_op_br;
gen_args[0] = args[3];
--
1.7.10.4
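A toy model of the three-way result this patch introduces for
do_constant_folding_cond (0 or 1 when the condition folds, 2 when it
cannot be simplified); only the unsigned-comparison-against-zero case is
shown, and the code is illustrative rather than QEMU's.

    #include <assert.h>

    enum { COND_LTU, COND_GEU, COND_EQ };

    static int fold_cond_vs_zero(int cond, int y_is_const_zero)
    {
        if (!y_is_const_zero) {
            return 2;               /* can't simplify */
        }
        switch (cond) {
        case COND_LTU:
            return 0;               /* x <u 0 is always false */
        case COND_GEU:
            return 1;               /* x >=u 0 is always true */
        default:
            return 2;               /* outcome depends on x */
        }
    }

    int main(void)
    {
        assert(fold_cond_vs_zero(COND_LTU, 1) == 0);
        assert(fold_cond_vs_zero(COND_GEU, 1) == 1);
        assert(fold_cond_vs_zero(COND_EQ, 1) == 2);
        return 0;
    }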
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 8/9] tcg/optimize: prefer the "op a, a, b" form for commutative ops
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (6 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 7/9] tcg/optimize: further optimize brcond/setcond Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:49 ` Richard Henderson
2012-09-19 20:00 ` [Qemu-devel] [PATCH 9/9] tcg: remove #ifdef #endif around TCGOpcode tests Aurelien Jarno
2012-09-21 12:36 ` [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Laurent Desnogues
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The "op a, a, b" form is better handled on non-RISC host than the "op
a, b, a" form, so swap the arguments to this form when possible, and
when b is not a constant.
This reduces the number of generated instructions by a tiny bit.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 6e2a37d..3495b7a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -432,7 +432,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
CASE_OP_32_64(eqv):
CASE_OP_32_64(nand):
CASE_OP_32_64(nor):
- if (temps[args[1]].state == TCG_TEMP_CONST) {
+ /* Prefer the constant in second argument, and then the form
+ op a, a, b, which is better handled on non-RISC hosts. */
+ if (temps[args[1]].state == TCG_TEMP_CONST || (args[0] == args[2]
+ && temps[args[2]].state != TCG_TEMP_CONST)) {
tmp = args[1];
args[1] = args[2];
args[2] = tmp;
--
1.7.10.4
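A sketch of the swap rule (toy C, not the QEMU code): put a constant in
the second slot, and otherwise prefer the "op a, a, b" form, which maps
directly onto two-operand instruction sets such as x86.

    #include <stdbool.h>
    #include <stdio.h>

    struct temp { bool is_const; int val; };

    /* Mirror of the condition in the diff: swap when the first input
       is a constant, or when the output aliases the second input and
       that input is not a constant. */
    static void maybe_swap(int args[3], const struct temp *t)
    {
        if (t[args[1]].is_const
            || (args[0] == args[2] && !t[args[2]].is_const)) {
            int tmp = args[1];
            args[1] = args[2];
            args[2] = tmp;
        }
    }

    int main(void)
    {
        struct temp t[4] = { {false, 0}, {false, 0},
                             {false, 0}, {true, 5} };
        int a1[3] = { 2, 1, 2 };   /* add r2, r1, r2: output aliases arg 2 */
        int a2[3] = { 0, 3, 1 };   /* add r0, c5, r1: constant comes first */

        maybe_swap(a1, t);         /* -> add r2, r2, r1 */
        maybe_swap(a2, t);         /* -> add r0, r1, c5 */
        printf("r%d = r%d op r%d\n", a1[0], a1[1], a1[2]);
        printf("r%d = r%d op t%d\n", a2[0], a2[1], a2[2]);
        return 0;
    }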
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 8/9] tcg/optimize: prefer the "op a, a, b" form for commutative ops
2012-09-19 20:00 ` [Qemu-devel] [PATCH 8/9] tcg/optimize: prefer the "op a, a, b" form for commutative ops Aurelien Jarno
@ 2012-09-19 21:49 ` Richard Henderson
0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-09-19 21:49 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> The "op a, a, b" form is better handled on non-RISC host than the "op
> a, b, a" form, so swap the arguments to this form when possible, and
> when b is not a constant.
>
> This reduces the number of generated instructions by a tiny bit.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 9/9] tcg: remove #ifdef #endif around TCGOpcode tests
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (7 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 8/9] tcg/optimize: prefer the "op a, a, b" form for commutative ops Aurelien Jarno
@ 2012-09-19 20:00 ` Aurelien Jarno
2012-09-19 21:50 ` Richard Henderson
2012-09-21 12:36 ` [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Laurent Desnogues
9 siblings, 1 reply; 22+ messages in thread
From: Aurelien Jarno @ 2012-09-19 20:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Commit 25c4d9cc made all TCGOpcode enum values available on all targets,
so we don't need to #ifdef/#endif the ones that are only used by some
targets. This makes the code easier to read.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/tcg.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index b8a1bec..57fca70 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -937,11 +937,7 @@ void tcg_dump_ops(TCGContext *s)
args[nb_oargs + i]));
}
}
- } else if (c == INDEX_op_movi_i32
-#if TCG_TARGET_REG_BITS == 64
- || c == INDEX_op_movi_i64
-#endif
- ) {
+ } else if (c == INDEX_op_movi_i32 || c == INDEX_op_movi_i64) {
tcg_target_ulong val;
TCGHelperInfo *th;
@@ -991,17 +987,11 @@ void tcg_dump_ops(TCGContext *s)
}
switch (c) {
case INDEX_op_brcond_i32:
-#if TCG_TARGET_REG_BITS == 32
case INDEX_op_brcond2_i32:
-#elif TCG_TARGET_REG_BITS == 64
case INDEX_op_brcond_i64:
-#endif
case INDEX_op_setcond_i32:
-#if TCG_TARGET_REG_BITS == 32
case INDEX_op_setcond2_i32:
-#elif TCG_TARGET_REG_BITS == 64
case INDEX_op_setcond_i64:
-#endif
if (args[k] < ARRAY_SIZE(cond_name) && cond_name[args[k]]) {
qemu_log(",%s", cond_name[args[k++]]);
} else {
@@ -2103,16 +2093,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
#endif
switch(opc) {
case INDEX_op_mov_i32:
-#if TCG_TARGET_REG_BITS == 64
case INDEX_op_mov_i64:
-#endif
dead_args = s->op_dead_args[op_index];
tcg_reg_alloc_mov(s, def, args, dead_args);
break;
case INDEX_op_movi_i32:
-#if TCG_TARGET_REG_BITS == 64
case INDEX_op_movi_i64:
-#endif
tcg_reg_alloc_movi(s, args);
break;
case INDEX_op_debug_insn_start:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation
2012-09-19 20:00 [Qemu-devel] [PATCH 0/9] tcg/optimize: rework copy propagation Aurelien Jarno
` (8 preceding siblings ...)
2012-09-19 20:00 ` [Qemu-devel] [PATCH 9/9] tcg: remove #ifdef #endif around TCGOpcode tests Aurelien Jarno
@ 2012-09-21 12:36 ` Laurent Desnogues
9 siblings, 0 replies; 22+ messages in thread
From: Laurent Desnogues @ 2012-09-21 12:36 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On Wed, Sep 19, 2012 at 10:00 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> This patch series rework the copy propagation in order to generate
> better code. The first two patches are cleanup and bug fixes, the third
> patch is the heart of the series, and the remaining ones are small
> optimizations using the new copy propagation.
>
> I have measured a decrease of the generated code size of about 4%, with
> a gain in speed between 0 and 2% depending on the workload.
>
> For better benefits in ARM emulation, it should be used with the setcond
> patches series I have sent a few days ago.
Using this patch series along with the ARM one, I got a nice speed gain
in user mode for SPEC CPU2000 gcc: without the patches the integrate.i
data set runs in 13.6s; with them the time is reduced to 12.4s (the host
CPU is an i5 2400 in 64-bit mode). Nice job!
Laurent
> Aurelien Jarno (9):
> tcg/optimizer: remove TCG_TEMP_ANY
> tcg/optimizer: check types in copy propagation
> tcg/optimizer: rework copy propagation
> tcg/optimize: do copy propagation for all operations
> tcg/optimize: optimize "op r, a, a => mov r, a"
> tcg/optimize: optimize "op r, a, a => movi r, 0"
> tcg/optimize: further optimize brcond/setcond
> tcg/optimize: prefer the "op a, a, b" form for commutative ops
> tcg: remove #ifdef #endif around TCGOpcode tests
>
> tcg/optimize.c | 326 ++++++++++++++++++++++++++++++++++----------------------
> tcg/tcg.c | 16 +--
> 2 files changed, 200 insertions(+), 142 deletions(-)
>
> --
> 1.7.10.4
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread