* [Qemu-devel] [PATCH 0/8] tcg optimization improvements
@ 2014-01-31 14:46 Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
` (9 more replies)
0 siblings, 10 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
The first 4 of these are ones that Aurelien posted some time ago,
and I reviewed, but never seemed to get committed.
The second 4 address optimization issues that I noticed with the
BMI instruction set extension, adding ANDC support to x86_64.
r~
Aurelien Jarno (4):
tcg/optimize: fix known-zero bits for right shift ops
tcg/optimize: fix known-zero bits optimization
tcg/optimize: improve known-zero bits for 32-bit ops
tcg/optimize: add known-zero bits compute for load ops
Richard Henderson (4):
tcg/optimize: Handle known-zeros masks for ANDC
tcg/optimize: Simply some logical ops to NOT
tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0
tcg/optimize: Add more identity simplifications
tcg/optimize.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 142 insertions(+), 21 deletions(-)
--
1.8.5.3
* [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization Richard Henderson
` (8 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, qemu-stable, aurelien
From: Aurelien Jarno <aurelien@aurel32.net>
32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.
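As an illustration of the problem described above, here is a minimal sketch in plain C. It is not QEMU code: the helper names are hypothetical, the host is assumed to be 64-bit, and a set mask bit is taken to mean "this bit may be 1". Like the patch, the sketch assumes that right-shifting a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mask computation, not QEMU functions. */

/* Pre-patch sar: the whole 64-bit mask is shifted, so zeros in the
   unused high half slide down into bits 31 and below. */
static inline uint64_t sar_mask_old(uint64_t mask, int shift)
{
    assert(shift >= 0 && shift < 32);
    return (uint64_t)((int64_t)mask >> shift);
}

/* Post-patch sar_i32: truncate to 32 bits first, so the sign bit of
   the 32-bit value (bit 31) is the one that gets replicated. */
static inline uint64_t sar_i32_mask_new(uint64_t mask, int shift)
{
    assert(shift >= 0 && shift < 32);
    return (uint64_t)(int64_t)((int32_t)mask >> shift);
}

/* Post-patch shr_i32: only the low 32 bits take part in the shift. */
static inline uint64_t shr_i32_mask_new(uint64_t mask, int shift)
{
    assert(shift >= 0 && shift < 32);
    return (uint64_t)((uint32_t)mask >> shift);
}
```

With a mask of 0x80000000 and a shift of 4, the old computation yields 0x08000000, claiming that bits 31..28 are known to be zero, although sar_i32 on the value 0x80000000 produces 0xf8000000. The 32-bit variant keeps those bits marked as possibly set.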
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 89e2d6a..c5cdde2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -726,16 +726,25 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
mask = temps[args[1]].mask & mask;
break;
- CASE_OP_32_64(sar):
+ case INDEX_op_sar_i32:
+ if (temps[args[2]].state == TCG_TEMP_CONST) {
+ mask = (int32_t)temps[args[1]].mask >> temps[args[2]].val;
+ }
+ break;
+ case INDEX_op_sar_i64:
if (temps[args[2]].state == TCG_TEMP_CONST) {
- mask = ((tcg_target_long)temps[args[1]].mask
- >> temps[args[2]].val);
+ mask = (int64_t)temps[args[1]].mask >> temps[args[2]].val;
}
break;
- CASE_OP_32_64(shr):
+ case INDEX_op_shr_i32:
+ if (temps[args[2]].state == TCG_TEMP_CONST) {
+ mask = (uint32_t)temps[args[1]].mask >> temps[args[2]].val;
+ }
+ break;
+ case INDEX_op_shr_i64:
if (temps[args[2]].state == TCG_TEMP_CONST) {
- mask = temps[args[1]].mask >> temps[args[2]].val;
+ mask = (uint64_t)temps[args[1]].mask >> temps[args[2]].val;
}
break;
--
1.8.5.3
* [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops Richard Henderson
` (7 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, aurelien
From: Aurelien Jarno <aurelien@aurel32.net>
Known-zero bits optimization is a great idea that helps to generate more
optimized code. However, the current implementation only works in very few
cases because the computed mask is not saved.
Fix this so that it really works.
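To see why saving the mask matters, consider the following toy model. It is only a sketch: TempInfo, fold_and_const and the save_mask switch are hypothetical names that stand in for the per-temp state in the optimizer, and only an AND with a constant is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-temp state: a set bit means "may be 1". */
typedef struct {
    uint64_t mask;
} TempInfo;

/* Model "dst = src & c". Returns true when the result is known to be
   zero, i.e. when the op could be replaced by "movi dst, 0". */
static inline bool fold_and_const(TempInfo *temps, int dst, int src,
                                  uint64_t c, bool save_mask)
{
    uint64_t mask = temps[src].mask & c;

    /* Without the fix the output temp is simply reset to "unknown". */
    temps[dst].mask = save_mask ? mask : UINT64_MAX;
    return mask == 0;
}

/* t1 = t0 & 0xff; t2 = t1 & 0xff00. The second AND is always zero,
   but this can only be detected if the mask of t1 was remembered. */
static inline bool second_and_is_zero(bool save_mask)
{
    TempInfo temps[3] = { { UINT64_MAX }, { UINT64_MAX }, { UINT64_MAX } };

    fold_and_const(temps, 1, 0, 0xff, save_mask);
    return fold_and_const(temps, 2, 1, 0xff00, save_mask);
}
```

In this model the known-zero information only helps the very op that computed it unless it is stored with the output temp, which is the situation the commit message describes.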
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index c5cdde2..7838be2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -691,7 +691,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
break;
}
- /* Simplify using known-zero bits */
+ /* Simplify using known-zero bits. Currently only ops with a single
+ output argument is supported. */
mask = -1;
affected = -1;
switch (op) {
@@ -1149,6 +1150,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
} else {
for (i = 0; i < def->nb_oargs; i++) {
reset_temp(args[i]);
+ /* Save the corresponding known-zero bits mask for the
+ first output argument (only one supported so far). */
+ if (i == 0) {
+ temps[args[i]].mask = mask;
+ }
}
}
for (i = 0; i < def->nb_args; i++) {
--
1.8.5.3
* [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops Richard Henderson
` (6 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, aurelien
From: Aurelien Jarno <aurelien@aurel32.net>
The shl_i32 op might set some bits of the unused 32 high bits of the
mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
except load/store, which operate on tl values.
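A small sketch of the effect, assuming a 64-bit mask variable as on a 64-bit host. The function name and the clamp parameter are hypothetical and only serve to compare the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Known-zero mask of "shl_i32 dst, src, shift" for a constant shift.
   With clamp == false the bits shifted past bit 31 stay in the mask;
   with clamp == true they are cleared, as the patch does for 32-bit ops. */
static inline uint64_t shl_i32_mask(uint64_t src_mask, int shift, bool clamp)
{
    assert(shift >= 0 && shift < 32);
    uint64_t mask = src_mask << shift;

    if (clamp) {
        mask &= 0xffffffffu;
    }
    return mask;
}
```

For a source mask of 0xffffffff and a shift of 8, the unclamped result is 0xffffffff00, while the clamped result is 0xffffff00.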
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 7838be2..1cf017a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -783,6 +783,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
break;
}
+ /* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
+ results */
+ if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+ mask &= 0xffffffffu;
+ }
+
if (mask == 0) {
assert(def->nb_oargs == 1);
s->gen_opc_buf[op_index] = op_to_movi(op);
--
1.8.5.3
* [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (2 preceding siblings ...)
2014-01-31 14:46 ` [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
` (5 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, aurelien
From: Aurelien Jarno <aurelien@aurel32.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1cf017a..d3b099a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -779,13 +779,35 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
mask = temps[args[3]].mask | temps[args[4]].mask;
break;
+ CASE_OP_32_64(ld8u):
+ case INDEX_op_qemu_ld8u:
+ mask = 0xff;
+ break;
+ CASE_OP_32_64(ld16u):
+ case INDEX_op_qemu_ld16u:
+ mask = 0xffff;
+ break;
+ case INDEX_op_ld32u_i64:
+ case INDEX_op_qemu_ld32u:
+ mask = 0xffffffffu;
+ break;
+
+ CASE_OP_32_64(qemu_ld):
+ {
+ TCGMemOp mop = args[def->nb_oargs + def->nb_iargs];
+ if (!(mop & MO_SIGN)) {
+ mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
+ }
+ }
+ break;
+
default:
break;
}
/* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
results */
- if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+ if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
mask &= 0xffffffffu;
}
--
1.8.5.3
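The mask expression in the qemu_ld case above is compact, so a worked sketch may help. The constants below are local stand-ins with assumed values (a size field in the low two bits holding log2 of the byte width, and a sign flag in bit 2); they are not taken from the QEMU headers.

```c
#include <stdint.h>

/* Assumed encoding of the memory-op argument, for illustration only. */
enum {
    DEMO_MO_SIZE = 3,   /* mask for the log2(size in bytes) field */
    DEMO_MO_SIGN = 4    /* set when the load sign-extends */
};

/* Bits that may be set in the result of a load described by mop. */
static inline uint64_t load_result_mask(unsigned mop)
{
    if (mop & DEMO_MO_SIGN) {
        /* A sign-extending load can set any bit. */
        return UINT64_MAX;
    }
    /* bits = 8 << log2(bytes); the mask is 2^bits - 1. Writing it as
       (2 << (bits - 1)) - 1 keeps the shift count below 64 even for
       64-bit loads, where 1 << 64 would be undefined. */
    return (2ULL << ((8 << (mop & DEMO_MO_SIZE)) - 1)) - 1;
}
```

Under these assumptions, size codes 0, 1, 2 and 3 give 0xff, 0xffff, 0xffffffff and all ones respectively, matching the fixed masks used for the ld8u, ld16u and ld32u cases.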
* [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (3 preceding siblings ...)
2014-01-31 14:46 ` [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
2014-02-16 18:12 ` Aurelien Jarno
2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simply some logical ops to NOT Richard Henderson
` (4 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index d3b099a..3291a08 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -727,6 +727,17 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
mask = temps[args[1]].mask & mask;
break;
+ CASE_OP_32_64(andc):
+ /* Known-zeros does not imply known-ones. Therefore unless
+ args[2] is constant, we can't infer anything from it. */
+ if (temps[args[2]].state == TCG_TEMP_CONST) {
+ mask = ~temps[args[2]].mask;
+ goto and_const;
+ }
+ /* But we certainly know nothing outside args[1] may be set. */
+ mask = temps[args[1]].mask;
+ break;
+
case INDEX_op_sar_i32:
if (temps[args[2]].state == TCG_TEMP_CONST) {
mask = (int32_t)temps[args[1]].mask >> temps[args[2]].val;
--
1.8.5.3
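The comments in the hunk above can be restated as a small sketch. The function and parameter names are hypothetical; andc is taken to compute a & ~b, and a set mask bit means "may be 1".

```c
#include <stdbool.h>
#include <stdint.h>

/* Known-zero mask for "andc dst, a, b" (dst = a & ~b).
   The mask of b only says which bits of b are known to be 0. A bit of
   ~b is known to be 0 only where b is known to be 1, and that is not
   tracked, so b helps only when its exact value is known. */
static inline uint64_t andc_mask(uint64_t a_mask, bool b_is_const,
                                 uint64_t b_val)
{
    if (b_is_const) {
        /* Same as an AND with the constant ~b_val. */
        return a_mask & ~b_val;
    }
    /* Otherwise the result can at most have the bits of a. */
    return a_mask;
}
```

For example, with a_mask = 0xffff and a constant b of 0x00ff the result mask is 0xff00; with a non-constant b it stays 0xffff.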
* [Qemu-devel] [PATCH 6/8] tcg/optimize: Simply some logical ops to NOT
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (4 preceding siblings ...)
2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
2014-02-16 18:27 ` Aurelien Jarno
2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optmize ANDC X, Y, Y to MOV X, 0 Richard Henderson
` (3 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
Given, of course, an appropriate constant. These could be generated
from the "canonical" operation for inversion on the guest, or via
other optimizations.
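The constants in question can be checked with plain C models of the ops. This is only a sketch: the op_* helpers are hypothetical and assume the usual definitions (nand = ~(a & b), nor = ~(a | b), andc = a & ~b, orc = a | ~b, eqv = ~(a ^ b)).

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain-C models of the logical ops, 32-bit flavour. */
static inline uint32_t op_xor(uint32_t a, uint32_t b)  { return a ^ b; }
static inline uint32_t op_nand(uint32_t a, uint32_t b) { return ~(a & b); }
static inline uint32_t op_nor(uint32_t a, uint32_t b)  { return ~(a | b); }
static inline uint32_t op_andc(uint32_t a, uint32_t b) { return a & ~b; }
static inline uint32_t op_orc(uint32_t a, uint32_t b)  { return a | ~b; }
static inline uint32_t op_eqv(uint32_t a, uint32_t b)  { return ~(a ^ b); }

/* True when every form handled by the patch equals NOT x. */
static inline bool reduces_to_not(uint32_t x)
{
    const uint32_t ones = UINT32_MAX;
    const uint32_t not_x = ~x;

    return op_xor(x, ones) == not_x     /* xor  x, -1 */
        && op_nand(x, ones) == not_x    /* nand x, -1 */
        && op_nor(x, 0) == not_x        /* nor  x, 0  */
        && op_andc(ones, x) == not_x    /* andc -1, x */
        && op_orc(0, x) == not_x        /* orc  0, x  */
        && op_eqv(0, x) == not_x;       /* eqv  0, x  */
}
```

Note that for andc, orc and eqv the constant is the first input and the inverted value is the second, which is why the patch selects args[2] as the source of the NOT in those cases.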
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 3291a08..cdfc746 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -655,6 +655,63 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
}
}
break;
+ CASE_OP_32_64(xor):
+ CASE_OP_32_64(nand):
+ if (temps[args[1]].state != TCG_TEMP_CONST
+ && temps[args[2]].state == TCG_TEMP_CONST
+ && temps[args[2]].val == -1) {
+ i = 1;
+ goto try_not;
+ }
+ break;
+ CASE_OP_32_64(nor):
+ if (temps[args[1]].state != TCG_TEMP_CONST
+ && temps[args[2]].state == TCG_TEMP_CONST
+ && temps[args[2]].val == 0) {
+ i = 1;
+ goto try_not;
+ }
+ break;
+ CASE_OP_32_64(andc):
+ if (temps[args[2]].state != TCG_TEMP_CONST
+ && temps[args[1]].state == TCG_TEMP_CONST
+ && temps[args[1]].val == -1) {
+ i = 2;
+ goto try_not;
+ }
+ break;
+ CASE_OP_32_64(orc):
+ CASE_OP_32_64(eqv):
+ if (temps[args[2]].state != TCG_TEMP_CONST
+ && temps[args[1]].state == TCG_TEMP_CONST
+ && temps[args[1]].val == 0) {
+ i = 2;
+ goto try_not;
+ }
+ break;
+ try_not:
+ {
+ TCGOpcode not_op;
+ bool have_not;
+
+ if (def->flags & TCG_OPF_64BIT) {
+ not_op = INDEX_op_not_i64;
+ have_not = TCG_TARGET_HAS_not_i64;
+ } else {
+ not_op = INDEX_op_not_i32;
+ have_not = TCG_TARGET_HAS_not_i32;
+ }
+ if (!have_not) {
+ break;
+ }
+ s->gen_opc_buf[op_index] = not_op;
+ reset_temp(args[0]);
+ gen_args[0] = args[0];
+ gen_args[1] = args[i];
+ args += 3;
+ gen_args += 2;
+ continue;
+ }
default:
break;
}
--
1.8.5.3
* [Qemu-devel] [PATCH 7/8] tcg/optimize: Optmize ANDC X, Y, Y to MOV X, 0
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (5 preceding siblings ...)
2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simply some logical ops to NOT Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
2014-02-16 18:27 ` Aurelien Jarno
2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
` (2 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
Like we already do for SUB and XOR.
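The identity behind this is a & ~a == 0, the same way a - a and a ^ a are zero. A minimal check, with hypothetical helper names and andc modelled as a & ~b:

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain-C model of andc. */
static inline uint64_t op_andc(uint64_t a, uint64_t b)
{
    return a & ~b;
}

/* True when all three "op r, a, a" forms produce zero for this a. */
static inline bool same_operand_gives_zero(uint64_t a)
{
    return op_andc(a, a) == 0
        && (a - a) == 0
        && (a ^ a) == 0;
}
```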
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index cdfc746..a703f8c 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -945,6 +945,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
/* Simplify expression for "op r, a, a => movi r, 0" cases */
switch (op) {
+ CASE_OP_32_64(andc):
CASE_OP_32_64(sub):
CASE_OP_32_64(xor):
if (temps_are_copies(args[1], args[2])) {
--
1.8.5.3
* [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (6 preceding siblings ...)
2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optmize ANDC X, Y, Y to MOV X, 0 Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
2014-02-16 18:30 ` Aurelien Jarno
2014-02-14 21:44 ` [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
2014-02-16 14:15 ` Paolo Bonzini
9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
Recognize 0 operand to andc, and -1 operands to and, orc, eqv.
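These identities can be verified with plain C models of the ops. The helpers below are hypothetical and assume andc = a & ~b, orc = a | ~b and eqv = ~(a ^ b).

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain-C models of the ops, 64-bit flavour. */
static inline uint64_t op_and(uint64_t a, uint64_t b)  { return a & b; }
static inline uint64_t op_andc(uint64_t a, uint64_t b) { return a & ~b; }
static inline uint64_t op_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static inline uint64_t op_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }

/* True when each "op r, a, const" form leaves a unchanged, so the op
   could be emitted as "mov r, a" instead. */
static inline bool reduces_to_mov(uint64_t a)
{
    const uint64_t ones = UINT64_MAX;

    return op_andc(a, 0) == a       /* andc a, 0  */
        && op_and(a, ones) == a     /* and  a, -1 */
        && op_orc(a, ones) == a     /* orc  a, -1 */
        && op_eqv(a, ones) == a;    /* eqv  a, -1 */
}
```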
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/optimize.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index a703f8c..8d7100e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -716,7 +716,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
break;
}
- /* Simplify expression for "op r, a, 0 => mov r, a" cases */
+ /* Simplify expression for "op r, a, const => mov r, a" cases */
switch (op) {
CASE_OP_32_64(add):
CASE_OP_32_64(sub):
@@ -727,23 +727,32 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
CASE_OP_32_64(rotr):
CASE_OP_32_64(or):
CASE_OP_32_64(xor):
- if (temps[args[1]].state == TCG_TEMP_CONST) {
- /* Proceed with possible constant folding. */
- break;
- }
- if (temps[args[2]].state == TCG_TEMP_CONST
+ CASE_OP_32_64(andc):
+ if (temps[args[1]].state != TCG_TEMP_CONST
+ && temps[args[2]].state == TCG_TEMP_CONST
&& temps[args[2]].val == 0) {
- if (temps_are_copies(args[0], args[1])) {
- s->gen_opc_buf[op_index] = INDEX_op_nop;
- } else {
- s->gen_opc_buf[op_index] = op_to_mov(op);
- tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
- gen_args += 2;
- }
- args += 3;
- continue;
+ goto do_mov3;
}
break;
+ CASE_OP_32_64(and):
+ CASE_OP_32_64(orc):
+ CASE_OP_32_64(eqv):
+ if (temps[args[1]].state != TCG_TEMP_CONST
+ && temps[args[2]].state == TCG_TEMP_CONST
+ && temps[args[2]].val == -1) {
+ goto do_mov3;
+ }
+ break;
+ do_mov3:
+ if (temps_are_copies(args[0], args[1])) {
+ s->gen_opc_buf[op_index] = INDEX_op_nop;
+ } else {
+ s->gen_opc_buf[op_index] = op_to_mov(op);
+ tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
+ gen_args += 2;
+ }
+ args += 3;
+ continue;
default:
break;
}
--
1.8.5.3
* Re: [Qemu-devel] [PATCH 0/8] tcg optimization improvements
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (7 preceding siblings ...)
2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
@ 2014-02-14 21:44 ` Richard Henderson
2014-02-16 14:15 ` Paolo Bonzini
9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-02-14 21:44 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
Ping.
On 01/31/2014 06:46 AM, Richard Henderson wrote:
> The first 4 of these are ones that Aurelien posted some time ago,
> and I reviewed, but never seemed to get committed.
>
> The second 4 address optimization issues that I noticed with the
> BMI instruction set extension, adding ANDC support to x86_64.
>
>
> r~
>
>
> Aurelien Jarno (4):
> tcg/optimize: fix known-zero bits for right shift ops
> tcg/optimize: fix known-zero bits optimization
> tcg/optimize: improve known-zero bits for 32-bit ops
> tcg/optimize: add known-zero bits compute for load ops
>
> Richard Henderson (4):
> tcg/optimize: Handle known-zeros masks for ANDC
> tcg/optimize: Simply some logical ops to NOT
> tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0
> tcg/optimize: Add more identity simplifications
>
> tcg/optimize.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 142 insertions(+), 21 deletions(-)
>
* Re: [Qemu-devel] [PATCH 0/8] tcg optimization improvements
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
` (8 preceding siblings ...)
2014-02-14 21:44 ` [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
@ 2014-02-16 14:15 ` Paolo Bonzini
9 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2014-02-16 14:15 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: aurelien
On 31/01/2014 15:46, Richard Henderson wrote:
> The first 4 of these are ones that Aurelien posted some time ago,
> and I reviewed, but never seemed to get committed.
>
> The second 4 address optimization issues that I noticed with the
> BMI instruction set extension, adding ANDC support to x86_64.
>
>
> r~
>
>
> Aurelien Jarno (4):
> tcg/optimize: fix known-zero bits for right shift ops
> tcg/optimize: fix known-zero bits optimization
> tcg/optimize: improve known-zero bits for 32-bit ops
> tcg/optimize: add known-zero bits compute for load ops
>
> Richard Henderson (4):
> tcg/optimize: Handle known-zeros masks for ANDC
> tcg/optimize: Simply some logical ops to NOT
> tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0
> tcg/optimize: Add more identity simplifications
>
> tcg/optimize.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 142 insertions(+), 21 deletions(-)
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* Re: [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC
2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
@ 2014-02-16 18:12 ` Aurelien Jarno
0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Fri, Jan 31, 2014 at 08:47:00AM -0600, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/optimize.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d3b099a..3291a08 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -727,6 +727,17 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
> mask = temps[args[1]].mask & mask;
> break;
>
> + CASE_OP_32_64(andc):
> + /* Known-zeros does not imply known-ones. Therefore unless
> + args[2] is constant, we can't infer anything from it. */
> + if (temps[args[2]].state == TCG_TEMP_CONST) {
> + mask = ~temps[args[2]].mask;
> + goto and_const;
> + }
> + /* But we certainly know nothing outside args[1] may be set. */
> + mask = temps[args[1]].mask;
> + break;
> +
> case INDEX_op_sar_i32:
> if (temps[args[2]].state == TCG_TEMP_CONST) {
> mask = (int32_t)temps[args[1]].mask >> temps[args[2]].val;
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 6/8] tcg/optimize: Simply some logical ops to NOT
2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simply some logical ops to NOT Richard Henderson
@ 2014-02-16 18:27 ` Aurelien Jarno
0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:27 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Fri, Jan 31, 2014 at 08:47:01AM -0600, Richard Henderson wrote:
> Given, of course, an appropriate constant. These could be generated
> from the "canonical" operation for inversion on the guest, or via
> other optimizations.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/optimize.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 57 insertions(+)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 3291a08..cdfc746 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -655,6 +655,63 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
> }
> }
> break;
> + CASE_OP_32_64(xor):
> + CASE_OP_32_64(nand):
> + if (temps[args[1]].state != TCG_TEMP_CONST
> + && temps[args[2]].state == TCG_TEMP_CONST
> + && temps[args[2]].val == -1) {
> + i = 1;
> + goto try_not;
> + }
> + break;
> + CASE_OP_32_64(nor):
> + if (temps[args[1]].state != TCG_TEMP_CONST
> + && temps[args[2]].state == TCG_TEMP_CONST
> + && temps[args[2]].val == 0) {
> + i = 1;
> + goto try_not;
> + }
> + break;
> + CASE_OP_32_64(andc):
> + if (temps[args[2]].state != TCG_TEMP_CONST
> + && temps[args[1]].state == TCG_TEMP_CONST
> + && temps[args[1]].val == -1) {
> + i = 2;
> + goto try_not;
> + }
> + break;
> + CASE_OP_32_64(orc):
> + CASE_OP_32_64(eqv):
> + if (temps[args[2]].state != TCG_TEMP_CONST
> + && temps[args[1]].state == TCG_TEMP_CONST
> + && temps[args[1]].val == 0) {
> + i = 2;
> + goto try_not;
> + }
> + break;
> + try_not:
> + {
> + TCGOpcode not_op;
> + bool have_not;
> +
> + if (def->flags & TCG_OPF_64BIT) {
> + not_op = INDEX_op_not_i64;
> + have_not = TCG_TARGET_HAS_not_i64;
> + } else {
> + not_op = INDEX_op_not_i32;
> + have_not = TCG_TARGET_HAS_not_i32;
> + }
> + if (!have_not) {
> + break;
> + }
> + s->gen_opc_buf[op_index] = not_op;
> + reset_temp(args[0]);
> + gen_args[0] = args[0];
> + gen_args[1] = args[i];
> + args += 3;
> + gen_args += 2;
> + continue;
> + }
> default:
> break;
> }
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 7/8] tcg/optimize: Optmize ANDC X, Y, Y to MOV X, 0
2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optmize ANDC X, Y, Y to MOV X, 0 Richard Henderson
@ 2014-02-16 18:27 ` Aurelien Jarno
0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:27 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Fri, Jan 31, 2014 at 08:47:02AM -0600, Richard Henderson wrote:
> Like we already do for SUB and XOR.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/optimize.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index cdfc746..a703f8c 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -945,6 +945,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>
> /* Simplify expression for "op r, a, a => movi r, 0" cases */
> switch (op) {
> + CASE_OP_32_64(andc):
> CASE_OP_32_64(sub):
> CASE_OP_32_64(xor):
> if (temps_are_copies(args[1], args[2])) {
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications
2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
@ 2014-02-16 18:30 ` Aurelien Jarno
0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:30 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Fri, Jan 31, 2014 at 08:47:03AM -0600, Richard Henderson wrote:
> Recognize 0 operand to andc, and -1 operands to and, orc, eqv.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/optimize.c | 39 ++++++++++++++++++++++++---------------
> 1 file changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index a703f8c..8d7100e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -716,7 +716,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
> break;
> }
>
> - /* Simplify expression for "op r, a, 0 => mov r, a" cases */
> + /* Simplify expression for "op r, a, const => mov r, a" cases */
> switch (op) {
> CASE_OP_32_64(add):
> CASE_OP_32_64(sub):
> @@ -727,23 +727,32 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
> CASE_OP_32_64(rotr):
> CASE_OP_32_64(or):
> CASE_OP_32_64(xor):
> - if (temps[args[1]].state == TCG_TEMP_CONST) {
> - /* Proceed with possible constant folding. */
> - break;
> - }
> - if (temps[args[2]].state == TCG_TEMP_CONST
> + CASE_OP_32_64(andc):
> + if (temps[args[1]].state != TCG_TEMP_CONST
> + && temps[args[2]].state == TCG_TEMP_CONST
> && temps[args[2]].val == 0) {
> - if (temps_are_copies(args[0], args[1])) {
> - s->gen_opc_buf[op_index] = INDEX_op_nop;
> - } else {
> - s->gen_opc_buf[op_index] = op_to_mov(op);
> - tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
> - gen_args += 2;
> - }
> - args += 3;
> - continue;
> + goto do_mov3;
> }
> break;
> + CASE_OP_32_64(and):
> + CASE_OP_32_64(orc):
> + CASE_OP_32_64(eqv):
> + if (temps[args[1]].state != TCG_TEMP_CONST
> + && temps[args[2]].state == TCG_TEMP_CONST
> + && temps[args[2]].val == -1) {
> + goto do_mov3;
> + }
> + break;
> + do_mov3:
> + if (temps_are_copies(args[0], args[1])) {
> + s->gen_opc_buf[op_index] = INDEX_op_nop;
> + } else {
> + s->gen_opc_buf[op_index] = op_to_mov(op);
> + tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
> + gen_args += 2;
> + }
> + args += 3;
> + continue;
> default:
> break;
> }
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net