* [Qemu-devel] [PATCH v3 1/4] tcg/optimize: fix known-zero bits for right shift ops
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
To: qemu-devel; +Cc: Paolo Bonzini, qemu-stable, Aurelien Jarno
The 32-bit versions of the sar and shr ops should not propagate known-zero
bits from the unused high 32 bits. For sar this could even lead to wrong
code being generated.
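To illustrate the sar case, here is a standalone sketch (not part of the
patch, values made up): shifting the mask of possibly-set bits as a 64-bit
signed value loses the fact that the 32-bit sign bit is replicated into the
result:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Mask of bits that may be non-zero; bit 31 set, high 32 bits clear. */
    uint64_t mask = 0x0000000080000000ull;
    int shift = 4;

    /* Old code: the 64-bit arithmetic shift sees a positive value, so it
       wrongly claims bits 31..28 of the 32-bit result are known zero. */
    uint64_t old_mask = (uint64_t)((int64_t)mask >> shift);

    /* New code: shift the low 32 bits as a signed 32-bit value (the same
       cast the patch uses), so the replicated sign bit stays marked as
       possibly non-zero. */
    uint64_t new_mask = (uint32_t)((int32_t)mask >> shift);

    printf("old=%#llx new=%#llx\n",   /* old=0x8000000 new=0xf8000000 */
           (unsigned long long)old_mask, (unsigned long long)new_mask);
    return 0;
}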
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 89e2d6a..c03d2f0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -726,16 +726,29 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[1]].mask & mask;
             break;
 
-        CASE_OP_32_64(sar):
+        case INDEX_op_sar_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ((int32_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
+            }
+            break;
+        case INDEX_op_sar_i64:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = ((tcg_target_long)temps[args[1]].mask
+                mask = ((int64_t)temps[args[1]].mask
                         >> temps[args[2]].val);
             }
             break;
 
-        CASE_OP_32_64(shr):
+        case INDEX_op_shr_i32:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = temps[args[1]].mask >> temps[args[2]].val;
+                mask = ((uint32_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
+            }
+            break;
+        case INDEX_op_shr_i64:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ((uint64_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
             }
             break;
 
--
1.7.10.4
* [Qemu-devel] [PATCH v3 2/4] tcg/optimize: fix known-zero bits optimization
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno
The known-zero bits optimization is a great idea that helps generate better
code. However, the current implementation only works in very few cases,
because the computed mask is never saved.

Fix this by saving the computed mask, so that the optimization really works.
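As a rough illustration (a simplified standalone model with made-up names,
not QEMU code): the mask computed for one op is only useful if it is
remembered for the output temp, so that a later op on that temp can be
folded:

#include <stdint.h>
#include <stdio.h>

/* Toy model of the optimizer state: one "may be non-zero" mask per temp.
   The struct and temp numbering are purely illustrative. */
struct temp_info { uint64_t mask; };

int main(void)
{
    struct temp_info temps[3] = { { ~0ull }, { ~0ull }, { ~0ull } };

    /* t1 = ext8u_i32 t0: only bits 0..7 of the result can be set.  Before
       the fix this mask was computed but never written back to t1. */
    temps[1].mask = 0xff;

    /* t2 = and_i32 t1, 0x100: with the saved mask the result is provably
       zero, so the op could be replaced by "movi_i32 t2, 0". */
    uint64_t mask = temps[1].mask & 0x100;
    if (mask == 0) {
        printf("and_i32 folds to movi_i32 t2, 0\n");
    }
    temps[2].mask = mask;
    return 0;
}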
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index c03d2f0..342c6e5 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -691,7 +691,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
-        /* Simplify using known-zero bits */
+        /* Simplify using known-zero bits. Currently only ops with a single
+           output argument are supported. */
         mask = -1;
         affected = -1;
         switch (op) {
@@ -1153,6 +1154,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             } else {
                 for (i = 0; i < def->nb_oargs; i++) {
                     reset_temp(args[i]);
+                    /* Save the corresponding known-zero bits mask for the
+                       first output argument (only one supported so far). */
+                    if (i == 0) {
+                        temps[args[i]].mask = mask;
+                    }
                 }
             }
             for (i = 0; i < def->nb_args; i++) {
for (i = 0; i < def->nb_args; i++) {
--
1.7.10.4
* [Qemu-devel] [PATCH v3 3/4] tcg/optimize: improve known-zero bits for 32-bit ops
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno
The shl_i32 op might set some of the unused high 32 bits of the mask. Fix
that by clearing the high 32 bits for all 32-bit ops, except for load/store
ops, which operate on tl values.
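A standalone sketch of the shl_i32 case (made-up values, not part of the
patch): the plain 64-bit shift of the mask leaks bits into the high half,
which the added "mask &= 0xffffffffu" then clears for 32-bit ops:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Before the shift, any of the 32 low bits may be set. */
    uint64_t mask = 0xffffffffull;
    int shift = 8;

    /* The generic mask computation pushes bits into the unused high half. */
    uint64_t raw = mask << shift;          /* 0xffffffff00 */

    /* A 32-bit op only produces a 32-bit result, so the high 32 bits of
       its mask must be cleared, as the patch now does. */
    uint64_t fixed = raw & 0xffffffffu;    /* 0xffffff00 */

    printf("raw=%#llx fixed=%#llx\n",
           (unsigned long long)raw, (unsigned long long)fixed);
    return 0;
}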
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 342c6e5..e14b564 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -787,6 +787,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
+        /* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
+           results */
+        if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+            mask &= 0xffffffffu;
+        }
+
         if (mask == 0) {
             assert(def->nb_oargs == 1);
             s->gen_opc_buf[op_index] = op_to_movi(op);
--
1.7.10.4
* [Qemu-devel] [PATCH v3 4/4] tcg/optimize: add known-zero bits compute for load ops
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
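Since the commit message is terse, here is a standalone sketch (not part of
the patch, illustrative values only) of the property the new cases encode:
an unsigned load of N bits zero-extends, so only the low N bits of the
result can ever be set:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* ld16u / qemu_ld16u: the loaded 16-bit value is zero-extended. */
    uint16_t memval = 0xbeef;              /* arbitrary example value */
    uint64_t result = memval;              /* zero extension */
    uint64_t mask = 0xffff;                /* what the patch records */

    /* Bits outside the mask are guaranteed to be zero. */
    printf("result=%#llx, high bits = %#llx\n",
           (unsigned long long)result,
           (unsigned long long)(result & ~mask));   /* always 0 */
    return 0;
}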
---
tcg/optimize.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index e14b564..db2b079 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -783,6 +783,39 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[3]].mask | temps[args[4]].mask;
             break;
 
+        CASE_OP_32_64(ld8u):
+        case INDEX_op_qemu_ld8u:
+            mask = 0xff;
+            break;
+        CASE_OP_32_64(ld16u):
+        case INDEX_op_qemu_ld16u:
+            mask = 0xffff;
+            break;
+        case INDEX_op_ld32u_i64:
+        case INDEX_op_qemu_ld32u:
+            mask = 0xffffffffu;
+            break;
+
+        case INDEX_op_qemu_ld_i32:
+        case INDEX_op_qemu_ld_i64:
+            {
+                const TCGMemOp opc = args[def->nb_oargs + def->nb_iargs];
+                if (!(opc & MO_SIGN)) {
+                    switch (opc & MO_SIZE) {
+                    case MO_8:
+                        mask = 0xff;
+                        break;
+                    case MO_16:
+                        mask = 0xffff;
+                        break;
+                    case MO_32:
+                        mask = 0xffffffffu;
+                        break;
+                    }
+                }
+            }
+            break;
+
         default:
             break;
         }
--
1.7.10.4