qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v3 0/4] tcg/optimize: fixes and improvements
@ 2013-12-11 14:13 Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
  To: qemu-devel; +Cc: Aurelien Jarno

This patchset first fixes the known-zero bits optimization so that it
works in more than a few cases, then adds further optimizations for
32-bit ops and unsigned loads.

v2 -> v3:
- added support for the new INDEX_op_qemu_ld_{i32,i64} opcodes in patch 4

v1 -> v2:
- swapped patches 1 & 2
- Cc:ed qemu-stable for patch 1
- improved description of patch 2

Aurelien Jarno (4):
  tcg/optimize: fix known-zero bits for right shift ops
  tcg/optimize: fix known-zero bits optimization
  tcg/optimize: improve known-zero bits for 32-bit ops
  tcg/optimize: add known-zero bits compute for load ops

 tcg/optimize.c |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 5 deletions(-)

-- 
1.7.10.4

* [Qemu-devel] [PATCH v3 1/4] tcg/optimize: fix known-zero bits for right shift ops
  2013-12-11 14:13 [Qemu-devel] [PATCH v3 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
@ 2013-12-11 14:13 ` Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 2/4] tcg/optimize: fix known-zero bits optimization Aurelien Jarno
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, qemu-stable, Aurelien Jarno

32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 89e2d6a..c03d2f0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -726,16 +726,29 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[1]].mask & mask;
             break;
 
-        CASE_OP_32_64(sar):
+        case INDEX_op_sar_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ((int32_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
+            }
+            break;
+        case INDEX_op_sar_i64:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = ((tcg_target_long)temps[args[1]].mask
+                mask = ((int64_t)temps[args[1]].mask
                         >> temps[args[2]].val);
             }
             break;
 
-        CASE_OP_32_64(shr):
+        case INDEX_op_shr_i32:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = temps[args[1]].mask >> temps[args[2]].val;
+                mask = ((uint32_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
+            }
+            break;
+        case INDEX_op_shr_i64:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ((uint64_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
             }
             break;
 
-- 
1.7.10.4
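
The sar problem described in the commit message above can be reproduced
outside of QEMU. The sketch below is a standalone illustration, not code
from tcg/optimize.c. It assumes a 64-bit host (so the old tcg_target_long
cast behaves like int64_t), assumes that a set mask bit means "may be
non-zero", and uses hypothetical helper names. It also relies on signed
right shift being arithmetic and on out-of-range narrowing wrapping,
both of which are implementation-defined in C but hold for the usual
compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch mask computation for sar_i32, as it would behave on a
   64-bit host: the 32-bit mask is shifted as a 64-bit signed value,
   so bit 31 is not treated as the sign bit. */
uint64_t sar_i32_mask_old(uint64_t mask, unsigned shift)
{
    assert(shift < 32);
    return (uint64_t)((int64_t)mask >> shift);
}

/* Post-patch mask computation: narrow to 32 bits first, so that a
   possibly-set bit 31 is replicated into the vacated positions. */
uint64_t sar_i32_mask_new(uint64_t mask, unsigned shift)
{
    assert(shift < 32);
    return (uint64_t)((int32_t)mask >> shift);
}

/* Reference semantics of a 32-bit arithmetic right shift on a value. */
uint32_t sar_i32(uint32_t value, unsigned shift)
{
    assert(shift < 32);
    return (uint32_t)((int32_t)value >> shift);
}
```

With a mask of 0x80000000 and a shift of 31, the old computation yields
1, while the shifted value itself can be 0xffffffff, so bits 1..31 would
wrongly be treated as known zero. The narrowed computation yields an
all-ones mask, which is conservative but safe.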

* [Qemu-devel] [PATCH v3 2/4] tcg/optimize: fix known-zero bits optimization
  2013-12-11 14:13 [Qemu-devel] [PATCH v3 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
@ 2013-12-11 14:13 ` Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 3/4] tcg/optimize: improve known-zero bits for 32-bit ops Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 4/4] tcg/optimize: add known-zero bits compute for load ops Aurelien Jarno
  3 siblings, 0 replies; 6+ messages in thread
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno

Known-zero bits optimization is a great idea that helps to generate more
optimized code. However, the current implementation only works in very
few cases because the computed mask is not saved.

Fix this by saving the computed mask for the first output argument, so
that the optimization really works.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c03d2f0..342c6e5 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -691,7 +691,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
-        /* Simplify using known-zero bits */
+        /* Simplify using known-zero bits. Currently only ops with a single
+           output argument are supported. */
         mask = -1;
         affected = -1;
         switch (op) {
@@ -1153,6 +1154,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             } else {
                 for (i = 0; i < def->nb_oargs; i++) {
                     reset_temp(args[i]);
+                    /* Save the corresponding known-zero bits mask for the
+                       first output argument (only one supported so far). */
+                    if (i == 0) {
+                        temps[args[i]].mask = mask;
+                    }
                 }
             }
             for (i = 0; i < def->nb_args; i++) {
-- 
1.7.10.4
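
To see why saving the mask matters, consider a toy model of the
optimizer's per-temp state. Everything below is a hypothetical sketch
(the ToyTemp type, the toy_* functions and the save_mask flag are
invented for illustration and are not part of tcg/optimize.c). It
assumes that a set mask bit means "may be non-zero" and that an AND with
a constant can be dropped when it clears no possibly-set bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_TEMPS 4

/* Hypothetical per-temp state: only the known-zero bits mask. */
typedef struct {
    uint64_t mask;
} ToyTemp;

static ToyTemp toy_temps[TOY_NB_TEMPS];

/* Forget everything known about a temp: any bit may be set. */
void toy_reset(int t)
{
    toy_temps[t].mask = UINT64_MAX;
}

/* Model "dst = src >> shift" with a constant shift. When save_mask is
   false, the computed mask is thrown away after the reset, which is
   the pre-patch behaviour described in the commit message. */
void toy_shr_const(int dst, int src, unsigned shift, bool save_mask)
{
    assert(shift < 64);
    uint64_t mask = toy_temps[src].mask >> shift;

    toy_reset(dst);
    if (save_mask) {
        toy_temps[dst].mask = mask;
    }
}

/* Would "x = src & constant" leave src unchanged? True when the
   constant clears none of the bits that may be set in src. */
bool toy_and_const_is_noop(int src, uint64_t constant)
{
    uint64_t affected = toy_temps[src].mask & ~constant;
    return affected == 0;
}
```

In this model, a shift right by 56 followed by an AND with 0xff is only
recognised as redundant when the shift's mask has been saved; otherwise
the destination temp still looks as if every bit could be set.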

* [Qemu-devel] [PATCH v3 3/4] tcg/optimize: improve known-zero bits for 32-bit ops
  2013-12-11 14:13 [Qemu-devel] [PATCH v3 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 2/4] tcg/optimize: fix known-zero bits optimization Aurelien Jarno
@ 2013-12-11 14:13 ` Aurelien Jarno
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 4/4] tcg/optimize: add known-zero bits compute for load ops Aurelien Jarno
  3 siblings, 0 replies; 6+ messages in thread
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno

The shl_i32 op might set some of the unused 32 high bits of the mask.
Fix that by clearing the unused 32 high bits for all 32-bit ops except
load/store, which operate on tl values.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 342c6e5..e14b564 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -787,6 +787,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
+        /* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
+           results */
+        if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+            mask &= 0xffffffffu;
+        }
+
         if (mask == 0) {
             assert(def->nb_oargs == 1);
             s->gen_opc_buf[op_index] = op_to_movi(op);
-- 
1.7.10.4
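
The effect of clearing the high bits can be shown with plain integer
arithmetic. The sketch below is an illustration only: the two helper
names are hypothetical, and it assumes the mask is held in a 64-bit
variable where a set bit means "may be non-zero".

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a 32-bit left shift computed in a 64-bit variable, without
   any clamping: bits shifted out of the low 32 bits land in the
   unused high half instead of disappearing. */
uint64_t shl_i32_mask_raw(uint64_t mask, unsigned shift)
{
    assert(shift < 32);
    return mask << shift;
}

/* Same computation followed by the clamp applied to 32-bit ops. */
uint64_t shl_i32_mask_clamped(uint64_t mask, unsigned shift)
{
    assert(shift < 32);
    return (mask << shift) & 0xffffffffu;
}
```

For an input mask of 0xff000000 shifted left by 8, the raw mask is
0xff00000000, which is non-zero even though the 32-bit result is always
zero. The clamped mask is 0, so a "mask == 0" check can recognise the
result as a constant zero.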

* [Qemu-devel] [PATCH v3 4/4] tcg/optimize: add known-zero bits compute for load ops
  2013-12-11 14:13 [Qemu-devel] [PATCH v3 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
                   ` (2 preceding siblings ...)
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 3/4] tcg/optimize: improve known-zero bits for 32-bit ops Aurelien Jarno
@ 2013-12-11 14:13 ` Aurelien Jarno
  2013-12-11 19:34   ` Richard Henderson
  3 siblings, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2013-12-11 14:13 UTC
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e14b564..db2b079 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -783,6 +783,39 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[3]].mask | temps[args[4]].mask;
             break;
 
+        CASE_OP_32_64(ld8u):
+        case INDEX_op_qemu_ld8u:
+            mask = 0xff;
+            break;
+        CASE_OP_32_64(ld16u):
+        case INDEX_op_qemu_ld16u:
+            mask = 0xffff;
+            break;
+        case INDEX_op_ld32u_i64:
+        case INDEX_op_qemu_ld32u:
+            mask = 0xffffffffu;
+            break;
+
+        case INDEX_op_qemu_ld_i32:
+        case INDEX_op_qemu_ld_i64:
+            {
+                const TCGMemOp opc = args[def->nb_oargs + def->nb_iargs];
+                if (!(opc & MO_SIGN)) {
+                    switch (opc & MO_SIZE) {
+                    case MO_8:
+                        mask = 0xff;
+                        break;
+                    case MO_16:
+                        mask = 0xffff;
+                        break;
+                    case MO_32:
+                        mask = 0xffffffffu;
+                        break;
+                    }
+                }
+            }
+            break;
+
         default:
             break;
         }
-- 
1.7.10.4
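
The rule the patch above encodes is that an unsigned load of N bits can
only set the low N bits of its result, while a sign-extending load can
set any bit. The sketch below restates that rule as a standalone
function. The ToyLoadSize enum and load_result_mask() are hypothetical
names chosen for illustration; they stand in for, and are not, QEMU's
TCGMemOp handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access sizes, standing in for the memory-op size field. */
typedef enum {
    TOY_LOAD_8,
    TOY_LOAD_16,
    TOY_LOAD_32,
    TOY_LOAD_64
} ToyLoadSize;

/* Return the mask of result bits that may be non-zero after a load.
   Sign-extending loads and full-width loads give no information, so
   the mask stays all ones for them. */
uint64_t load_result_mask(ToyLoadSize size, bool sign_extended)
{
    if (sign_extended) {
        return UINT64_MAX;
    }
    switch (size) {
    case TOY_LOAD_8:
        return 0xff;
    case TOY_LOAD_16:
        return 0xffff;
    case TOY_LOAD_32:
        return 0xffffffffu;
    default:
        return UINT64_MAX;
    }
}
```

A mask such as 0xff lets a later zero-extension or an AND with 0xff on
the loaded value be recognised as redundant, provided the mask is
stored with the destination temp.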

* Re: [Qemu-devel] [PATCH v3 4/4] tcg/optimize: add known-zero bits compute for load ops
  2013-12-11 14:13 ` [Qemu-devel] [PATCH v3 4/4] tcg/optimize: add known-zero bits compute for load ops Aurelien Jarno
@ 2013-12-11 19:34   ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2013-12-11 19:34 UTC
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 12/11/2013 06:13 AM, Aurelien Jarno wrote:
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c |   33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~
