qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/8] tcg optimization improvements
@ 2014-01-31 14:46 Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

The first four of these are patches that Aurelien posted some time ago
and that I reviewed, but they never got committed.

The second four address optimization issues that I noticed while adding
ANDC support to x86_64 via the BMI instruction set extension.


r~


Aurelien Jarno (4):
  tcg/optimize: fix known-zero bits for right shift ops
  tcg/optimize: fix known-zero bits optimization
  tcg/optimize: improve known-zero bits for 32-bit ops
  tcg/optimize: add known-zero bits compute for load ops

Richard Henderson (4):
  tcg/optimize: Handle known-zeros masks for ANDC
  tcg/optimize: Simplify some logical ops to NOT
  tcg/optimize: Optimize ANDC X,Y,Y to MOV X,0
  tcg/optimize: Add more identity simplifications

 tcg/optimize.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 142 insertions(+), 21 deletions(-)

-- 
1.8.5.3


* [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization Richard Henderson
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, qemu-stable, aurelien

From: Aurelien Jarno <aurelien@aurel32.net>

The 32-bit versions of the sar and shr ops should not propagate known-zero
bits from the unused high 32 bits. For sar this could even lead to wrong
code being generated.
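
To illustrate, here is a minimal standalone sketch of the problem; it is
not part of the patch, the mask value and shift count are made up, and it
assumes the usual arithmetic behaviour of signed right shifts:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        /* 32-bit temp about which nothing is known: the 64-bit mask field
         * has its (meaningless) high 32 bits clear. */
        uint64_t mask = 0x00000000ffffffffull;
        int shift = 4;

        /* Old code: a 64-bit arithmetic shift sees bit 63 clear and wrongly
         * concludes that bits 28..31 of the sar_i32 result must be zero. */
        uint64_t old_mask = (uint64_t)((int64_t)mask >> shift);

        /* New code: shift as a 32-bit value, so bit 31 is treated as the
         * sign bit and bits 28..31 remain "possibly set". */
        uint64_t new_mask = (uint32_t)((int32_t)mask >> shift);

        assert(old_mask == 0x0fffffffull);   /* wrongly claims zeros */
        assert(new_mask == 0xffffffffull);   /* correct              */
        return 0;
    }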

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 89e2d6a..c5cdde2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -726,16 +726,25 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[1]].mask & mask;
             break;
 
-        CASE_OP_32_64(sar):
+        case INDEX_op_sar_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = (int32_t)temps[args[1]].mask >> temps[args[2]].val;
+            }
+            break;
+        case INDEX_op_sar_i64:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = ((tcg_target_long)temps[args[1]].mask
-                        >> temps[args[2]].val);
+                mask = (int64_t)temps[args[1]].mask >> temps[args[2]].val;
             }
             break;
 
-        CASE_OP_32_64(shr):
+        case INDEX_op_shr_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = (uint32_t)temps[args[1]].mask >> temps[args[2]].val;
+            }
+            break;
+        case INDEX_op_shr_i64:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = temps[args[1]].mask >> temps[args[2]].val;
+                mask = (uint64_t)temps[args[1]].mask >> temps[args[2]].val;
             }
             break;
 
-- 
1.8.5.3


* [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops Richard Henderson
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, aurelien

From: Aurelien Jarno <aurelien@aurel32.net>

The known-zero bits optimization is a great idea that helps to generate
better code. However, the current implementation only works in very few
cases because the computed mask is not saved.

Fix this so that it actually works.
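
A minimal standalone sketch of the effect, not part of the patch; the temp
numbers and constants are made up:

    #include <assert.h>
    #include <stdint.h>

    struct temp { uint64_t mask; };   /* bit set => bit may be nonzero */

    int main(void)
    {
        struct temp temps[4];
        temps[2].mask = UINT64_MAX;   /* t2: nothing known about it */

        /* and_i32 t0, t2, 0xff: only bits 0..7 of the result may be set. */
        uint64_t computed = temps[2].mask & 0xff;

        /* Before the fix this value was thrown away; with it, the mask is
         * stored with the output temp. */
        temps[0].mask = computed;

        /* A later "and t1, t0, 0xff" is now provably redundant, because no
         * bit outside 0xff can possibly be set in t0. */
        assert((temps[0].mask & ~(uint64_t)0xff) == 0);
        return 0;
    }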

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c5cdde2..7838be2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -691,7 +691,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
-        /* Simplify using known-zero bits */
+        /* Simplify using known-zero bits. Currently only ops with a single
+           output argument is supported. */
         mask = -1;
         affected = -1;
         switch (op) {
@@ -1149,6 +1150,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             } else {
                 for (i = 0; i < def->nb_oargs; i++) {
                     reset_temp(args[i]);
+                    /* Save the corresponding known-zero bits mask for the
+                       first output argument (only one supported so far). */
+                    if (i == 0) {
+                        temps[args[i]].mask = mask;
+                    }
                 }
             }
             for (i = 0; i < def->nb_args; i++) {
-- 
1.8.5.3


* [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops Richard Henderson
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, aurelien

From: Aurelien Jarno <aurelien@aurel32.net>

The shl_i32 op might set some of the unused high 32 bits of the mask.
Fix that by clearing the unused high 32 bits for all 32-bit ops, except
load/store ops, which operate on tl values.
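
A minimal standalone sketch of the problem, not part of the patch; the
mask value and shift count are made up:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t mask = 0xffffffffull;   /* 32-bit temp, nothing known */
        int shift = 8;

        /* shl_i32: shifting the 64-bit mask spills "possibly set" bits
         * into its high half, which is meaningless for a 32-bit temp. */
        uint64_t shifted = mask << shift;
        assert((shifted >> 32) != 0);

        /* What the patch does for every 32-bit non-load/store op. */
        uint64_t fixed = shifted & 0xffffffffu;
        assert(fixed == 0xffffff00ull);
        return 0;
    }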

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 7838be2..1cf017a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -783,6 +783,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
+        /* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
+           results */
+        if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+            mask &= 0xffffffffu;
+        }
+
         if (mask == 0) {
             assert(def->nb_oargs == 1);
             s->gen_opc_buf[op_index] = op_to_movi(op);
-- 
1.8.5.3


* [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops Richard Henderson
@ 2014-01-31 14:46 ` Richard Henderson
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, aurelien

From: Aurelien Jarno <aurelien@aurel32.net>

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1cf017a..d3b099a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -779,13 +779,35 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[3]].mask | temps[args[4]].mask;
             break;
 
+        CASE_OP_32_64(ld8u):
+        case INDEX_op_qemu_ld8u:
+            mask = 0xff;
+            break;
+        CASE_OP_32_64(ld16u):
+        case INDEX_op_qemu_ld16u:
+            mask = 0xffff;
+            break;
+        case INDEX_op_ld32u_i64:
+        case INDEX_op_qemu_ld32u:
+            mask = 0xffffffffu;
+            break;
+
+        CASE_OP_32_64(qemu_ld):
+            {
+                TCGMemOp mop = args[def->nb_oargs + def->nb_iargs];
+                if (!(mop & MO_SIGN)) {
+                    mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
+                }
+            }
+            break;
+
         default:
             break;
         }
 
         /* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
            results */
-        if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+        if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
             mask &= 0xffffffffu;
         }
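
For reference, a standalone sketch of how the qemu_ld mask expression above
expands; this is not part of the patch and assumes that MO_SIZE encodes the
access size as 0..3 for 1/2/4/8-byte loads (sign-extending loads, marked by
MO_SIGN, keep the all-ones mask):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        for (int size = 0; size <= 3; size++) {
            int bits = 8 << size;                 /* 8, 16, 32, 64 */
            /* "2ULL << (bits - 1)" rather than "1ULL << bits" avoids an
             * undefined shift by 64 in the 8-byte case. */
            uint64_t mask = (2ULL << (bits - 1)) - 1;
            assert(mask == (bits == 64 ? UINT64_MAX : (1ULL << bits) - 1));
        }
        return 0;
    }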
 
-- 
1.8.5.3


* [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2014-01-31 14:46 ` [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
  2014-02-16 18:12   ` Aurelien Jarno
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simplify some logical ops to NOT Richard Henderson
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d3b099a..3291a08 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -727,6 +727,17 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[1]].mask & mask;
             break;
 
+        CASE_OP_32_64(andc):
+            /* Known-zeros does not imply known-ones.  Therefore unless
+               args[2] is constant, we can't infer anything from it.  */
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ~temps[args[2]].mask;
+                goto and_const;
+            }
+            /* But we certainly know nothing outside args[1] may be set. */
+            mask = temps[args[1]].mask;
+            break;
+
         case INDEX_op_sar_i32:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
                 mask = (int32_t)temps[args[1]].mask >> temps[args[2]].val;
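
A standalone sketch of the reasoning in the comment above, not part of the
patch; the masks and the constant are made up (andc computes dest = a1 & ~a2):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t mask1 = 0x00ff;   /* a1: only bits 0..7 may be set */

        /* a2 is the constant 0x0f: bits 0..3 of the result are forced to
         * zero, so the result mask is mask1 & ~0x0f. */
        uint64_t const_case = mask1 & ~(uint64_t)0x0f;
        assert(const_case == 0x00f0);

        /* a2 is non-constant with mask 0x0f: its low bits *may* be set but
         * are not *known* to be set, so nothing extra can be proven zero;
         * all we know is that the result fits within a1's mask. */
        uint64_t nonconst_case = mask1;
        assert(nonconst_case == 0x00ff);
        return 0;
    }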
-- 
1.8.5.3


* [Qemu-devel] [PATCH 6/8] tcg/optimize: Simplify some logical ops to NOT
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
  2014-02-16 18:27   ` Aurelien Jarno
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optimize ANDC X, Y, Y to MOV X, 0 Richard Henderson
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Given, of course, an appropriate constant.  These could be generated
from the "canonical" operation for inversion on the guest, or via
other optimizations.
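
The identities behind the transformation, as a minimal standalone sketch
(not part of the patch), checked on an arbitrary made-up value:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t x = 0x1234abcd;

        assert((x ^ 0xffffffffu) == ~x);   /* xor  r, x, -1  ->  not r, x */
        assert(~(x & 0xffffffffu) == ~x);  /* nand r, x, -1  ->  not r, x */
        assert(~(x | 0u) == ~x);           /* nor  r, x,  0  ->  not r, x */
        assert((0xffffffffu & ~x) == ~x);  /* andc r, -1, x  ->  not r, x */
        assert((0u | ~x) == ~x);           /* orc  r,  0, x  ->  not r, x */
        assert(~(0u ^ x) == ~x);           /* eqv  r,  0, x  ->  not r, x */
        return 0;
    }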

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 3291a08..cdfc746 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -655,6 +655,63 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 }
             }
             break;
+        CASE_OP_32_64(xor):
+        CASE_OP_32_64(nand):
+            if (temps[args[1]].state != TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[2]].val == -1) {
+                i = 1;
+                goto try_not;
+            }
+            break;
+        CASE_OP_32_64(nor):
+            if (temps[args[1]].state != TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[2]].val == 0) {
+                i = 1;
+                goto try_not;
+            }
+            break;
+        CASE_OP_32_64(andc):
+            if (temps[args[2]].state != TCG_TEMP_CONST
+                && temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[1]].val == -1) {
+                i = 2;
+                goto try_not;
+            }
+            break;
+        CASE_OP_32_64(orc):
+        CASE_OP_32_64(eqv):
+            if (temps[args[2]].state != TCG_TEMP_CONST
+                && temps[args[1]].state == TCG_TEMP_CONST
+                && temps[args[1]].val == 0) {
+                i = 2;
+                goto try_not;
+            }
+            break;
+        try_not:
+            {
+                TCGOpcode not_op;
+                bool have_not;
+
+                if (def->flags & TCG_OPF_64BIT) {
+                    not_op = INDEX_op_not_i64;
+                    have_not = TCG_TARGET_HAS_not_i64;
+                } else {
+                    not_op = INDEX_op_not_i32;
+                    have_not = TCG_TARGET_HAS_not_i32;
+                }
+                if (!have_not) {
+                    break;
+                }
+                s->gen_opc_buf[op_index] = not_op;
+                reset_temp(args[0]);
+                gen_args[0] = args[0];
+                gen_args[1] = args[i];
+                args += 3;
+                gen_args += 2;
+                continue;
+            }
         default:
             break;
         }
-- 
1.8.5.3


* [Qemu-devel] [PATCH 7/8] tcg/optimize: Optimize ANDC X, Y, Y to MOV X, 0
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simplify some logical ops to NOT Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
  2014-02-16 18:27   ` Aurelien Jarno
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Like we already do for SUB and XOR.
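
The identity being added, next to the two already handled, as a minimal
standalone sketch (not part of the patch) with a made-up value:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t y = 0xdeadbeef;

        assert((y & ~y) == 0);   /* andc x, y, y  ->  movi x, 0 */
        assert((y - y) == 0);    /* sub  x, y, y  ->  movi x, 0 */
        assert((y ^ y) == 0);    /* xor  x, y, y  ->  movi x, 0 */
        return 0;
    }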

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cdfc746..a703f8c 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -945,6 +945,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
 
         /* Simplify expression for "op r, a, a => movi r, 0" cases */
         switch (op) {
+        CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
             if (temps_are_copies(args[1], args[2])) {
-- 
1.8.5.3


* [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optimize ANDC X, Y, Y to MOV X, 0 Richard Henderson
@ 2014-01-31 14:47 ` Richard Henderson
  2014-02-16 18:30   ` Aurelien Jarno
  2014-02-14 21:44 ` [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
  2014-02-16 14:15 ` Paolo Bonzini
  9 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2014-01-31 14:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Recognize 0 operand to andc, and -1 operands to and, orc, eqv.
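
The identities that let these ops collapse to a move of the first source,
as a minimal standalone sketch (not part of the patch) with a made-up value:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t a = 0xcafef00d;

        assert((a & ~0u) == a);            /* andc r, a,  0  ->  mov r, a */
        assert((a & 0xffffffffu) == a);    /* and  r, a, -1  ->  mov r, a */
        assert((a | ~0xffffffffu) == a);   /* orc  r, a, -1  ->  mov r, a */
        assert(~(a ^ 0xffffffffu) == a);   /* eqv  r, a, -1  ->  mov r, a */
        return 0;
    }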

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a703f8c..8d7100e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -716,7 +716,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
-        /* Simplify expression for "op r, a, 0 => mov r, a" cases */
+        /* Simplify expression for "op r, a, const => mov r, a" cases */
         switch (op) {
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
@@ -727,23 +727,32 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(rotr):
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
-                /* Proceed with possible constant folding. */
-                break;
-            }
-            if (temps[args[2]].state == TCG_TEMP_CONST
+        CASE_OP_32_64(andc):
+            if (temps[args[1]].state != TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST
                 && temps[args[2]].val == 0) {
-                if (temps_are_copies(args[0], args[1])) {
-                    s->gen_opc_buf[op_index] = INDEX_op_nop;
-                } else {
-                    s->gen_opc_buf[op_index] = op_to_mov(op);
-                    tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
-                    gen_args += 2;
-                }
-                args += 3;
-                continue;
+                goto do_mov3;
             }
             break;
+        CASE_OP_32_64(and):
+        CASE_OP_32_64(orc):
+        CASE_OP_32_64(eqv):
+            if (temps[args[1]].state != TCG_TEMP_CONST
+                && temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[2]].val == -1) {
+                goto do_mov3;
+            }
+            break;
+        do_mov3:
+            if (temps_are_copies(args[0], args[1])) {
+                s->gen_opc_buf[op_index] = INDEX_op_nop;
+            } else {
+                s->gen_opc_buf[op_index] = op_to_mov(op);
+                tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
+                gen_args += 2;
+            }
+            args += 3;
+            continue;
         default:
             break;
         }
-- 
1.8.5.3


* Re: [Qemu-devel] [PATCH 0/8] tcg optimization improvements
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
@ 2014-02-14 21:44 ` Richard Henderson
  2014-02-16 14:15 ` Paolo Bonzini
  9 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2014-02-14 21:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Ping.

On 01/31/2014 06:46 AM, Richard Henderson wrote:
> The first four of these are patches that Aurelien posted some time ago
> and that I reviewed, but they never got committed.
> 
> The second four address optimization issues that I noticed while adding
> ANDC support to x86_64 via the BMI instruction set extension.
> 
> 
> r~
> 
> 
> Aurelien Jarno (4):
>   tcg/optimize: fix known-zero bits for right shift ops
>   tcg/optimize: fix known-zero bits optimization
>   tcg/optimize: improve known-zero bits for 32-bit ops
>   tcg/optimize: add known-zero bits compute for load ops
> 
> Richard Henderson (4):
>   tcg/optimize: Handle known-zeros masks for ANDC
>   tcg/optimize: Simplify some logical ops to NOT
>   tcg/optimize: Optimize ANDC X,Y,Y to MOV X,0
>   tcg/optimize: Add more identity simplifications
> 
>  tcg/optimize.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 142 insertions(+), 21 deletions(-)
> 


* Re: [Qemu-devel] [PATCH 0/8] tcg optimization improvements
  2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2014-02-14 21:44 ` [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
@ 2014-02-16 14:15 ` Paolo Bonzini
  9 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2014-02-16 14:15 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: aurelien

On 31/01/2014 15:46, Richard Henderson wrote:
> The first four of these are patches that Aurelien posted some time ago
> and that I reviewed, but they never got committed.
>
> The second four address optimization issues that I noticed while adding
> ANDC support to x86_64 via the BMI instruction set extension.
>
>
> r~
>
>
> Aurelien Jarno (4):
>   tcg/optimize: fix known-zero bits for right shift ops
>   tcg/optimize: fix known-zero bits optimization
>   tcg/optimize: improve known-zero bits for 32-bit ops
>   tcg/optimize: add known-zero bits compute for load ops
>
> Richard Henderson (4):
>   tcg/optimize: Handle known-zeros masks for ANDC
>   tcg/optimize: Simplify some logical ops to NOT
>   tcg/optimize: Optimize ANDC X,Y,Y to MOV X,0
>   tcg/optimize: Add more identity simplifications
>
>  tcg/optimize.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 142 insertions(+), 21 deletions(-)
>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


* Re: [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
@ 2014-02-16 18:12   ` Aurelien Jarno
  0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Jan 31, 2014 at 08:47:00AM -0600, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d3b099a..3291a08 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -727,6 +727,17 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              mask = temps[args[1]].mask & mask;
>              break;
>  
> +        CASE_OP_32_64(andc):
> +            /* Known-zeros does not imply known-ones.  Therefore unless
> +               args[2] is constant, we can't infer anything from it.  */
> +            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +                mask = ~temps[args[2]].mask;
> +                goto and_const;
> +            }
> +            /* But we certainly know nothing outside args[1] may be set. */
> +            mask = temps[args[1]].mask;
> +            break;
> +
>          case INDEX_op_sar_i32:
>              if (temps[args[2]].state == TCG_TEMP_CONST) {
>                  mask = (int32_t)temps[args[1]].mask >> temps[args[2]].val;

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 6/8] tcg/optimize: Simplify some logical ops to NOT
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simplify some logical ops to NOT Richard Henderson
@ 2014-02-16 18:27   ` Aurelien Jarno
  0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Jan 31, 2014 at 08:47:01AM -0600, Richard Henderson wrote:
> Given, of course, an appropriate constant.  These could be generated
> from the "canonical" operation for inversion on the guest, or via
> other optimizations.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 3291a08..cdfc746 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -655,6 +655,63 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                  }
>              }
>              break;
> +        CASE_OP_32_64(xor):
> +        CASE_OP_32_64(nand):
> +            if (temps[args[1]].state != TCG_TEMP_CONST
> +                && temps[args[2]].state == TCG_TEMP_CONST
> +                && temps[args[2]].val == -1) {
> +                i = 1;
> +                goto try_not;
> +            }
> +            break;
> +        CASE_OP_32_64(nor):
> +            if (temps[args[1]].state != TCG_TEMP_CONST
> +                && temps[args[2]].state == TCG_TEMP_CONST
> +                && temps[args[2]].val == 0) {
> +                i = 1;
> +                goto try_not;
> +            }
> +            break;
> +        CASE_OP_32_64(andc):
> +            if (temps[args[2]].state != TCG_TEMP_CONST
> +                && temps[args[1]].state == TCG_TEMP_CONST
> +                && temps[args[1]].val == -1) {
> +                i = 2;
> +                goto try_not;
> +            }
> +            break;
> +        CASE_OP_32_64(orc):
> +        CASE_OP_32_64(eqv):
> +            if (temps[args[2]].state != TCG_TEMP_CONST
> +                && temps[args[1]].state == TCG_TEMP_CONST
> +                && temps[args[1]].val == 0) {
> +                i = 2;
> +                goto try_not;
> +            }
> +            break;
> +        try_not:
> +            {
> +                TCGOpcode not_op;
> +                bool have_not;
> +
> +                if (def->flags & TCG_OPF_64BIT) {
> +                    not_op = INDEX_op_not_i64;
> +                    have_not = TCG_TARGET_HAS_not_i64;
> +                } else {
> +                    not_op = INDEX_op_not_i32;
> +                    have_not = TCG_TARGET_HAS_not_i32;
> +                }
> +                if (!have_not) {
> +                    break;
> +                }
> +                s->gen_opc_buf[op_index] = not_op;
> +                reset_temp(args[0]);
> +                gen_args[0] = args[0];
> +                gen_args[1] = args[i];
> +                args += 3;
> +                gen_args += 2;
> +                continue;
> +            }
>          default:
>              break;
>          }

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 7/8] tcg/optimize: Optimize ANDC X, Y, Y to MOV X, 0
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optimize ANDC X, Y, Y to MOV X, 0 Richard Henderson
@ 2014-02-16 18:27   ` Aurelien Jarno
  0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Jan 31, 2014 at 08:47:02AM -0600, Richard Henderson wrote:
> Like we already do for SUB and XOR.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index cdfc746..a703f8c 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -945,6 +945,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>  
>          /* Simplify expression for "op r, a, a => movi r, 0" cases */
>          switch (op) {
> +        CASE_OP_32_64(andc):
>          CASE_OP_32_64(sub):
>          CASE_OP_32_64(xor):
>              if (temps_are_copies(args[1], args[2])) {

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>


-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications
  2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
@ 2014-02-16 18:30   ` Aurelien Jarno
  0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2014-02-16 18:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, Jan 31, 2014 at 08:47:03AM -0600, Richard Henderson wrote:
> Recognize 0 operand to andc, and -1 operands to and, orc, eqv.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 39 ++++++++++++++++++++++++---------------
>  1 file changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index a703f8c..8d7100e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -716,7 +716,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              break;
>          }
>  
> -        /* Simplify expression for "op r, a, 0 => mov r, a" cases */
> +        /* Simplify expression for "op r, a, const => mov r, a" cases */
>          switch (op) {
>          CASE_OP_32_64(add):
>          CASE_OP_32_64(sub):
> @@ -727,23 +727,32 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>          CASE_OP_32_64(rotr):
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
> -            if (temps[args[1]].state == TCG_TEMP_CONST) {
> -                /* Proceed with possible constant folding. */
> -                break;
> -            }
> -            if (temps[args[2]].state == TCG_TEMP_CONST
> +        CASE_OP_32_64(andc):
> +            if (temps[args[1]].state != TCG_TEMP_CONST
> +                && temps[args[2]].state == TCG_TEMP_CONST
>                  && temps[args[2]].val == 0) {
> -                if (temps_are_copies(args[0], args[1])) {
> -                    s->gen_opc_buf[op_index] = INDEX_op_nop;
> -                } else {
> -                    s->gen_opc_buf[op_index] = op_to_mov(op);
> -                    tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
> -                    gen_args += 2;
> -                }
> -                args += 3;
> -                continue;
> +                goto do_mov3;
>              }
>              break;
> +        CASE_OP_32_64(and):
> +        CASE_OP_32_64(orc):
> +        CASE_OP_32_64(eqv):
> +            if (temps[args[1]].state != TCG_TEMP_CONST
> +                && temps[args[2]].state == TCG_TEMP_CONST
> +                && temps[args[2]].val == -1) {
> +                goto do_mov3;
> +            }
> +            break;
> +        do_mov3:
> +            if (temps_are_copies(args[0], args[1])) {
> +                s->gen_opc_buf[op_index] = INDEX_op_nop;
> +            } else {
> +                s->gen_opc_buf[op_index] = op_to_mov(op);
> +                tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
> +                gen_args += 2;
> +            }
> +            args += 3;
> +            continue;
>          default:
>              break;
>          }

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


end of thread, other threads:[~2014-02-16 18:30 UTC | newest]

Thread overview: 15+ messages
2014-01-31 14:46 [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 1/8] tcg/optimize: fix known-zero bits for right shift ops Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 2/8] tcg/optimize: fix known-zero bits optimization Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 3/8] tcg/optimize: improve known-zero bits for 32-bit ops Richard Henderson
2014-01-31 14:46 ` [Qemu-devel] [PATCH 4/8] tcg/optimize: add known-zero bits compute for load ops Richard Henderson
2014-01-31 14:47 ` [Qemu-devel] [PATCH 5/8] tcg/optimize: Handle known-zeros masks for ANDC Richard Henderson
2014-02-16 18:12   ` Aurelien Jarno
2014-01-31 14:47 ` [Qemu-devel] [PATCH 6/8] tcg/optimize: Simplify some logical ops to NOT Richard Henderson
2014-02-16 18:27   ` Aurelien Jarno
2014-01-31 14:47 ` [Qemu-devel] [PATCH 7/8] tcg/optimize: Optimize ANDC X, Y, Y to MOV X, 0 Richard Henderson
2014-02-16 18:27   ` Aurelien Jarno
2014-01-31 14:47 ` [Qemu-devel] [PATCH 8/8] tcg/optimize: Add more identity simplifications Richard Henderson
2014-02-16 18:30   ` Aurelien Jarno
2014-02-14 21:44 ` [Qemu-devel] [PATCH 0/8] tcg optimization improvements Richard Henderson
2014-02-16 14:15 ` Paolo Bonzini
