qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements
@ 2013-09-09 17:27 Aurelien Jarno
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Aurelien Jarno @ 2013-09-09 17:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

This patchset first fixes known-zero bits optimization so that it works
in more than a few cases, and does some further optimizations for 32-bit
ops and unsigned loads.

v1 -> v2:
- swapped patches 1 & 2
- Cc:ed qemu-stable for patch 1
- improved description of patch 2

Aurelien Jarno (4):
  tcg/optimize: fix known-zero bits for right shift ops
  tcg/optimize: fix known-zero bits optimization
  tcg/optimize: improve known-zero bits for 32-bit ops
  tcg/optimize: add known-zero bits compute for load ops

 tcg/optimize.c |   48 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

-- 
1.7.10.4

* [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops
  2013-09-09 17:27 [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
@ 2013-09-09 17:27 ` Aurelien Jarno
  2013-12-06 17:53   ` Richard Henderson
  2014-02-16  5:42   ` Michael Roth
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 2/4] tcg/optimize: fix known-zero bits optimization Aurelien Jarno
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 11+ messages in thread
From: Aurelien Jarno @ 2013-09-09 17:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, qemu-stable, Aurelien Jarno, Richard Henderson

32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.
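
To illustrate the sar case outside of QEMU, here is a minimal standalone sketch. The helper names are hypothetical and do not exist in tcg/optimize.c; the sketch only assumes, as the code in this patch does, that a set bit in a mask means "this bit may be non-zero" and that the host uses an arithmetic right shift for signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; they are not functions in
   tcg/optimize.c.  A set bit in a mask means "this bit may be non-zero".
   Converting to a signed type and right-shifting a negative value are
   implementation-defined in C; this assumes the usual two's-complement
   arithmetic shift, as gcc and clang provide. */

/* Old behaviour on a 64-bit host: the 32-bit op is treated as 64-bit. */
static inline uint64_t sar_mask_as_64(uint64_t mask, unsigned shift)
{
    return (uint64_t)((int64_t)mask >> shift);
}

/* Patched behaviour for sar_i32: sign-extend from bit 31 before shifting. */
static inline uint64_t sar_mask_i32(uint64_t mask, unsigned shift)
{
    return (uint64_t)(int64_t)((int32_t)mask >> shift);
}

static inline void sar_demo(void)
{
    /* Input: only bit 31 may be set; the high 32 bits look "known zero". */
    uint64_t in = 0x80000000u;

    /* Wrongly claims that bits 28..31 of the result are zero. */
    assert(sar_mask_as_64(in, 4) == 0x08000000u);

    /* Correct: bits 27..31 may all be copies of the sign bit. */
    assert((uint32_t)sar_mask_i32(in, 4) == 0xf8000000u);
}
```

With the 64-bit treatment, a later op could be simplified on the assumption that the top bits of the 32-bit result are zero, which is how wrong code could be generated.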

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b29bf25..c539e39 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -730,16 +730,29 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[1]].mask & mask;
             break;
 
-        CASE_OP_32_64(sar):
+        case INDEX_op_sar_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ((int32_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
+            }
+            break;
+        case INDEX_op_sar_i64:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = ((tcg_target_long)temps[args[1]].mask
+                mask = ((int64_t)temps[args[1]].mask
                         >> temps[args[2]].val);
             }
             break;
 
-        CASE_OP_32_64(shr):
+        case INDEX_op_shr_i32:
             if (temps[args[2]].state == TCG_TEMP_CONST) {
-                mask = temps[args[1]].mask >> temps[args[2]].val;
+                mask = ((uint32_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
+            }
+            break;
+        case INDEX_op_shr_i64:
+            if (temps[args[2]].state == TCG_TEMP_CONST) {
+                mask = ((uint64_t)temps[args[1]].mask
+                        >> temps[args[2]].val);
             }
             break;
 
-- 
1.7.10.4

* [Qemu-devel] [PATCH v2 2/4] tcg/optimize: fix known-zero bits optimization
  2013-09-09 17:27 [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
@ 2013-09-09 17:27 ` Aurelien Jarno
  2013-12-06 17:54   ` Richard Henderson
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 3/4] tcg/optimize: improve known-zero bits for 32-bit ops Aurelien Jarno
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Aurelien Jarno @ 2013-09-09 17:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Known-zero bits optimization is a great idea that helps to generate more
optimized code. However, the current implementation only works in very few
cases because the computed mask is not saved.

Fix this by saving the mask so that the optimization really works.
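
As a rough illustration of why saving the mask matters, the following standalone sketch models a single "and with immediate" op. The Temp type and fold_and_imm() are hypothetical simplifications, not the structures used in tcg/optimize.c; the save_mask flag stands in for the behaviour before and after this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a temp: a set bit in mask means
   "this bit may be non-zero". */
typedef struct {
    uint64_t mask;
} Temp;

/* Model "dst = src & imm".  Returns true when the AND cannot change any
   possibly-set bit of src, i.e. the op could be turned into a plain move.
   When save_mask is false the result mask is forgotten (all ones), which
   models the behaviour before this patch. */
static inline bool fold_and_imm(Temp *dst, const Temp *src, uint64_t imm,
                                bool save_mask)
{
    uint64_t mask = src->mask & imm;      /* bits that may survive */
    uint64_t affected = src->mask & ~imm; /* bits the op may clear */

    dst->mask = save_mask ? mask : UINT64_MAX;
    return affected == 0;
}

static inline void save_mask_demo(void)
{
    Temp t0 = { UINT64_MAX }, t1, t2;

    /* Mask saved: the second AND with 0xff is recognised as redundant. */
    assert(!fold_and_imm(&t1, &t0, 0xff, true));
    assert(t1.mask == 0xff);
    assert(fold_and_imm(&t2, &t1, 0xff, true));

    /* Mask not saved: nothing is known about t1, so nothing is folded. */
    assert(!fold_and_imm(&t1, &t0, 0xff, false));
    assert(!fold_and_imm(&t2, &t1, 0xff, false));
}
```

In this model the optimization only fires on the second op when the first op's computed mask has been stored with its output.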

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c539e39..0ed8983 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -695,7 +695,8 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
-        /* Simplify using known-zero bits */
+        /* Simplify using known-zero bits. Currently only ops with a single
+           output argument are supported. */
         mask = -1;
         affected = -1;
         switch (op) {
@@ -1157,6 +1158,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             } else {
                 for (i = 0; i < def->nb_oargs; i++) {
                     reset_temp(args[i]);
+                    /* Save the corresponding known-zero bits mask for the
+                       first output argument (only one supported so far). */
+                    if (i == 0) {
+                        temps[args[i]].mask = mask;
+                    }
                 }
             }
             for (i = 0; i < def->nb_args; i++) {
-- 
1.7.10.4

* [Qemu-devel] [PATCH v2 3/4] tcg/optimize: improve known-zero bits for 32-bit ops
  2013-09-09 17:27 [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 2/4] tcg/optimize: fix known-zero bits optimization Aurelien Jarno
@ 2013-09-09 17:27 ` Aurelien Jarno
  2013-12-06 17:54   ` Richard Henderson
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 4/4] tcg/optimize: add known-zero bits compute for load ops Aurelien Jarno
  2013-11-29  9:32 ` [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Paolo Bonzini
  4 siblings, 1 reply; 11+ messages in thread
From: Aurelien Jarno @ 2013-09-09 17:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

The shl_i32 op might set some of the unused 32 high bits of the
mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
except load/store, which operate on tl values.
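
A small standalone sketch of the shl case, assuming a 64-bit mask as on a 64-bit host. The helper names are hypothetical and are not part of tcg/optimize.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration.  A set bit in a mask means
   "this bit may be non-zero". */

/* Mask for a left shift computed on the full 64-bit mask. */
static inline uint64_t shl_mask_raw(uint64_t mask, unsigned shift)
{
    return mask << shift;
}

/* Same computation, followed by the truncation applied to 32-bit ops. */
static inline uint64_t shl_mask_i32(uint64_t mask, unsigned shift)
{
    return (mask << shift) & 0xffffffffu;
}

static inline void shl_demo(void)
{
    /* Bits 32..39 of the mask get set although a 32-bit op cannot
       produce them. */
    assert(shl_mask_raw(0xffffffffu, 8) == 0xffffffff00ull);
    assert(shl_mask_i32(0xffffffffu, 8) == 0xffffff00u);

    /* Only the top byte may be set and it is shifted out: the truncated
       mask is 0, so the 32-bit result is known to be zero. */
    assert(shl_mask_raw(0xff000000u, 8) != 0);
    assert(shl_mask_i32(0xff000000u, 8) == 0);
}
```

The second case shows the practical benefit: once the stray high bits are cleared, the existing "mask == 0" check can replace the op with a movi of 0.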

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 0ed8983..b1f736b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -791,6 +791,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             break;
         }
 
+        /* 32-bit ops (non 64-bit ops and non load/store ops) generate 32-bit
+           results */
+        if (!(tcg_op_defs[op].flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_64BIT))) {
+            mask &= 0xffffffffu;
+        }
+
         if (mask == 0) {
             assert(def->nb_oargs == 1);
             s->gen_opc_buf[op_index] = op_to_movi(op);
-- 
1.7.10.4

* [Qemu-devel] [PATCH v2 4/4] tcg/optimize: add known-zero bits compute for load ops
  2013-09-09 17:27 [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
                   ` (2 preceding siblings ...)
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 3/4] tcg/optimize: improve known-zero bits for 32-bit ops Aurelien Jarno
@ 2013-09-09 17:27 ` Aurelien Jarno
  2013-12-06 17:53   ` Richard Henderson
  2013-11-29  9:32 ` [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Paolo Bonzini
  4 siblings, 1 reply; 11+ messages in thread
From: Aurelien Jarno @ 2013-09-09 17:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b1f736b..044f456 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -787,6 +787,19 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             mask = temps[args[3]].mask | temps[args[4]].mask;
             break;
 
+        CASE_OP_32_64(ld8u):
+        case INDEX_op_qemu_ld8u:
+            mask = 0xff;
+            break;
+        CASE_OP_32_64(ld16u):
+        case INDEX_op_qemu_ld16u:
+            mask = 0xffff;
+            break;
+        case INDEX_op_ld32u_i64:
+        case INDEX_op_qemu_ld32u:
+            mask = 0xffffffffu;
+            break;
+
         default:
             break;
         }
-- 
1.7.10.4

* Re: [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements
  2013-09-09 17:27 [Qemu-devel] [PATCH v2 0/4] tcg/optimize: fixes and improvements Aurelien Jarno
                   ` (3 preceding siblings ...)
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 4/4] tcg/optimize: add known-zero bits compute for load ops Aurelien Jarno
@ 2013-11-29  9:32 ` Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2013-11-29  9:32 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson

On 09/09/2013 19:27, Aurelien Jarno wrote:
> This patchset first fixes known-zero bits optimization so that it works
> in more than a few cases, and does some further optimizations for 32-bit
> ops and unsigned loads.
> 
> v1 -> v2:
> - swapped patches 1 & 2
> - Cc:ed qemu-stable for patch 1
> - improved description of patch 2
> 
> Aurelien Jarno (4):
>   tcg/optimize: fix known-zero bits for right shift ops
>   tcg/optimize: fix known-zero bits optimization
>   tcg/optimize: improve known-zero bits for 32-bit ops
>   tcg/optimize: add known-zero bits compute for load ops
> 
>  tcg/optimize.c |   48 +++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 43 insertions(+), 5 deletions(-)
> 

Ping?

Paolo

* Re: [Qemu-devel] [PATCH v2 4/4] tcg/optimize: add known-zero bits compute for load ops
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 4/4] tcg/optimize: add known-zero bits compute for load ops Aurelien Jarno
@ 2013-12-06 17:53   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2013-12-06 17:53 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 09/10/2013 05:27 AM, Aurelien Jarno wrote:
> +        CASE_OP_32_64(ld8u):
> +        case INDEX_op_qemu_ld8u:
> +            mask = 0xff;
> +            break;
> +        CASE_OP_32_64(ld16u):
> +        case INDEX_op_qemu_ld16u:
> +            mask = 0xffff;
> +            break;
> +        case INDEX_op_ld32u_i64:
> +        case INDEX_op_qemu_ld32u:
> +            mask = 0xffffffffu;
> +            break;
> +

This could stand to be updated for the new INDEX_op_qemu_ld_{i32,i64} opcodes,
where you have to look at args[last] to find out the width and sign.

But this is still an improvement for the old opcodes.
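
For reference, the masks in the quoted hunk follow directly from the width of an unsigned (zero-extending) load. The sketch below uses a hypothetical helper that takes the width in bits; it says nothing about how the newer opcodes encode width and sign, which is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a QEMU function: known-zero mask for a
   zero-extending load of the given width in bits.  A set bit means
   "this bit may be non-zero". */
static inline uint64_t unsigned_load_mask(unsigned bits)
{
    /* Shifting a 64-bit value by 64 is undefined in C, so handle the
       full-width case separately. */
    return bits >= 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

static inline void load_mask_demo(void)
{
    assert(unsigned_load_mask(8) == 0xffu);         /* ld8u  */
    assert(unsigned_load_mask(16) == 0xffffu);      /* ld16u */
    assert(unsigned_load_mask(32) == 0xffffffffu);  /* ld32u */
}
```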

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
@ 2013-12-06 17:53   ` Richard Henderson
  2014-02-16  5:42   ` Michael Roth
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2013-12-06 17:53 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini, qemu-stable

On 09/10/2013 05:27 AM, Aurelien Jarno wrote:
> 32-bit versions of sar and shr ops should not propagate known-zero bits
> from the unused 32 high bits. For sar it could even lead to wrong code
> being generated.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c |   21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH v2 2/4] tcg/optimize: fix known-zero bits optimization
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 2/4] tcg/optimize: fix known-zero bits optimization Aurelien Jarno
@ 2013-12-06 17:54   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2013-12-06 17:54 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 09/10/2013 05:27 AM, Aurelien Jarno wrote:
> Known-zero bits optimization is a great idea that helps to generate more
> optimized code. However, the current implementation only works in very few
> cases because the computed mask is not saved.
> 
> Fix this by saving the mask so that the optimization really works.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH v2 3/4] tcg/optimize: improve known-zero bits for 32-bit ops
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 3/4] tcg/optimize: improve known-zero bits for 32-bit ops Aurelien Jarno
@ 2013-12-06 17:54   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2013-12-06 17:54 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 09/10/2013 05:27 AM, Aurelien Jarno wrote:
> The shl_i32 op might set some of the unused 32 high bits of the
> mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
> except load/store, which operate on tl values.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c |    6 ++++++
>  1 file changed, 6 insertions(+)


Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops
  2013-09-09 17:27 ` [Qemu-devel] [PATCH v2 1/4] tcg/optimize: fix known-zero bits for right shift ops Aurelien Jarno
  2013-12-06 17:53   ` Richard Henderson
@ 2014-02-16  5:42   ` Michael Roth
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Roth @ 2014-02-16  5:42 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini, qemu-stable, Richard Henderson

Quoting Aurelien Jarno (2013-09-09 12:27:47)
> 32-bit versions of sar and shr ops should not propagate known-zero bits
> from the unused 32 high bits. For sar it could even lead to wrong code
> being generated.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c |   21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)

Ping, looking to pull this in for 1.7.1

> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index b29bf25..c539e39 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -730,16 +730,29 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              mask = temps[args[1]].mask & mask;
>              break;
> 
> -        CASE_OP_32_64(sar):
> +        case INDEX_op_sar_i32:
> +            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +                mask = ((int32_t)temps[args[1]].mask
> +                        >> temps[args[2]].val);
> +            }
> +            break;
> +        case INDEX_op_sar_i64:
>              if (temps[args[2]].state == TCG_TEMP_CONST) {
> -                mask = ((tcg_target_long)temps[args[1]].mask
> +                mask = ((int64_t)temps[args[1]].mask
>                          >> temps[args[2]].val);
>              }
>              break;
> 
> -        CASE_OP_32_64(shr):
> +        case INDEX_op_shr_i32:
>              if (temps[args[2]].state == TCG_TEMP_CONST) {
> -                mask = temps[args[1]].mask >> temps[args[2]].val;
> +                mask = ((uint32_t)temps[args[1]].mask
> +                        >> temps[args[2]].val);
> +            }
> +            break;
> +        case INDEX_op_shr_i64:
> +            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +                mask = ((uint64_t)temps[args[1]].mask
> +                        >> temps[args[2]].val);
>              }
>              break;
> 
> -- 
> 1.7.10.4
