[Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops


* [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops
@ 2015-07-15 11:03 Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 1/9] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

This patch set tries to improve the size changing ops in TCG, so that we
have a clean interface and a better view of how 32-bit and 64-bit values
are handled. I believe parts of the code we have now are more band-aids
than real fixes. The idea behind this patch set is that size changing ops
should be real ops and not be implemented as movs or type casts, so that
we can distinguish them in the register allocator and the optimizer. It
nevertheless allows targets to override that and replace them with a mov
when the target CPU already keeps values zero- or sign-extended.

It is currently only correct on x86; for other targets we have to review
and decide how to handle things (or be conservative and implement all 3
size changing ops). For x86 I have chosen to implement ext_i32_i64 and
extu_i32_i64 as real ops and trunc_shr_i64_i32 as a mov, as it doesn't
change the generated code. I believe it is also possible to implement
ext_i32_i64 and trunc_shr_i64_i32 as real ops and extu_i32_i64 as a mov.
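
The two x86 choices described above can be modelled in plain C. This is
only a sketch, not TCG code: a uint64_t stands in for a 64-bit host
register, and the c1_/c2_ function names are hypothetical. It shows that
both conventions give the same extended result even though a different
op is the "free" mov in each.

```c
#include <stdint.h>

/* Convention 1 (the choice made for x86 in this series): a 32-bit value
 * held in a 64-bit register may carry arbitrary high bits, so truncation
 * is a plain mov and both extensions are real ops. */
uint64_t c1_trunc(uint64_t reg) { return reg; }
uint64_t c1_extu(uint64_t reg)  { return (uint32_t)reg; }
uint64_t c1_ext(uint64_t reg)   { return (uint64_t)(int64_t)(int32_t)(uint32_t)reg; }

/* Convention 2 (the alternative mentioned above): 32-bit values are always
 * kept zero-extended, so truncation is a real op and extu becomes a mov. */
uint64_t c2_trunc(uint64_t reg) { return (uint32_t)reg; }
uint64_t c2_extu(uint64_t reg)  { return reg; }
uint64_t c2_ext(uint64_t reg)   { return (uint64_t)(int64_t)(int32_t)(uint32_t)reg; }
```

In this model the sign extension has to be a real op under either
convention, which matches the fact that ext_i32_i64 is a real op in both
alternatives listed above. The cast to int32_t relies on the usual
two's-complement behaviour of common compilers.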

Note that it doesn't fix the qemu_ld/st issue reported by Leon Alrae.
Also note that this is definitely not 2.4 material, but I am posting it
now in the hope that it gives a better view of how things are currently
handled.

Aurelien Jarno (9):
  tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  tcg/i386: implement ext_i32_i64 and extu_i32_i64 ops
  tcg/i386: document the way 32/64-bit conversions are handled
  tcg: replace ext/u_i32_i64 by a mov when not implemented
  tcg/optimize: do not simplify size changing moves
  tcg: update README about size changing ops

 tcg/README               | 17 ++++++++++++++---
 tcg/aarch64/tcg-target.h |  6 +++++-
 tcg/i386/tcg-target.c    |  5 +++++
 tcg/i386/tcg-target.h    | 11 ++++++++++-
 tcg/ia64/tcg-target.h    |  6 +++++-
 tcg/optimize.c           | 44 +++++++++++++++++++-------------------------
 tcg/ppc/tcg-target.h     |  7 ++++++-
 tcg/s390/tcg-target.h    |  6 +++++-
 tcg/sparc/tcg-target.c   |  4 ++--
 tcg/sparc/tcg-target.h   |  6 +++++-
 tcg/tcg-op.c             | 16 +++++++++++-----
 tcg/tcg-opc.h            |  7 +++++--
 tcg/tcg.h                |  2 +-
 tcg/tci/tcg-target.h     |  7 ++++++-
 14 files changed, 99 insertions(+), 45 deletions(-)

-- 
2.1.4


* [Qemu-devel] [PATCH RFC 1/9] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:14   ` Richard Henderson
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 2/9] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size changing op.
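
For reference, the README entry touched by this patch describes the op as
a right shift of the 64-bit input by POS followed by truncation to a
32-bit output. A minimal C model of that description (not the TCG
implementation; masking the shift count with 63 mirrors the constant
folding code in tcg/optimize.c) could look like:

```c
#include <stdint.h>

/* Model of "trunc_shr_i64_i32 t0, t1, pos": shift the 64-bit input
 * right by pos, then keep only the low 32 bits. */
uint32_t trunc_shr_i64_i32(uint64_t t1, unsigned pos)
{
    return (uint32_t)(t1 >> (pos & 63));
}
```

With pos == 0 this is a pure truncation, and with pos == 32 it extracts
the high half of the input.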

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README               | 2 +-
 tcg/aarch64/tcg-target.h | 2 +-
 tcg/i386/tcg-target.h    | 2 +-
 tcg/ia64/tcg-target.h    | 2 +-
 tcg/optimize.c           | 6 +++---
 tcg/ppc/tcg-target.h     | 2 +-
 tcg/s390/tcg-target.h    | 2 +-
 tcg/sparc/tcg-target.c   | 4 ++--
 tcg/sparc/tcg-target.h   | 2 +-
 tcg/tcg-op.c             | 4 ++--
 tcg/tcg-opc.h            | 4 ++--
 tcg/tcg.h                | 2 +-
 tcg/tci/tcg-target.h     | 2 +-
 13 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tcg/README b/tcg/README
index a550ff1..61b3899 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,7 +314,7 @@ This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
-* trunc_shr_i32 t0, t1, pos
+* trunc_shr_i64_i32 t0, t1, pos
 
 For 64-bit hosts only, right shift the 64-bit input T1 by POS and
 truncate to 32-bit output T0.  Depending on the host, this may be
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8aec04d..dfd8801 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 25b5133..dae50ba 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index a04ed81..29902f9 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,7 +160,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 0f6f700..d66373d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -292,7 +292,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_shr_i32:
         return (uint32_t)x >> (y & 31);
 
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
     case INDEX_op_shr_i64:
         return (uint64_t)x >> (y & 63);
 
@@ -876,7 +876,7 @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             mask = (uint64_t)temps[args[1]].mask >> args[2];
             break;
 
@@ -1025,7 +1025,7 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             if (temps[args[1]].state == TCG_TEMP_CONST) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7ce7048..b7e6861 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,7 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 91576d5..50016a8 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,7 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1a870a8..b23032b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
         if (a2 == 0) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
         } else {
@@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_ext32s_i64, { "R", "r" } },
     { INDEX_op_ext32u_i64, { "R", "r" } },
-    { INDEX_op_trunc_shr_i32,  { "r", "R" } },
+    { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
     { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index f584de4..336c47f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_trunc_shr_i32    1
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 45098c3..61b64db 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_gen_mov_i32(ret, TCGV_LOW(t));
             tcg_temp_free_i64(t);
         }
-    } else if (TCG_TARGET_HAS_trunc_shr_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
+    } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
+        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
                          MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 13ccb60..4a34f43 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
-DEF(trunc_shr_i32, 1, 1, 1,
-    IMPL(TCG_TARGET_HAS_trunc_shr_i32)
+DEF(trunc_shr_i64_i32, 1, 1, 1,
+    IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 231a781..e7e33b9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_div2_i64         0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index cbf3f9b..8b1139b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,7 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 1/9] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 1/9] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-17  6:14   ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:14 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
> and the name in the README doesn't match the name offered to the
> frontends.
>
> Always use the long name to make it clear it is a size changing op.
>
> Cc: Paolo Bonzini<pbonzini@redhat.com>
> Cc: Richard Henderson<rth@twiddle.net>
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   tcg/README               | 2 +-
>   tcg/aarch64/tcg-target.h | 2 +-
>   tcg/i386/tcg-target.h    | 2 +-
>   tcg/ia64/tcg-target.h    | 2 +-
>   tcg/optimize.c           | 6 +++---
>   tcg/ppc/tcg-target.h     | 2 +-
>   tcg/s390/tcg-target.h    | 2 +-
>   tcg/sparc/tcg-target.c   | 4 ++--
>   tcg/sparc/tcg-target.h   | 2 +-
>   tcg/tcg-op.c             | 4 ++--
>   tcg/tcg-opc.h            | 4 ++--
>   tcg/tcg.h                | 2 +-
>   tcg/tci/tcg-target.h     | 2 +-
>   13 files changed, 18 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* [Qemu-devel] [PATCH RFC 2/9] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 1/9] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:14   ` Richard Henderson
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 3/9] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-op.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 61b64db..0e79fd1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_temp_free_i64(t);
         }
     } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
-                         MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
+        tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
+                    GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
     } else {
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 2/9] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 2/9] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-17  6:14   ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:14 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
> returns a 32-bit value. Directly call tcg_gen_op3 with the correct
> types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.
>
> Cc: Paolo Bonzini<pbonzini@redhat.com>
> Cc: Richard Henderson<rth@twiddle.net>
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   tcg/tcg-op.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* [Qemu-devel] [PATCH RFC 3/9] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 1/9] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 2/9] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:19   ` Richard Henderson
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 4/9] tcg/optimize: add optimizations for " Aurelien Jarno
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Implement optional but real ext_i32_i64 and extu_i32_i64 ops. When
implemented, these ensure that a 32-bit value is always converted to
a 64-bit value and not propagated through the register allocator or
the optimizer.
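
The way the generator functions pick a lowering can be summarised by the
sketch below. It is an illustration only: ExtLowering and
choose_ext_lowering() are hypothetical names, and the last case stands
for the pre-existing fallback path, which the hunk in tcg/tcg-op.c shows
only partially.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    LOWER_SPLIT_HALVES, /* 32-bit host: set the low and high halves separately */
    LOWER_REAL_OP,      /* 64-bit host implementing the op: emit it as a real op */
    LOWER_FALLBACK      /* 64-bit host without the op: keep the existing path */
} ExtLowering;

ExtLowering choose_ext_lowering(int host_reg_bits, bool target_has_op)
{
    if (host_reg_bits == 32) {
        return LOWER_SPLIT_HALVES;
    }
    if (target_has_op) {
        return LOWER_REAL_OP;
    }
    return LOWER_FALLBACK;
}
```

Here target_has_op plays the role of TCG_TARGET_HAS_ext_i32_i64 or
TCG_TARGET_HAS_extu_i32_i64, depending on which op is being generated.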

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/aarch64/tcg-target.h | 6 +++++-
 tcg/i386/tcg-target.h    | 7 ++++++-
 tcg/ia64/tcg-target.h    | 6 +++++-
 tcg/ppc/tcg-target.h     | 7 ++++++-
 tcg/s390/tcg-target.h    | 6 +++++-
 tcg/sparc/tcg-target.h   | 6 +++++-
 tcg/tcg-op.c             | 6 ++++++
 tcg/tcg-opc.h            | 3 +++
 tcg/tci/tcg-target.h     | 7 ++++++-
 9 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index dfd8801..2cb870c 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,6 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
@@ -100,6 +99,11 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
     __builtin___clear_cache((char *)start, (char *)stop);
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index dae50ba..274c97f 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,6 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
@@ -129,6 +128,12 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_muls2_i64        1
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
+
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 #endif
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 29902f9..adf4b17 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,11 +160,15 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
 
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_neg_i32          0 /* sub r1, r0, r3 */
 #define TCG_TARGET_HAS_neg_i64          0 /* sub r1, r0, r3 */
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b7e6861..7b84491 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,6 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
@@ -105,6 +104,12 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i64        0
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
+
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 #endif
 
 void flush_icache_range(uintptr_t start, uintptr_t stop);
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 50016a8..c4c5334 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,6 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
@@ -101,6 +100,11 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
 
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 extern bool tcg_target_deposit_valid(int ofs, int len);
 #define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
 #define TCG_TARGET_deposit_i64_valid  tcg_target_deposit_valid
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 336c47f..387d9a2 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,6 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
@@ -147,6 +146,11 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i64        use_vis3_instructions
 #define TCG_TARGET_HAS_mulsh_i64        0
 
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 #define TCG_AREG0 TCG_REG_I0
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0e79fd1..c8db812 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1769,6 +1769,9 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+    } else if (TCG_TARGET_HAS_extu_i32_i64) {
+        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     } else {
         /* Note: we assume the target supports move between
            32 and 64 bit registers.  */
@@ -1781,6 +1784,9 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
+    } else if (TCG_TARGET_HAS_ext_i32_i64) {
+        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     } else {
         /* Note: we assume the target supports move between
            32 and 64 bit registers.  */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a34f43..810b524 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
+/* size changing ops */
+DEF(ext_i32_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext_i32_i64))
+DEF(extu_i32_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_extu_i32_i64))
 DEF(trunc_shr_i64_i32, 1, 1, 1,
     IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 8b1139b..d649581 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,6 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
@@ -115,6 +114,12 @@
 #define TCG_TARGET_HAS_mulu2_i64        0
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
+
+/* size changing optional ops */
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_ext_i32_i64       0
+#define TCG_TARGET_HAS_extu_i32_i64      0
+
 #else
 #define TCG_TARGET_HAS_mulu2_i32        1
 #endif /* TCG_TARGET_REG_BITS == 64 */
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 3/9] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 3/9] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-17  6:19   ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:19 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> Implement optional but real ext_i32_i64 and extu_i32_i64 ops. When
> implemented, these ensure that a 32-bit value is always converted to
> a 64-bit value and not propagated through the register allocator or
> the optimizer.
>
> Cc: Paolo Bonzini<pbonzini@redhat.com>
> Cc: Richard Henderson<rth@twiddle.net>
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   tcg/aarch64/tcg-target.h | 6 +++++-
>   tcg/i386/tcg-target.h    | 7 ++++++-
>   tcg/ia64/tcg-target.h    | 6 +++++-
>   tcg/ppc/tcg-target.h     | 7 ++++++-
>   tcg/s390/tcg-target.h    | 6 +++++-
>   tcg/sparc/tcg-target.h   | 6 +++++-
>   tcg/tcg-op.c             | 6 ++++++
>   tcg/tcg-opc.h            | 3 +++
>   tcg/tci/tcg-target.h     | 7 ++++++-
>   9 files changed, 47 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* [Qemu-devel] [PATCH RFC 4/9] tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
                   ` (2 preceding siblings ...)
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 3/9] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:23   ` Richard Henderson
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 5/9] tcg/i386: implement " Aurelien Jarno
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.
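
As a stand-alone illustration of the two rules added below, here is a
sketch in plain C. The function names are hypothetical; "mask" is taken
to mean the set of bits that may be non-zero, and an all-ones mask is
assumed to mean "nothing known" (that default lives outside the hunk).

```c
#include <stdint.h>

/* Constant folding: same results as ext32s_i64 / ext32u_i64. */
uint64_t fold_ext_i32_i64(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

uint64_t fold_extu_i32_i64(uint64_t x)
{
    return (uint32_t)x;
}

/* Mask propagation: a zero extension always clears the high 32 bits. */
uint64_t mask_extu_i32_i64(uint64_t in_mask)
{
    return (uint32_t)in_mask;
}

/* A sign extension only does so when bit 31 is known to be zero. */
uint64_t mask_ext_i32_i64(uint64_t in_mask)
{
    if (in_mask & 0x80000000u) {
        return UINT64_MAX; /* sign bit may be set: nothing is known */
    }
    return (uint32_t)in_mask;
}
```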

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d66373d..18b7bc3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -347,9 +347,11 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(ext16u):
         return (uint16_t)x;
 
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         return (int32_t)x;
 
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         return (uint32_t)x;
 
@@ -839,6 +841,14 @@ void tcg_optimize(TCGContext *s)
             mask = temps[args[1]].mask & mask;
             break;
 
+        case INDEX_op_ext_i32_i64:
+            if ((temps[args[1]].mask & 0x80000000) != 0) {
+                break;
+            }
+        case INDEX_op_extu_i32_i64:
+            mask = (uint32_t)temps[args[1]].mask;
+            break;
+
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 4/9] tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 4/9] tcg/optimize: add optimizations for " Aurelien Jarno
@ 2015-07-17  6:23   ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:23 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> They behave the same as ext32s_i64 and ext32u_i64 from the constant
> folding and zero propagation point of view, except that they can't
> be replaced by a mov, so we don't compute the affected value.
>
> Cc: Paolo Bonzini<pbonzini@redhat.com>
> Cc: Richard Henderson<rth@twiddle.net>
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   tcg/optimize.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* [Qemu-devel] [PATCH RFC 5/9] tcg/i386: implement ext_i32_i64 and extu_i32_i64 ops
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
                   ` (3 preceding siblings ...)
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 4/9] tcg/optimize: add optimizations for " Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 6/9] tcg/i386: document the way 32/64-bit conversions are handled Aurelien Jarno
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Implementing them as real ops means they can't be optimized out by
the register allocator or the optimizer.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/i386/tcg-target.c | 5 +++++
 tcg/i386/tcg-target.h | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ff4d9cf..637b1fb 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2034,9 +2034,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_bswap64_i64:
         tcg_out_bswap64(s, args[0]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext32u(s, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext32s(s, args[0], args[1]);
         break;
@@ -2171,6 +2173,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 274c97f..16f3949 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -131,8 +131,8 @@ extern bool have_bmi1;
 
 /* size changing optional ops */
 #define TCG_TARGET_HAS_trunc_shr_i64_i32 0
-#define TCG_TARGET_HAS_ext_i32_i64       0
-#define TCG_TARGET_HAS_extu_i32_i64      0
+#define TCG_TARGET_HAS_ext_i32_i64       1
+#define TCG_TARGET_HAS_extu_i32_i64      1
 
 #endif
 
-- 
2.1.4


* [Qemu-devel] [PATCH RFC 6/9] tcg/i386: document the way 32/64-bit conversions are handled
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
                   ` (4 preceding siblings ...)
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 5/9] tcg/i386: implement " Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 7/9] tcg: replace ext/u_i32_i64 by a mov when not implemented Aurelien Jarno
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/i386/tcg-target.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 16f3949..d483083 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -129,7 +129,11 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
 
-/* size changing optional ops */
+/* size changing optional ops.  On x86 we ensure that all 32-bit ops
+   ignore the high bits. We therefore can implement trunc_shr_i64_i32
+   as a mov, but we need to implement ext_i32_i64 and extu_i32_i64 to
+   zero/sign extend the high bits when converting a 32-bit value into
+   a 64-bit one.  */
 #define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_ext_i32_i64       1
 #define TCG_TARGET_HAS_extu_i32_i64      1
-- 
2.1.4


* [Qemu-devel] [PATCH RFC 7/9] tcg: replace ext/u_i32_i64 by a mov when not implemented
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
                   ` (5 preceding siblings ...)
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 6/9] tcg/i386: document the way 32/64-bit conversions are handled Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:30   ` Richard Henderson
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves Aurelien Jarno
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 9/9] tcg: update README about size changing ops Aurelien Jarno
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

When ext_i32_i64 and extu_i32_i64 ops are not implemented, this means
that the register is already properly zero/sign extended, so we can
simply replace it by a mov.

In practice it means at least one of the two ops should always be
implemented on 64-bit targets.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-op.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index c8db812..b4b1654 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1775,7 +1775,7 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
     } else {
         /* Note: we assume the target supports move between
            32 and 64 bit registers.  */
-        tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_mov_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
     }
 }
 
@@ -1790,7 +1790,7 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
     } else {
         /* Note: we assume the target supports move between
            32 and 64 bit registers.  */
-        tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_mov_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
     }
 }
 
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 7/9] tcg: replace ext/u_i32_i64 by a mov when not implemented
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 7/9] tcg: replace ext/u_i32_i64 by a mov when not implemented Aurelien Jarno
@ 2015-07-17  6:30   ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:30 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> When ext_i32_i64 and extu_i32_i64 ops are not implemented, this means
> that the register is already properly zero/sign extended, so we can
> simply replace it by a mov.
>
> In practice it means at least one of the two ops should always be
> implemented on 64-bit targets.
>
> Cc: Paolo Bonzini<pbonzini@redhat.com>
> Cc: Richard Henderson<rth@twiddle.net>
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   tcg/tcg-op.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

If we're going to do this (and of course pick a solution for all of the other 
backends), I think perhaps x86 should choose trunc + exts as the two that 
should be implemented, leaving extu the one that can be folded away.

Something to experiment with...
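
A minimal sketch of what that choice would mean, assuming the reasoning is
that 32-bit x86-64 instructions already zero the upper half of their
destination register. The model_* helpers are hypothetical and only model
register contents; they are not TCG backend code.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit host register holding a 32-bit value whose
   high half is kept at zero, as 32-bit x86-64 instructions leave it.  */

/* trunc: a real op, it clears the high 32 bits (like movl).  */
static uint64_t model_trunc_i64_i32(uint64_t reg)
{
    return (uint32_t)reg;
}

/* exts: a real op, it replicates bit 31 into the high half (like movslq).
   The unsigned-to-signed cast assumes a two's complement host.  */
static uint64_t model_ext_i32_i64(uint64_t reg)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)reg;
}

/* extu: the high half is already zero, so a plain mov is enough.  */
static uint64_t model_extu_i32_i64(uint64_t reg)
{
    return reg;
}
```

With this convention only extu degenerates into a register copy; trunc and
exts both have to change the high bits.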


r~


* [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
                   ` (6 preceding siblings ...)
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 7/9] tcg: replace ext/u_i32_i64 by a mov when not implemented Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:38   ` Richard Henderson
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 9/9] tcg: update README about size changing ops Aurelien Jarno
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Now that we have real size changing ops, we don't need to mark the high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and do
simplifications when possible. The problem there is therefore not the
fact that those bits are not counted as garbage, but that a size
changing op is replaced by a move.

This patch is basically a revert of 24666baf, including the changes that
have been made since then.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 18b7bc3..d1a0b6d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -197,19 +197,13 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                              TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
 
     reset_temp(dst);
     temps[dst].state = TCG_TEMP_CONST;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -229,17 +223,11 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
 
     TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
 
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = temps[src].mask;
 
     assert(temps[src].state != TCG_TEMP_CONST);
 
@@ -590,7 +578,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -945,17 +933,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results.  */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves Aurelien Jarno
@ 2015-07-17  6:38   ` Richard Henderson
  2015-07-17 10:33     ` Aurelien Jarno
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:38 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> Now that we have real size changing ops, we don't need to mark the high
> bits of the destination as garbage. The goal of the optimizer is to
> predict the value of the temps (and not of the registers) and do
> simplifications when possible. The problem there is therefore not the
> fact that those bits are not counted as garbage, but that a size
> changing op is replaced by a move.
>
> This patch is basically a revert of 24666baf, including the changes that
> have been made since then.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

What we're missing here is whether the omitted size changing op is extu or 
exts.  Mask should be extended to match.  Which means keeping most of this code.
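
A rough sketch of "extended to match", assuming the optimizer's mask has a
bit set for every bit of the temp that may be non-zero. The two helpers are
hypothetical and do not exist in tcg/optimize.c.

```c
#include <stdint.h>

/* Hypothetical helpers: 'mask' has a bit set for every bit of the temp
   that may be non-zero.  */

/* The omitted op was extu: the high 32 bits are known to be zero.  */
static uint64_t mask_after_extu(uint64_t mask)
{
    return (uint32_t)mask;
}

/* The omitted op was exts: if bit 31 may be set, then every high bit may
   be set too.  The cast assumes a two's complement host.  */
static uint64_t mask_after_exts(uint64_t mask)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)mask;
}
```

The two results differ only when bit 31 of the mask is set, which is the
case where the optimizer needs to know which op was dropped.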


r~


* Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves
  2015-07-17  6:38   ` Richard Henderson
@ 2015-07-17 10:33     ` Aurelien Jarno
  2015-07-18  7:24       ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-17 10:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel

On 2015-07-17 07:38, Richard Henderson wrote:
> On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> >Now that we have real size changing ops, we don't need to mark the high
> >bits of the destination as garbage. The goal of the optimizer is to
> >predict the value of the temps (and not of the registers) and do
> >simplifications when possible. The problem there is therefore not the
> >fact that those bits are not counted as garbage, but that a size
> >changing op is replaced by a move.
> >
> >This patch is basically a revert of 24666baf, including the changes that
> >have been made since then.
> >
> >Cc: Paolo Bonzini <pbonzini@redhat.com>
> >Cc: Richard Henderson <rth@twiddle.net>
> >Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> 
> What we're missing here is whether the omitted size changing op is extu or
> exts.  Mask should be extended to match.  Which means keeping most of this
> code.

I am afraid you're correct. Unfortunately one of my goals is to remove this
part in the optimizer, as I need that in a patch series I am preparing.
I have also tried to check the temp type directly from the optimizer (it
is accessible), but it has some performance impact. Propagating the
extu/exts as real opcode means propagating the information about size
changing up to the optimizer or the register allocator, without having
to recreate it from other available information.

For now I do wonder if we shouldn't get the size changing extu/exts
mandatory instead of reusing the 64-bit only version. This doesn't
change the generated code, at least on x86.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves
  2015-07-17 10:33     ` Aurelien Jarno
@ 2015-07-18  7:24       ` Richard Henderson
  2015-07-18 21:19         ` Aurelien Jarno
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-07-18  7:24 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Paolo Bonzini, qemu-devel

On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> For now I do wonder if we shouldn't get the size changing extu/exts
> mandatory instead of reusing the 64-bit only version. This doesn't
> change the generated code, at least on x86.

I'd be surprised if it did anywhere.  I don't mind starting with them being 
required, and then figuring out a way to optimize.


r~


* Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves
  2015-07-18  7:24       ` Richard Henderson
@ 2015-07-18 21:19         ` Aurelien Jarno
  0 siblings, 0 replies; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-18 21:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel

On 2015-07-18 08:24, Richard Henderson wrote:
> On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> >For now I do wonder if we shouldn't get the size changing extu/exts
> >mandatory instead of reusing the 64-bit only version. This doesn't
> >change the generated code, at least on x86.
> 
> I'd be surprised if it did anywhere.  I don't mind starting with them being
> required, and then figuring out a way to optimize.

I have a patch series ready for that; if you want, I can post it as an RFC.

That said looking more deeply into the problem you found I guess we can
solve that easily by using the same convention as the real CPU for
storing 32-bit constants in the TCG optimizer.

This roughly means the following code for the 32-bit ops:

         /* 32-bit ops generate 32-bit results.  */
         if (!(def->flags & TCG_OPF_64BIT)) {
             if (!TCG_TARGET_HAS_ext_i32_i64) {
                 /* registers are maintained sign-extended */
                 mask = (int32_t)mask;
                 affected = (int32_t)affected;
             } else if (!TCG_TARGET_HAS_extu_i32_i64) {
                 /* registers are maintained zero-extended */
                 mask = (uint32_t)mask;
                 affected = (uint32_t)affected;
             } else {
                 /* high bits will be computed by ext/extu_i32_i64 */
                 mask = (uint32_t)mask;
                 affected = (uint32_t)affected;
             }
         }

And that would be fine for my patch series in preparation, as long as I
can predict the high part instead of considering it as garbage.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* [Qemu-devel] [PATCH RFC 9/9] tcg: update README about size changing ops
  2015-07-15 11:03 [Qemu-devel] [PATCH RFC 0/9] tcg: improve size changing ops Aurelien Jarno
                   ` (7 preceding siblings ...)
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves Aurelien Jarno
@ 2015-07-15 11:03 ` Aurelien Jarno
  2015-07-17  6:42   ` Richard Henderson
  8 siblings, 1 reply; 20+ messages in thread
From: Aurelien Jarno @ 2015-07-15 11:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Aurelien Jarno, Richard Henderson

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/README b/tcg/README
index 61b3899..d8fd17a 100644
--- a/tcg/README
+++ b/tcg/README
@@ -470,8 +470,19 @@ Floating point operations are not supported in this version. A
 previous incarnation of the code generator had full support of them,
 but it is better to concentrate on integer operations first.
 
-On a 64 bit target, no assumption is made in TCG about the storage of
-the 32 bit values in 64 bit registers.
+On a 64 bit target, the values are transferred between 32 and 64-bit
+registers by means of the following ops:
+- trunc_shr_i64_i32
+- ext_i32_i64
+- extu_i32_i64
+
+These ops are all optional in that case they are implemented as mov.
+This is to allow some optimizations if the target maintains registers
+zero or sign extended. For example a MIPS64 CPU requires that all
+32-bit values are stored sign-extended in the registers. This means
+the trunc_shr_i64_i32 should sign-extend the value when moving it
+from a 64-bit to a 32-bit register. It also means ext_i32_i64 can be
+implemented as a simple mov as the value is already sign extended.
 
 4.2) Constraints
 
-- 
2.1.4


* Re: [Qemu-devel] [PATCH RFC 9/9] tcg: update README about size changing ops
  2015-07-15 11:03 ` [Qemu-devel] [PATCH RFC 9/9] tcg: update README about size changing ops Aurelien Jarno
@ 2015-07-17  6:42   ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-07-17  6:42 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Paolo Bonzini

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> +These ops are all optional in that case they are implemented as mov.
> +This is to allow some optimizations if the target maintains registers
> +zero or sign extended. For example a MIPS64 CPU requires that all
> +32-bit values are stored sign-extended in the registers. This means
> +the trunc_shr_i64_i32 should sign-extend the value when moving it
> +from a 64-bit to a 32-bit register. It also means ext_i32_i64 can be
> +implemented as a simple mov as the value is already sign extended.

We need better wording.  Each of the three is optional, provided that the other
two are implemented.  I think we ought to have a check in tcg.c about this, in
tcg_add_target_add_op_defs.
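
A minimal sketch of such a check, assuming the rule is "at most one of the
three ops may be left unimplemented". The function name and its parameters
are hypothetical; a real check would read the backend's TCG_TARGET_HAS_*
values.

```c
#include <stdbool.h>

/* Hypothetical check: a backend may leave at most one of the three size
   changing ops unimplemented, so at least two must be present.  */
static bool size_changing_ops_valid(bool has_trunc_shr_i64_i32,
                                    bool has_ext_i32_i64,
                                    bool has_extu_i32_i64)
{
    int implemented = has_trunc_shr_i64_i32 + has_ext_i32_i64
                      + has_extu_i32_i64;

    return implemented >= 2;
}
```

The x86 settings from this series (trunc_shr_i64_i32 = 0, ext_i32_i64 = 1,
extu_i32_i64 = 1) would pass such a check.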


r~

