* [PULL 00/24] tcg patch queue
@ 2021-01-14  2:16 Richard Henderson
  2021-01-14  2:16 ` [PULL 01/24] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
                   ` (24 more replies)
  0 siblings, 25 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
The following changes since commit 45240eed4f064576d589ea60ebadf3c11d7ab891:
  Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging (2021-01-13 14:19:24 +0000)
are available in the Git repository at:
  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210113
for you to fetch changes up to 4cacecaaa2bbf8af0967bd3eee43297fada475a9:
  decodetree: Open files with encoding='utf-8' (2021-01-13 08:39:08 -1000)
----------------------------------------------------------------
Improvements to tcg constant handling.
Force utf8 for decodetree.
----------------------------------------------------------------
Philippe Mathieu-Daudé (1):
      decodetree: Open files with encoding='utf-8'
Richard Henderson (23):
      tcg: Use tcg_out_dupi_vec from temp_load
      tcg: Increase tcg_out_dupi_vec immediate to int64_t
      tcg: Consolidate 3 bits into enum TCGTempKind
      tcg: Add temp_readonly
      tcg: Expand TCGTemp.val to 64-bits
      tcg: Rename struct tcg_temp_info to TempOptInfo
      tcg: Expand TempOptInfo to 64-bits
      tcg: Introduce TYPE_CONST temporaries
      tcg/optimize: Improve find_better_copy
      tcg/optimize: Adjust TempOptInfo allocation
      tcg/optimize: Use tcg_constant_internal with constant folding
      tcg: Convert tcg_gen_dupi_vec to TCG_CONST
      tcg: Use tcg_constant_i32 with icount expander
      tcg: Use tcg_constant_{i32,i64} with tcg int expanders
      tcg: Use tcg_constant_{i32,i64} with tcg plugins
      tcg: Use tcg_constant_{i32,i64,vec} with gvec expanders
      tcg/tci: Add special tci_movi_{i32,i64} opcodes
      tcg: Remove movi and dupi opcodes
      tcg: Add tcg_reg_alloc_dup2
      tcg/i386: Use tcg_constant_vec with tcg vec expanders
      tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
      tcg/ppc: Use tcg_constant_vec with tcg vec expanders
      tcg/aarch64: Use tcg_constant_vec with tcg vec expanders
 include/exec/gen-icount.h    |  25 +--
 include/tcg/tcg-op.h         |  17 +-
 include/tcg/tcg-opc.h        |  11 +-
 include/tcg/tcg.h            |  50 ++++-
 accel/tcg/plugin-gen.c       |  49 ++---
 tcg/optimize.c               | 249 +++++++++++-----------
 tcg/tcg-op-gvec.c            | 129 +++++-------
 tcg/tcg-op-vec.c             |  52 +----
 tcg/tcg-op.c                 | 227 ++++++++++----------
 tcg/tcg.c                    | 488 +++++++++++++++++++++++++++++++++----------
 tcg/tci.c                    |   4 +-
 tcg/aarch64/tcg-target.c.inc |  32 +--
 tcg/arm/tcg-target.c.inc     |   1 -
 tcg/i386/tcg-target.c.inc    | 112 ++++++----
 tcg/mips/tcg-target.c.inc    |   2 -
 tcg/ppc/tcg-target.c.inc     |  90 ++++----
 tcg/riscv/tcg-target.c.inc   |   2 -
 tcg/s390/tcg-target.c.inc    |   2 -
 tcg/sparc/tcg-target.c.inc   |   2 -
 tcg/tci/tcg-target.c.inc     |   6 +-
 scripts/decodetree.py        |   9 +-
 21 files changed, 890 insertions(+), 669 deletions(-)
^ permalink raw reply	[flat|nested] 50+ messages in thread
* [PULL 01/24] tcg: Use tcg_out_dupi_vec from temp_load
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 02/24] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
Having dupi pass though movi is confusing and arguably wrong.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c                    |  6 +++-
 tcg/aarch64/tcg-target.c.inc |  7 ----
 tcg/i386/tcg-target.c.inc    | 63 ++++++++++++++++++++++++------------
 tcg/ppc/tcg-target.c.inc     |  6 ----
 4 files changed, 47 insertions(+), 35 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 472bf1755b..ded3c928e3 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3387,7 +3387,11 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
     case TEMP_VAL_CONST:
         reg = tcg_reg_alloc(s, desired_regs, allocated_regs,
                             preferred_regs, ts->indirect_base);
-        tcg_out_movi(s, ts->type, reg, ts->val);
+        if (ts->type <= TCG_TYPE_I64) {
+            tcg_out_movi(s, ts->type, reg, ts->val);
+        } else {
+            tcg_out_dupi_vec(s, ts->type, reg, ts->val);
+        }
         ts->mem_coherent = 0;
         break;
     case TEMP_VAL_MEM:
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index ab199b143f..a2a588e3aa 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1011,13 +1011,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     case TCG_TYPE_I64:
         tcg_debug_assert(rd < 32);
         break;
-
-    case TCG_TYPE_V64:
-    case TCG_TYPE_V128:
-        tcg_debug_assert(rd >= 32);
-        tcg_out_dupi_vec(s, type, rd, value);
-        return;
-
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 46e856f442..35554fd1e8 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -975,30 +975,32 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
     }
 }
 
-static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg ret, tcg_target_long arg)
+static void tcg_out_movi_vec(TCGContext *s, TCGType type,
+                             TCGReg ret, tcg_target_long arg)
+{
+    if (arg == 0) {
+        tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret);
+        return;
+    }
+    if (arg == -1) {
+        tcg_out_vex_modrm(s, OPC_PCMPEQB, ret, ret, ret);
+        return;
+    }
+
+    int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
+    tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy + rexw, ret);
+    if (TCG_TARGET_REG_BITS == 64) {
+        new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
+    } else {
+        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
+    }
+}
+
+static void tcg_out_movi_int(TCGContext *s, TCGType type,
+                             TCGReg ret, tcg_target_long arg)
 {
     tcg_target_long diff;
 
-    switch (type) {
-    case TCG_TYPE_I32:
-#if TCG_TARGET_REG_BITS == 64
-    case TCG_TYPE_I64:
-#endif
-        if (ret < 16) {
-            break;
-        }
-        /* fallthru */
-    case TCG_TYPE_V64:
-    case TCG_TYPE_V128:
-    case TCG_TYPE_V256:
-        tcg_debug_assert(ret >= 16);
-        tcg_out_dupi_vec(s, type, ret, arg);
-        return;
-    default:
-        g_assert_not_reached();
-    }
-
     if (arg == 0) {
         tgen_arithr(s, ARITH_XOR, ret, ret);
         return;
@@ -1027,6 +1029,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     tcg_out64(s, arg);
 }
 
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long arg)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+#if TCG_TARGET_REG_BITS == 64
+    case TCG_TYPE_I64:
+#endif
+        if (ret < 16) {
+            tcg_out_movi_int(s, type, ret, arg);
+        } else {
+            tcg_out_movi_vec(s, type, ret, arg);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
 {
     if (val == (int8_t)val) {
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 19a4a12f15..a3f1bd41cd 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -987,12 +987,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
         tcg_out_movi_int(s, type, ret, arg, false);
         break;
 
-    case TCG_TYPE_V64:
-    case TCG_TYPE_V128:
-        tcg_debug_assert(ret >= TCG_REG_V0);
-        tcg_out_dupi_vec(s, type, ret, arg);
-        break;
-
     default:
         g_assert_not_reached();
     }
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 02/24] tcg: Increase tcg_out_dupi_vec immediate to int64_t
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
  2021-01-14  2:16 ` [PULL 01/24] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 03/24] tcg: Consolidate 3 bits into enum TCGTempKind Richard Henderson
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
While we don't store more than tcg_target_long in TCGTemp,
we shouldn't be limited to that for code generation.  We will
be able to use this for INDEX_op_dup2_vec with 2 constants.
Also pass along the minimal vece that may be said to apply
to the constant.  This allows some simplification in the
various backends.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c                    | 31 +++++++++++++++++++++++++-----
 tcg/aarch64/tcg-target.c.inc | 12 ++++++------
 tcg/i386/tcg-target.c.inc    | 22 ++++++++++++---------
 tcg/ppc/tcg-target.c.inc     | 37 +++++++++++++++++++++++-------------
 4 files changed, 69 insertions(+), 33 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ded3c928e3..c73128208c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -117,8 +117,8 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg dst, TCGReg src);
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                              TCGReg dst, TCGReg base, intptr_t offset);
-static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg dst, tcg_target_long arg);
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg dst, int64_t arg);
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
                            unsigned vece, const TCGArg *args,
                            const int *const_args);
@@ -133,8 +133,8 @@ static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
 {
     g_assert_not_reached();
 }
-static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                                    TCGReg dst, tcg_target_long arg)
+static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                                    TCGReg dst, int64_t arg)
 {
     g_assert_not_reached();
 }
@@ -3390,7 +3390,28 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
         if (ts->type <= TCG_TYPE_I64) {
             tcg_out_movi(s, ts->type, reg, ts->val);
         } else {
-            tcg_out_dupi_vec(s, ts->type, reg, ts->val);
+            uint64_t val = ts->val;
+            MemOp vece = MO_64;
+
+            /*
+             * Find the minimal vector element that matches the constant.
+             * The targets will, in general, have to do this search anyway,
+             * do this generically.
+             */
+            if (TCG_TARGET_REG_BITS == 32) {
+                val = dup_const(MO_32, val);
+                vece = MO_32;
+            }
+            if (val == dup_const(MO_8, val)) {
+                vece = MO_8;
+            } else if (val == dup_const(MO_16, val)) {
+                vece = MO_16;
+            } else if (TCG_TARGET_REG_BITS == 64 &&
+                       val == dup_const(MO_32, val)) {
+                vece = MO_32;
+            }
+
+            tcg_out_dupi_vec(s, ts->type, vece, reg, ts->val);
         }
         ts->mem_coherent = 0;
         break;
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index a2a588e3aa..be6d3ea2a8 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -857,14 +857,14 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
     tcg_out_insn_3404(s, insn, ext, rd, rn, ext, r, c);
 }
 
-static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg rd, tcg_target_long v64)
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg rd, int64_t v64)
 {
     bool q = type == TCG_TYPE_V128;
     int cmode, imm8, i;
 
     /* Test all bytes equal first.  */
-    if (v64 == dup_const(MO_8, v64)) {
+    if (vece == MO_8) {
         imm8 = (uint8_t)v64;
         tcg_out_insn(s, 3606, MOVI, q, rd, 0, 0xe, imm8);
         return;
@@ -891,7 +891,7 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
      * cannot find an expansion there's no point checking a larger
      * width because we already know by replication it cannot match.
      */
-    if (v64 == dup_const(MO_16, v64)) {
+    if (vece == MO_16) {
         uint16_t v16 = v64;
 
         if (is_shimm16(v16, &cmode, &imm8)) {
@@ -910,7 +910,7 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         tcg_out_insn(s, 3606, MOVI, q, rd, 0, 0x8, v16 & 0xff);
         tcg_out_insn(s, 3606, ORR, q, rd, 0, 0xa, v16 >> 8);
         return;
-    } else if (v64 == dup_const(MO_32, v64)) {
+    } else if (vece == MO_32) {
         uint32_t v32 = v64;
         uint32_t n32 = ~v32;
 
@@ -2435,7 +2435,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                         tcg_out_insn_3617(s, insn, is_q, vece, a0, a1);
                         break;
                     }
-                    tcg_out_dupi_vec(s, type, TCG_VEC_TMP, 0);
+                    tcg_out_dupi_vec(s, type, MO_8, TCG_VEC_TMP, 0);
                     a2 = TCG_VEC_TMP;
                 }
                 insn = cmp_insn[cond];
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 35554fd1e8..9f81e11773 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -942,8 +942,8 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
     return true;
 }
 
-static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg ret, tcg_target_long arg)
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg ret, int64_t arg)
 {
     int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
 
@@ -956,7 +956,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         return;
     }
 
-    if (TCG_TARGET_REG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS == 32 && vece < MO_64) {
+        if (have_avx2) {
+            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret);
+        } else {
+            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+        }
+        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
+    } else {
         if (type == TCG_TYPE_V64) {
             tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret);
         } else if (have_avx2) {
@@ -964,14 +971,11 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         } else {
             tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
         }
-        new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
-    } else {
-        if (have_avx2) {
-            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret);
+        if (TCG_TARGET_REG_BITS == 64) {
+            new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
         } else {
-            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+            new_pool_l2(s, R_386_32, s->code_ptr - 4, 0, arg, arg >> 32);
         }
-        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
     }
 }
 
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index a3f1bd41cd..d00ec20203 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -912,31 +912,41 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
     }
 }
 
-static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
-                             tcg_target_long val)
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg ret, int64_t val)
 {
     uint32_t load_insn;
     int rel, low;
     intptr_t add;
 
-    low = (int8_t)val;
-    if (low >= -16 && low < 16) {
-        if (val == (tcg_target_long)dup_const(MO_8, low)) {
+    switch (vece) {
+    case MO_8:
+        low = (int8_t)val;
+        if (low >= -16 && low < 16) {
             tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16));
             return;
         }
-        if (val == (tcg_target_long)dup_const(MO_16, low)) {
+        if (have_isa_3_00) {
+            tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11));
+            return;
+        }
+        break;
+
+    case MO_16:
+        low = (int16_t)val;
+        if (low >= -16 && low < 16) {
             tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16));
             return;
         }
-        if (val == (tcg_target_long)dup_const(MO_32, low)) {
+        break;
+
+    case MO_32:
+        low = (int32_t)val;
+        if (low >= -16 && low < 16) {
             tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16));
             return;
         }
-    }
-    if (have_isa_3_00 && val == (tcg_target_long)dup_const(MO_8, val)) {
-        tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11));
-        return;
+        break;
     }
 
     /*
@@ -956,14 +966,15 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
         if (TCG_TARGET_REG_BITS == 64) {
             new_pool_label(s, val, rel, s->code_ptr, add);
         } else {
-            new_pool_l2(s, rel, s->code_ptr, add, val, val);
+            new_pool_l2(s, rel, s->code_ptr, add, val >> 32, val);
         }
     } else {
         load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
         if (TCG_TARGET_REG_BITS == 64) {
             new_pool_l2(s, rel, s->code_ptr, add, val, val);
         } else {
-            new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+            new_pool_l4(s, rel, s->code_ptr, add,
+                        val >> 32, val, val >> 32, val);
         }
     }
 
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 03/24] tcg: Consolidate 3 bits into enum TCGTempKind
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
  2021-01-14  2:16 ` [PULL 01/24] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
  2021-01-14  2:16 ` [PULL 02/24] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 04/24] tcg: Add temp_readonly Richard Henderson
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée, Philippe Mathieu-Daudé
The temp_fixed, temp_global, temp_local bits are all related.
Combine them into a single enumeration.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  20 +++++---
 tcg/optimize.c    |   8 +--
 tcg/tcg.c         | 126 ++++++++++++++++++++++++++++------------------
 3 files changed, 92 insertions(+), 62 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 95fe5604eb..571d4e0fa3 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -483,23 +483,27 @@ typedef enum TCGTempVal {
     TEMP_VAL_CONST,
 } TCGTempVal;
 
+typedef enum TCGTempKind {
+    /* Temp is dead at the end of all basic blocks. */
+    TEMP_NORMAL,
+    /* Temp is saved across basic blocks but dead at the end of TBs. */
+    TEMP_LOCAL,
+    /* Temp is saved across both basic blocks and translation blocks. */
+    TEMP_GLOBAL,
+    /* Temp is in a fixed register. */
+    TEMP_FIXED,
+} TCGTempKind;
+
 typedef struct TCGTemp {
     TCGReg reg:8;
     TCGTempVal val_type:8;
     TCGType base_type:8;
     TCGType type:8;
-    unsigned int fixed_reg:1;
+    TCGTempKind kind:3;
     unsigned int indirect_reg:1;
     unsigned int indirect_base:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
-    /* If true, the temp is saved across both basic blocks and
-       translation blocks.  */
-    unsigned int temp_global:1;
-    /* If true, the temp is saved across basic blocks but dead
-       at the end of translation blocks.  If false, the temp is
-       dead at the end of basic blocks.  */
-    unsigned int temp_local:1;
     unsigned int temp_allocated:1;
 
     tcg_target_long val;
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1fb42eb2a9..2f827a9d2d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -116,21 +116,21 @@ static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
     TCGTemp *i;
 
     /* If this is already a global, we can't do better. */
-    if (ts->temp_global) {
+    if (ts->kind >= TEMP_GLOBAL) {
         return ts;
     }
 
     /* Search for a global first. */
     for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-        if (i->temp_global) {
+        if (i->kind >= TEMP_GLOBAL) {
             return i;
         }
     }
 
     /* If it is a temp, search for a temp local. */
-    if (!ts->temp_local) {
+    if (ts->kind == TEMP_NORMAL) {
         for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-            if (ts->temp_local) {
+            if (i->kind >= TEMP_LOCAL) {
                 return i;
             }
         }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index c73128208c..143794a585 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1211,7 +1211,7 @@ static inline TCGTemp *tcg_global_alloc(TCGContext *s)
     tcg_debug_assert(s->nb_globals == s->nb_temps);
     s->nb_globals++;
     ts = tcg_temp_alloc(s);
-    ts->temp_global = 1;
+    ts->kind = TEMP_GLOBAL;
 
     return ts;
 }
@@ -1228,7 +1228,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
     ts = tcg_global_alloc(s);
     ts->base_type = type;
     ts->type = type;
-    ts->fixed_reg = 1;
+    ts->kind = TEMP_FIXED;
     ts->reg = reg;
     ts->name = name;
     tcg_regset_set_reg(s->reserved_regs, reg);
@@ -1255,7 +1255,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
     bigendian = 1;
 #endif
 
-    if (!base_ts->fixed_reg) {
+    if (base_ts->kind != TEMP_FIXED) {
         /* We do not support double-indirect registers.  */
         tcg_debug_assert(!base_ts->indirect_reg);
         base_ts->indirect_base = 1;
@@ -1303,6 +1303,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
 TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
 {
     TCGContext *s = tcg_ctx;
+    TCGTempKind kind = temp_local ? TEMP_LOCAL : TEMP_NORMAL;
     TCGTemp *ts;
     int idx, k;
 
@@ -1315,7 +1316,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
         ts = &s->temps[idx];
         ts->temp_allocated = 1;
         tcg_debug_assert(ts->base_type == type);
-        tcg_debug_assert(ts->temp_local == temp_local);
+        tcg_debug_assert(ts->kind == kind);
     } else {
         ts = tcg_temp_alloc(s);
         if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
@@ -1324,18 +1325,18 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
             ts->base_type = type;
             ts->type = TCG_TYPE_I32;
             ts->temp_allocated = 1;
-            ts->temp_local = temp_local;
+            ts->kind = kind;
 
             tcg_debug_assert(ts2 == ts + 1);
             ts2->base_type = TCG_TYPE_I64;
             ts2->type = TCG_TYPE_I32;
             ts2->temp_allocated = 1;
-            ts2->temp_local = temp_local;
+            ts2->kind = kind;
         } else {
             ts->base_type = type;
             ts->type = type;
             ts->temp_allocated = 1;
-            ts->temp_local = temp_local;
+            ts->kind = kind;
         }
     }
 
@@ -1392,12 +1393,12 @@ void tcg_temp_free_internal(TCGTemp *ts)
     }
 #endif
 
-    tcg_debug_assert(ts->temp_global == 0);
+    tcg_debug_assert(ts->kind < TEMP_GLOBAL);
     tcg_debug_assert(ts->temp_allocated != 0);
     ts->temp_allocated = 0;
 
     idx = temp_idx(ts);
-    k = ts->base_type + (ts->temp_local ? TCG_TYPE_COUNT : 0);
+    k = ts->base_type + (ts->kind == TEMP_NORMAL ? 0 : TCG_TYPE_COUNT);
     set_bit(idx, s->free_temps[k].l);
 }
 
@@ -1930,17 +1931,27 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
 static void tcg_reg_alloc_start(TCGContext *s)
 {
     int i, n;
-    TCGTemp *ts;
 
-    for (i = 0, n = s->nb_globals; i < n; i++) {
-        ts = &s->temps[i];
-        ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
-    }
-    for (n = s->nb_temps; i < n; i++) {
-        ts = &s->temps[i];
-        ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
-        ts->mem_allocated = 0;
-        ts->fixed_reg = 0;
+    for (i = 0, n = s->nb_temps; i < n; i++) {
+        TCGTemp *ts = &s->temps[i];
+        TCGTempVal val = TEMP_VAL_MEM;
+
+        switch (ts->kind) {
+        case TEMP_FIXED:
+            val = TEMP_VAL_REG;
+            break;
+        case TEMP_GLOBAL:
+            break;
+        case TEMP_NORMAL:
+            val = TEMP_VAL_DEAD;
+            /* fall through */
+        case TEMP_LOCAL:
+            ts->mem_allocated = 0;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        ts->val_type = val;
     }
 
     memset(s->reg_to_temp, 0, sizeof(s->reg_to_temp));
@@ -1951,12 +1962,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
 {
     int idx = temp_idx(ts);
 
-    if (ts->temp_global) {
+    switch (ts->kind) {
+    case TEMP_FIXED:
+    case TEMP_GLOBAL:
         pstrcpy(buf, buf_size, ts->name);
-    } else if (ts->temp_local) {
+        break;
+    case TEMP_LOCAL:
         snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
-    } else {
+        break;
+    case TEMP_NORMAL:
         snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
+        break;
     }
     return buf;
 }
@@ -2547,15 +2563,24 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
 {
     int i;
 
-    for (i = 0; i < ng; ++i) {
-        s->temps[i].state = TS_DEAD | TS_MEM;
-        la_reset_pref(&s->temps[i]);
-    }
-    for (i = ng; i < nt; ++i) {
-        s->temps[i].state = (s->temps[i].temp_local
-                             ? TS_DEAD | TS_MEM
-                             : TS_DEAD);
-        la_reset_pref(&s->temps[i]);
+    for (i = 0; i < nt; ++i) {
+        TCGTemp *ts = &s->temps[i];
+        int state;
+
+        switch (ts->kind) {
+        case TEMP_FIXED:
+        case TEMP_GLOBAL:
+        case TEMP_LOCAL:
+            state = TS_DEAD | TS_MEM;
+            break;
+        case TEMP_NORMAL:
+            state = TS_DEAD;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        ts->state = state;
+        la_reset_pref(ts);
     }
 }
 
@@ -2583,7 +2608,7 @@ static void la_bb_sync(TCGContext *s, int ng, int nt)
     la_global_sync(s, ng);
 
     for (int i = ng; i < nt; ++i) {
-        if (s->temps[i].temp_local) {
+        if (s->temps[i].kind == TEMP_LOCAL) {
             int state = s->temps[i].state;
             s->temps[i].state = state | TS_MEM;
             if (state != TS_DEAD) {
@@ -3191,7 +3216,8 @@ static void check_regs(TCGContext *s)
     }
     for (k = 0; k < s->nb_temps; k++) {
         ts = &s->temps[k];
-        if (ts->val_type == TEMP_VAL_REG && !ts->fixed_reg
+        if (ts->val_type == TEMP_VAL_REG
+            && ts->kind != TEMP_FIXED
             && s->reg_to_temp[ts->reg] != ts) {
             printf("Inconsistency for temp %s:\n",
                    tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
@@ -3228,15 +3254,14 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
    mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-    if (ts->fixed_reg) {
+    if (ts->kind == TEMP_FIXED) {
         return;
     }
     if (ts->val_type == TEMP_VAL_REG) {
         s->reg_to_temp[ts->reg] = NULL;
     }
     ts->val_type = (free_or_dead < 0
-                    || ts->temp_local
-                    || ts->temp_global
+                    || ts->kind != TEMP_NORMAL
                     ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
 }
 
@@ -3253,7 +3278,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
                       TCGRegSet preferred_regs, int free_or_dead)
 {
-    if (ts->fixed_reg) {
+    if (ts->kind == TEMP_FIXED) {
         return;
     }
     if (!ts->mem_coherent) {
@@ -3436,7 +3461,8 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
 {
     /* The liveness analysis already ensures that globals are back
        in memory. Keep an tcg_debug_assert for safety. */
-    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
+    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
+                     || ts->kind == TEMP_FIXED);
 }
 
 /* save globals to their canonical location and assume they can be
@@ -3461,7 +3487,7 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
     for (i = 0, n = s->nb_globals; i < n; i++) {
         TCGTemp *ts = &s->temps[i];
         tcg_debug_assert(ts->val_type != TEMP_VAL_REG
-                         || ts->fixed_reg
+                         || ts->kind == TEMP_FIXED
                          || ts->mem_coherent);
     }
 }
@@ -3474,7 +3500,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
 
     for (i = s->nb_globals; i < s->nb_temps; i++) {
         TCGTemp *ts = &s->temps[i];
-        if (ts->temp_local) {
+        if (ts->kind == TEMP_LOCAL) {
             temp_save(s, ts, allocated_regs);
         } else {
             /* The liveness analysis already ensures that temps are dead.
@@ -3500,7 +3526,7 @@ static void tcg_reg_alloc_cbranch(TCGContext *s, TCGRegSet allocated_regs)
          * The liveness analysis already ensures that temps are dead.
          * Keep tcg_debug_asserts for safety.
          */
-        if (ts->temp_local) {
+        if (ts->kind == TEMP_LOCAL) {
             tcg_debug_assert(ts->val_type != TEMP_VAL_REG || ts->mem_coherent);
         } else {
             tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
@@ -3516,7 +3542,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   TCGRegSet preferred_regs)
 {
     /* ENV should not be modified.  */
-    tcg_debug_assert(!ots->fixed_reg);
+    tcg_debug_assert(ots->kind != TEMP_FIXED);
 
     /* The movi is not explicitly generated here.  */
     if (ots->val_type == TEMP_VAL_REG) {
@@ -3556,7 +3582,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     ts = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(!ots->fixed_reg);
+    tcg_debug_assert(ots->kind != TEMP_FIXED);
 
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
@@ -3595,7 +3621,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
         }
         temp_dead(s, ots);
     } else {
-        if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
+        if (IS_DEAD_ARG(1) && ts->kind != TEMP_FIXED) {
             /* the mov can be suppressed */
             if (ots->val_type == TEMP_VAL_REG) {
                 s->reg_to_temp[ots->reg] = NULL;
@@ -3617,7 +3643,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
                  * Store the source register into the destination slot
                  * and leave the destination temp as TEMP_VAL_MEM.
                  */
-                assert(!ots->fixed_reg);
+                assert(ots->kind != TEMP_FIXED);
                 if (!ts->mem_allocated) {
                     temp_allocate_frame(s, ots);
                 }
@@ -3654,7 +3680,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
     its = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(!ots->fixed_reg);
+    tcg_debug_assert(ots->kind != TEMP_FIXED);
 
     itype = its->type;
     vece = TCGOP_VECE(op);
@@ -3794,7 +3820,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         i_preferred_regs = o_preferred_regs = 0;
         if (arg_ct->ialias) {
             o_preferred_regs = op->output_pref[arg_ct->alias_index];
-            if (ts->fixed_reg) {
+            if (ts->kind == TEMP_FIXED) {
                 /* if fixed register, we must allocate a new register
                    if the alias is not the same register */
                 if (arg != op->args[arg_ct->alias_index]) {
@@ -3886,7 +3912,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             ts = arg_temp(arg);
 
             /* ENV should not be modified.  */
-            tcg_debug_assert(!ts->fixed_reg);
+            tcg_debug_assert(ts->kind != TEMP_FIXED);
 
             if (arg_ct->oalias && !const_args[arg_ct->alias_index]) {
                 reg = new_args[arg_ct->alias_index];
@@ -3927,7 +3953,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         ts = arg_temp(op->args[i]);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(!ts->fixed_reg);
+        tcg_debug_assert(ts->kind != TEMP_FIXED);
 
         if (NEED_SYNC_ARG(i)) {
             temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
@@ -4059,7 +4085,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
         ts = arg_temp(arg);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(!ts->fixed_reg);
+        tcg_debug_assert(ts->kind != TEMP_FIXED);
 
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 04/24] tcg: Add temp_readonly
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (2 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 03/24] tcg: Consolidate 3 bits into enum TCGTempKind Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 05/24] tcg: Expand TCGTemp.val to 64-bits Richard Henderson
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée, Philippe Mathieu-Daudé
In most, but not all, places that we check for TEMP_FIXED,
we are really testing that we do not modify the temporary.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  5 +++++
 tcg/tcg.c         | 21 ++++++++++-----------
 2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 571d4e0fa3..2bdaeaa69c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -679,6 +679,11 @@ struct TCGContext {
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
 };
 
+static inline bool temp_readonly(TCGTemp *ts)
+{
+    return ts->kind == TEMP_FIXED;
+}
+
 extern TCGContext tcg_init_ctx;
 extern __thread TCGContext *tcg_ctx;
 extern const void *tcg_code_gen_epilogue;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 143794a585..e02bb71953 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3254,7 +3254,7 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
    mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-    if (ts->kind == TEMP_FIXED) {
+    if (temp_readonly(ts)) {
         return;
     }
     if (ts->val_type == TEMP_VAL_REG) {
@@ -3278,7 +3278,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
                       TCGRegSet preferred_regs, int free_or_dead)
 {
-    if (ts->kind == TEMP_FIXED) {
+    if (temp_readonly(ts)) {
         return;
     }
     if (!ts->mem_coherent) {
@@ -3461,8 +3461,7 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
 {
     /* The liveness analysis already ensures that globals are back
        in memory. Keep an tcg_debug_assert for safety. */
-    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
-                     || ts->kind == TEMP_FIXED);
+    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || temp_readonly(ts));
 }
 
 /* save globals to their canonical location and assume they can be
@@ -3542,7 +3541,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   TCGRegSet preferred_regs)
 {
     /* ENV should not be modified.  */
-    tcg_debug_assert(ots->kind != TEMP_FIXED);
+    tcg_debug_assert(!temp_readonly(ots));
 
     /* The movi is not explicitly generated here.  */
     if (ots->val_type == TEMP_VAL_REG) {
@@ -3582,7 +3581,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     ts = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(ots->kind != TEMP_FIXED);
+    tcg_debug_assert(!temp_readonly(ots));
 
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
@@ -3643,7 +3642,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
                  * Store the source register into the destination slot
                  * and leave the destination temp as TEMP_VAL_MEM.
                  */
-                assert(ots->kind != TEMP_FIXED);
+                assert(!temp_readonly(ots));
                 if (!ts->mem_allocated) {
                     temp_allocate_frame(s, ots);
                 }
@@ -3680,7 +3679,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
     its = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(ots->kind != TEMP_FIXED);
+    tcg_debug_assert(!temp_readonly(ots));
 
     itype = its->type;
     vece = TCGOP_VECE(op);
@@ -3912,7 +3911,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             ts = arg_temp(arg);
 
             /* ENV should not be modified.  */
-            tcg_debug_assert(ts->kind != TEMP_FIXED);
+            tcg_debug_assert(!temp_readonly(ts));
 
             if (arg_ct->oalias && !const_args[arg_ct->alias_index]) {
                 reg = new_args[arg_ct->alias_index];
@@ -3953,7 +3952,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         ts = arg_temp(op->args[i]);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(ts->kind != TEMP_FIXED);
+        tcg_debug_assert(!temp_readonly(ts));
 
         if (NEED_SYNC_ARG(i)) {
             temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
@@ -4085,7 +4084,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
         ts = arg_temp(arg);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(ts->kind != TEMP_FIXED);
+        tcg_debug_assert(!temp_readonly(ts));
 
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 05/24] tcg: Expand TCGTemp.val to 64-bits
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (3 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 04/24] tcg: Add temp_readonly Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 06/24] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
This will reduce the differences between 32-bit and 64-bit hosts,
allowing full 64-bit constants to be created with the same interface.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h | 2 +-
 tcg/tcg.c         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 2bdaeaa69c..e7adc7e265 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -506,7 +506,7 @@ typedef struct TCGTemp {
     unsigned int mem_allocated:1;
     unsigned int temp_allocated:1;
 
-    tcg_target_long val;
+    int64_t val;
     struct TCGTemp *mem_base;
     intptr_t mem_offset;
     const char *name;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e02bb71953..545dd2b0b2 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3176,7 +3176,7 @@ static void dump_regs(TCGContext *s)
                    tcg_target_reg_names[ts->mem_base->reg]);
             break;
         case TEMP_VAL_CONST:
-            printf("$0x%" TCG_PRIlx, ts->val);
+            printf("$0x%" PRIx64, ts->val);
             break;
         case TEMP_VAL_DEAD:
             printf("D");
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 06/24] tcg: Rename struct tcg_temp_info to TempOptInfo
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (4 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 05/24] tcg: Expand TCGTemp.val to 64-bits Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 07/24] tcg: Expand TempOptInfo to 64-bits Richard Henderson
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée, Philippe Mathieu-Daudé
Fix this name vs our coding style.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2f827a9d2d..0da9750b65 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,20 +35,20 @@
         glue(glue(case INDEX_op_, x), _i64):    \
         glue(glue(case INDEX_op_, x), _vec)
 
-struct tcg_temp_info {
+typedef struct TempOptInfo {
     bool is_const;
     TCGTemp *prev_copy;
     TCGTemp *next_copy;
     tcg_target_ulong val;
     tcg_target_ulong mask;
-};
+} TempOptInfo;
 
-static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
+static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
     return ts->state_ptr;
 }
 
-static inline struct tcg_temp_info *arg_info(TCGArg arg)
+static inline TempOptInfo *arg_info(TCGArg arg)
 {
     return ts_info(arg_temp(arg));
 }
@@ -71,9 +71,9 @@ static inline bool ts_is_copy(TCGTemp *ts)
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
 static void reset_ts(TCGTemp *ts)
 {
-    struct tcg_temp_info *ti = ts_info(ts);
-    struct tcg_temp_info *pi = ts_info(ti->prev_copy);
-    struct tcg_temp_info *ni = ts_info(ti->next_copy);
+    TempOptInfo *ti = ts_info(ts);
+    TempOptInfo *pi = ts_info(ti->prev_copy);
+    TempOptInfo *ni = ts_info(ti->next_copy);
 
     ni->prev_copy = ti->prev_copy;
     pi->next_copy = ti->next_copy;
@@ -89,12 +89,12 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(struct tcg_temp_info *infos,
+static void init_ts_info(TempOptInfo *infos,
                          TCGTempSet *temps_used, TCGTemp *ts)
 {
     size_t idx = temp_idx(ts);
     if (!test_bit(idx, temps_used->l)) {
-        struct tcg_temp_info *ti = &infos[idx];
+        TempOptInfo *ti = &infos[idx];
 
         ts->state_ptr = ti;
         ti->next_copy = ts;
@@ -105,7 +105,7 @@ static void init_ts_info(struct tcg_temp_info *infos,
     }
 }
 
-static void init_arg_info(struct tcg_temp_info *infos,
+static void init_arg_info(TempOptInfo *infos,
                           TCGTempSet *temps_used, TCGArg arg)
 {
     init_ts_info(infos, temps_used, arg_temp(arg));
@@ -171,7 +171,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
     const TCGOpDef *def;
     TCGOpcode new_op;
     tcg_target_ulong mask;
-    struct tcg_temp_info *di = arg_info(dst);
+    TempOptInfo *di = arg_info(dst);
 
     def = &tcg_op_defs[op->opc];
     if (def->flags & TCG_OPF_VECTOR) {
@@ -202,8 +202,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     TCGTemp *dst_ts = arg_temp(dst);
     TCGTemp *src_ts = arg_temp(src);
     const TCGOpDef *def;
-    struct tcg_temp_info *di;
-    struct tcg_temp_info *si;
+    TempOptInfo *di;
+    TempOptInfo *si;
     tcg_target_ulong mask;
     TCGOpcode new_op;
 
@@ -236,7 +236,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     di->mask = mask;
 
     if (src_ts->type == dst_ts->type) {
-        struct tcg_temp_info *ni = ts_info(si->next_copy);
+        TempOptInfo *ni = ts_info(si->next_copy);
 
         di->next_copy = si->next_copy;
         di->prev_copy = src_ts;
@@ -599,7 +599,7 @@ void tcg_optimize(TCGContext *s)
 {
     int nb_temps, nb_globals;
     TCGOp *op, *op_next, *prev_mb = NULL;
-    struct tcg_temp_info *infos;
+    TempOptInfo *infos;
     TCGTempSet temps_used;
 
     /* Array VALS has an element for each temp.
@@ -610,7 +610,7 @@ void tcg_optimize(TCGContext *s)
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
     bitmap_zero(temps_used.l, nb_temps);
-    infos = tcg_malloc(sizeof(struct tcg_temp_info) * nb_temps);
+    infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
         tcg_target_ulong mask, partmask, affected;
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 07/24] tcg: Expand TempOptInfo to 64-bits
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (5 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 06/24] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 08/24] tcg: Introduce TYPE_CONST temporaries Richard Henderson
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
This propagates the extended value of TCGTemp.val that we did before.
In addition, it will be required for vector constants.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 0da9750b65..433d2540f4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -39,8 +39,8 @@ typedef struct TempOptInfo {
     bool is_const;
     TCGTemp *prev_copy;
     TCGTemp *next_copy;
-    tcg_target_ulong val;
-    tcg_target_ulong mask;
+    uint64_t val;
+    uint64_t mask;
 } TempOptInfo;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -166,11 +166,11 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
+static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, uint64_t val)
 {
     const TCGOpDef *def;
     TCGOpcode new_op;
-    tcg_target_ulong mask;
+    uint64_t mask;
     TempOptInfo *di = arg_info(dst);
 
     def = &tcg_op_defs[op->opc];
@@ -204,7 +204,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     const TCGOpDef *def;
     TempOptInfo *di;
     TempOptInfo *si;
-    tcg_target_ulong mask;
+    uint64_t mask;
     TCGOpcode new_op;
 
     if (ts_are_copies(dst_ts, src_ts)) {
@@ -247,7 +247,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     }
 }
 
-static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
+static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
 {
     uint64_t l64, h64;
 
@@ -410,10 +410,10 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     }
 }
 
-static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
+static uint64_t do_constant_folding(TCGOpcode op, uint64_t x, uint64_t y)
 {
     const TCGOpDef *def = &tcg_op_defs[op];
-    TCGArg res = do_constant_folding_2(op, x, y);
+    uint64_t res = do_constant_folding_2(op, x, y);
     if (!(def->flags & TCG_OPF_64BIT)) {
         res = (int32_t)res;
     }
@@ -501,8 +501,9 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
                                        TCGArg y, TCGCond c)
 {
-    tcg_target_ulong xv = arg_info(x)->val;
-    tcg_target_ulong yv = arg_info(y)->val;
+    uint64_t xv = arg_info(x)->val;
+    uint64_t yv = arg_info(y)->val;
+
     if (arg_is_const(x) && arg_is_const(y)) {
         const TCGOpDef *def = &tcg_op_defs[op];
         tcg_debug_assert(!(def->flags & TCG_OPF_VECTOR));
@@ -613,9 +614,8 @@ void tcg_optimize(TCGContext *s)
     infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
-        tcg_target_ulong mask, partmask, affected;
+        uint64_t mask, partmask, affected, tmp;
         int nb_oargs, nb_iargs, i;
-        TCGArg tmp;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
 
@@ -1225,14 +1225,15 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(extract2):
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                TCGArg v1 = arg_info(op->args[1])->val;
-                TCGArg v2 = arg_info(op->args[2])->val;
+                uint64_t v1 = arg_info(op->args[1])->val;
+                uint64_t v2 = arg_info(op->args[2])->val;
+                int shr = op->args[3];
 
                 if (opc == INDEX_op_extract2_i64) {
-                    tmp = (v1 >> op->args[3]) | (v2 << (64 - op->args[3]));
+                    tmp = (v1 >> shr) | (v2 << (64 - shr));
                 } else {
-                    tmp = (int32_t)(((uint32_t)v1 >> op->args[3]) |
-                                    ((uint32_t)v2 << (32 - op->args[3])));
+                    tmp = (int32_t)(((uint32_t)v1 >> shr) |
+                                    ((uint32_t)v2 << (32 - shr)));
                 }
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
@@ -1271,9 +1272,10 @@ void tcg_optimize(TCGContext *s)
                 break;
             }
             if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-                tcg_target_ulong tv = arg_info(op->args[3])->val;
-                tcg_target_ulong fv = arg_info(op->args[4])->val;
+                uint64_t tv = arg_info(op->args[3])->val;
+                uint64_t fv = arg_info(op->args[4])->val;
                 TCGCond cond = op->args[5];
+
                 if (fv == 1 && tv == 0) {
                     cond = tcg_invert_cond(cond);
                 } else if (!(tv == 1 && fv == 0)) {
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 08/24] tcg: Introduce TYPE_CONST temporaries
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (6 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 07/24] tcg: Expand TempOptInfo to 64-bits Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 09/24] tcg/optimize: Improve find_better_copy Richard Henderson
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
These will hold a single constant for the duration of the TB.
They are hashed, so that each value has one temp across the TB.
Not used yet, this is all infrastructure.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  24 ++++-
 tcg/optimize.c    |  13 ++-
 tcg/tcg.c         | 224 ++++++++++++++++++++++++++++++++++++----------
 3 files changed, 211 insertions(+), 50 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index e7adc7e265..eeeb70ad43 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -492,6 +492,8 @@ typedef enum TCGTempKind {
     TEMP_GLOBAL,
     /* Temp is in a fixed register. */
     TEMP_FIXED,
+    /* Temp is a fixed constant. */
+    TEMP_CONST,
 } TCGTempKind;
 
 typedef struct TCGTemp {
@@ -665,6 +667,7 @@ struct TCGContext {
     QSIMPLEQ_HEAD(, TCGOp) plugin_ops;
 #endif
 
+    GHashTable *const_table[TCG_TYPE_COUNT];
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
 
@@ -681,7 +684,7 @@ struct TCGContext {
 
 static inline bool temp_readonly(TCGTemp *ts)
 {
-    return ts->kind == TEMP_FIXED;
+    return ts->kind >= TEMP_FIXED;
 }
 
 extern TCGContext tcg_init_ctx;
@@ -1079,6 +1082,7 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
 
 void tcg_optimize(TCGContext *s);
 
+/* Allocate a new temporary and initialize it with a constant. */
 TCGv_i32 tcg_const_i32(int32_t val);
 TCGv_i64 tcg_const_i64(int64_t val);
 TCGv_i32 tcg_const_local_i32(int32_t val);
@@ -1088,6 +1092,24 @@ TCGv_vec tcg_const_ones_vec(TCGType);
 TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec);
 TCGv_vec tcg_const_ones_vec_matching(TCGv_vec);
 
+/*
+ * Locate or create a read-only temporary that is a constant.
+ * This kind of temporary need not and should not be freed.
+ */
+TCGTemp *tcg_constant_internal(TCGType type, int64_t val);
+
+static inline TCGv_i32 tcg_constant_i32(int32_t val)
+{
+    return temp_tcgv_i32(tcg_constant_internal(TCG_TYPE_I32, val));
+}
+
+static inline TCGv_i64 tcg_constant_i64(int64_t val)
+{
+    return temp_tcgv_i64(tcg_constant_internal(TCG_TYPE_I64, val));
+}
+
+TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val);
+
 #if UINTPTR_MAX == UINT32_MAX
 # define tcg_const_ptr(x)        ((TCGv_ptr)tcg_const_i32((intptr_t)(x)))
 # define tcg_const_local_ptr(x)  ((TCGv_ptr)tcg_const_local_i32((intptr_t)(x)))
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 433d2540f4..16b0aa7229 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,8 +99,17 @@ static void init_ts_info(TempOptInfo *infos,
         ts->state_ptr = ti;
         ti->next_copy = ts;
         ti->prev_copy = ts;
-        ti->is_const = false;
-        ti->mask = -1;
+        if (ts->kind == TEMP_CONST) {
+            ti->is_const = true;
+            ti->val = ti->mask = ts->val;
+            if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
+                /* High bits of a 32-bit quantity are garbage.  */
+                ti->mask |= ~0xffffffffull;
+            }
+        } else {
+            ti->is_const = false;
+            ti->mask = -1;
+        }
         set_bit(idx, temps_used->l);
     }
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 545dd2b0b2..802f0b8a32 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1184,6 +1184,13 @@ void tcg_func_start(TCGContext *s)
     /* No temps have been previously allocated for size or locality.  */
     memset(s->free_temps, 0, sizeof(s->free_temps));
 
+    /* No constant temps have been previously allocated. */
+    for (int i = 0; i < TCG_TYPE_COUNT; ++i) {
+        if (s->const_table[i]) {
+            g_hash_table_remove_all(s->const_table[i]);
+        }
+    }
+
     s->nb_ops = 0;
     s->nb_labels = 0;
     s->current_frame_offset = s->frame_start;
@@ -1255,13 +1262,19 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
     bigendian = 1;
 #endif
 
-    if (base_ts->kind != TEMP_FIXED) {
+    switch (base_ts->kind) {
+    case TEMP_FIXED:
+        break;
+    case TEMP_GLOBAL:
         /* We do not support double-indirect registers.  */
         tcg_debug_assert(!base_ts->indirect_reg);
         base_ts->indirect_base = 1;
         s->nb_indirects += (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64
                             ? 2 : 1);
         indirect_reg = 1;
+        break;
+    default:
+        g_assert_not_reached();
     }
 
     if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
@@ -1386,6 +1399,11 @@ void tcg_temp_free_internal(TCGTemp *ts)
     TCGContext *s = tcg_ctx;
     int k, idx;
 
+    /* In order to simplify users of tcg_constant_*, silently ignore free. */
+    if (ts->kind == TEMP_CONST) {
+        return;
+    }
+
 #if defined(CONFIG_DEBUG_TCG)
     s->temps_in_use--;
     if (s->temps_in_use < 0) {
@@ -1402,6 +1420,60 @@ void tcg_temp_free_internal(TCGTemp *ts)
     set_bit(idx, s->free_temps[k].l);
 }
 
+TCGTemp *tcg_constant_internal(TCGType type, int64_t val)
+{
+    TCGContext *s = tcg_ctx;
+    GHashTable *h = s->const_table[type];
+    TCGTemp *ts;
+
+    if (h == NULL) {
+        h = g_hash_table_new(g_int64_hash, g_int64_equal);
+        s->const_table[type] = h;
+    }
+
+    ts = g_hash_table_lookup(h, &val);
+    if (ts == NULL) {
+        ts = tcg_temp_alloc(s);
+
+        if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
+            TCGTemp *ts2 = tcg_temp_alloc(s);
+
+            ts->base_type = TCG_TYPE_I64;
+            ts->type = TCG_TYPE_I32;
+            ts->kind = TEMP_CONST;
+            ts->temp_allocated = 1;
+            /*
+             * Retain the full value of the 64-bit constant in the low
+             * part, so that the hash table works.  Actual uses will
+             * truncate the value to the low part.
+             */
+            ts->val = val;
+
+            tcg_debug_assert(ts2 == ts + 1);
+            ts2->base_type = TCG_TYPE_I64;
+            ts2->type = TCG_TYPE_I32;
+            ts2->kind = TEMP_CONST;
+            ts2->temp_allocated = 1;
+            ts2->val = val >> 32;
+        } else {
+            ts->base_type = type;
+            ts->type = type;
+            ts->kind = TEMP_CONST;
+            ts->temp_allocated = 1;
+            ts->val = val;
+        }
+        g_hash_table_insert(h, &ts->val, ts);
+    }
+
+    return ts;
+}
+
+TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val)
+{
+    val = dup_const(vece, val);
+    return temp_tcgv_vec(tcg_constant_internal(type, val));
+}
+
 TCGv_i32 tcg_const_i32(int32_t val)
 {
     TCGv_i32 t0;
@@ -1937,6 +2009,9 @@ static void tcg_reg_alloc_start(TCGContext *s)
         TCGTempVal val = TEMP_VAL_MEM;
 
         switch (ts->kind) {
+        case TEMP_CONST:
+            val = TEMP_VAL_CONST;
+            break;
         case TEMP_FIXED:
             val = TEMP_VAL_REG;
             break;
@@ -1973,6 +2048,26 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
     case TEMP_NORMAL:
         snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
         break;
+    case TEMP_CONST:
+        switch (ts->type) {
+        case TCG_TYPE_I32:
+            snprintf(buf, buf_size, "$0x%x", (int32_t)ts->val);
+            break;
+#if TCG_TARGET_REG_BITS > 32
+        case TCG_TYPE_I64:
+            snprintf(buf, buf_size, "$0x%" PRIx64, ts->val);
+            break;
+#endif
+        case TCG_TYPE_V64:
+        case TCG_TYPE_V128:
+        case TCG_TYPE_V256:
+            snprintf(buf, buf_size, "v%d$0x%" PRIx64,
+                     64 << (ts->type - TCG_TYPE_V64), ts->val);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
     }
     return buf;
 }
@@ -2574,6 +2669,7 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
             state = TS_DEAD | TS_MEM;
             break;
         case TEMP_NORMAL:
+        case TEMP_CONST:
             state = TS_DEAD;
             break;
         default:
@@ -2608,14 +2704,24 @@ static void la_bb_sync(TCGContext *s, int ng, int nt)
     la_global_sync(s, ng);
 
     for (int i = ng; i < nt; ++i) {
-        if (s->temps[i].kind == TEMP_LOCAL) {
-            int state = s->temps[i].state;
-            s->temps[i].state = state | TS_MEM;
+        TCGTemp *ts = &s->temps[i];
+        int state;
+
+        switch (ts->kind) {
+        case TEMP_LOCAL:
+            state = ts->state;
+            ts->state = state | TS_MEM;
             if (state != TS_DEAD) {
                 continue;
             }
-        } else {
+            break;
+        case TEMP_NORMAL:
             s->temps[i].state = TS_DEAD;
+            break;
+        case TEMP_CONST:
+            continue;
+        default:
+            g_assert_not_reached();
         }
         la_reset_pref(&s->temps[i]);
     }
@@ -3254,15 +3360,28 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
    mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-    if (temp_readonly(ts)) {
+    TCGTempVal new_type;
+
+    switch (ts->kind) {
+    case TEMP_FIXED:
         return;
+    case TEMP_GLOBAL:
+    case TEMP_LOCAL:
+        new_type = TEMP_VAL_MEM;
+        break;
+    case TEMP_NORMAL:
+        new_type = free_or_dead < 0 ? TEMP_VAL_MEM : TEMP_VAL_DEAD;
+        break;
+    case TEMP_CONST:
+        new_type = TEMP_VAL_CONST;
+        break;
+    default:
+        g_assert_not_reached();
     }
     if (ts->val_type == TEMP_VAL_REG) {
         s->reg_to_temp[ts->reg] = NULL;
     }
-    ts->val_type = (free_or_dead < 0
-                    || ts->kind != TEMP_NORMAL
-                    ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
+    ts->val_type = new_type;
 }
 
 /* Mark a temporary as dead.  */
@@ -3278,10 +3397,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
                       TCGRegSet preferred_regs, int free_or_dead)
 {
-    if (temp_readonly(ts)) {
-        return;
-    }
-    if (!ts->mem_coherent) {
+    if (!temp_readonly(ts) && !ts->mem_coherent) {
         if (!ts->mem_allocated) {
             temp_allocate_frame(s, ts);
         }
@@ -3499,12 +3615,22 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
 
     for (i = s->nb_globals; i < s->nb_temps; i++) {
         TCGTemp *ts = &s->temps[i];
-        if (ts->kind == TEMP_LOCAL) {
+
+        switch (ts->kind) {
+        case TEMP_LOCAL:
             temp_save(s, ts, allocated_regs);
-        } else {
+            break;
+        case TEMP_NORMAL:
             /* The liveness analysis already ensures that temps are dead.
                Keep an tcg_debug_assert for safety. */
             tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
+            break;
+        case TEMP_CONST:
+            /* Similarly, we should have freed any allocated register. */
+            tcg_debug_assert(ts->val_type == TEMP_VAL_CONST);
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 
@@ -3525,10 +3651,17 @@ static void tcg_reg_alloc_cbranch(TCGContext *s, TCGRegSet allocated_regs)
          * The liveness analysis already ensures that temps are dead.
          * Keep tcg_debug_asserts for safety.
          */
-        if (ts->kind == TEMP_LOCAL) {
+        switch (ts->kind) {
+        case TEMP_LOCAL:
             tcg_debug_assert(ts->val_type != TEMP_VAL_REG || ts->mem_coherent);
-        } else {
+            break;
+        case TEMP_NORMAL:
             tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
+            break;
+        case TEMP_CONST:
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 }
@@ -3819,45 +3952,42 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         i_preferred_regs = o_preferred_regs = 0;
         if (arg_ct->ialias) {
             o_preferred_regs = op->output_pref[arg_ct->alias_index];
-            if (ts->kind == TEMP_FIXED) {
-                /* if fixed register, we must allocate a new register
-                   if the alias is not the same register */
-                if (arg != op->args[arg_ct->alias_index]) {
-                    goto allocate_in_reg;
-                }
-            } else {
-                /* if the input is aliased to an output and if it is
-                   not dead after the instruction, we must allocate
-                   a new register and move it */
-                if (!IS_DEAD_ARG(i)) {
-                    goto allocate_in_reg;
-                }
 
-                /* check if the current register has already been allocated
-                   for another input aliased to an output */
-                if (ts->val_type == TEMP_VAL_REG) {
-                    int k2, i2;
-                    reg = ts->reg;
-                    for (k2 = 0 ; k2 < k ; k2++) {
-                        i2 = def->args_ct[nb_oargs + k2].sort_index;
-                        if (def->args_ct[i2].ialias && reg == new_args[i2]) {
-                            goto allocate_in_reg;
-                        }
+            /*
+             * If the input is readonly, then it cannot also be an
+             * output and aliased to itself.  If the input is not
+             * dead after the instruction, we must allocate a new
+             * register and move it.
+             */
+            if (temp_readonly(ts) || !IS_DEAD_ARG(i)) {
+                goto allocate_in_reg;
+            }
+
+            /*
+             * Check if the current register has already been allocated
+             * for another input aliased to an output.
+             */
+            if (ts->val_type == TEMP_VAL_REG) {
+                reg = ts->reg;
+                for (int k2 = 0; k2 < k; k2++) {
+                    int i2 = def->args_ct[nb_oargs + k2].sort_index;
+                    if (def->args_ct[i2].ialias && reg == new_args[i2]) {
+                        goto allocate_in_reg;
                     }
                 }
-                i_preferred_regs = o_preferred_regs;
             }
+            i_preferred_regs = o_preferred_regs;
         }
 
         temp_load(s, ts, arg_ct->regs, i_allocated_regs, i_preferred_regs);
         reg = ts->reg;
 
-        if (tcg_regset_test_reg(arg_ct->regs, reg)) {
-            /* nothing to do : the constraint is satisfied */
-        } else {
-        allocate_in_reg:
-            /* allocate a new register matching the constraint 
-               and move the temporary register into it */
+        if (!tcg_regset_test_reg(arg_ct->regs, reg)) {
+ allocate_in_reg:
+            /*
+             * Allocate a new register matching the constraint
+             * and move the temporary register into it.
+             */
             temp_load(s, ts, tcg_target_available_regs[ts->type],
                       i_allocated_regs, 0);
             reg = tcg_reg_alloc(s, arg_ct->regs, i_allocated_regs,
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 09/24] tcg/optimize: Improve find_better_copy
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (7 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 08/24] tcg: Introduce TYPE_CONST temporaries Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 10/24] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Prefer TEMP_CONST over anything else.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 16b0aa7229..e42f9c89a8 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -122,31 +122,28 @@ static void init_arg_info(TempOptInfo *infos,
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
 {
-    TCGTemp *i;
+    TCGTemp *i, *g, *l;
 
-    /* If this is already a global, we can't do better. */
-    if (ts->kind >= TEMP_GLOBAL) {
+    /* If this is already readonly, we can't do better. */
+    if (temp_readonly(ts)) {
         return ts;
     }
 
-    /* Search for a global first. */
+    g = l = NULL;
     for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-        if (i->kind >= TEMP_GLOBAL) {
+        if (temp_readonly(i)) {
             return i;
-        }
-    }
-
-    /* If it is a temp, search for a temp local. */
-    if (ts->kind == TEMP_NORMAL) {
-        for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-            if (i->kind >= TEMP_LOCAL) {
-                return i;
+        } else if (i->kind > ts->kind) {
+            if (i->kind == TEMP_GLOBAL) {
+                g = i;
+            } else if (i->kind == TEMP_LOCAL) {
+                l = i;
             }
         }
     }
 
-    /* Failure to find a better representation, return the same temp. */
-    return ts;
+    /* If we didn't find a better representation, return the same temp. */
+    return g ? g : l ? l : ts;
 }
 
 static bool ts_are_copies(TCGTemp *ts1, TCGTemp *ts2)
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 10/24] tcg/optimize: Adjust TempOptInfo allocation
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (8 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 09/24] tcg/optimize: Improve find_better_copy Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
Do not allocate a large block for indexing.  Instead, allocate
for each temporary as they are seen.
In general, this will use less memory, if we consider that most
TBs do not touch every target register.  This also allows us to
allocate TempOptInfo for new temps created during optimization.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 60 ++++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 26 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index e42f9c89a8..49bf1386c7 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -89,35 +89,41 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(TempOptInfo *infos,
-                         TCGTempSet *temps_used, TCGTemp *ts)
+static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
 {
     size_t idx = temp_idx(ts);
-    if (!test_bit(idx, temps_used->l)) {
-        TempOptInfo *ti = &infos[idx];
+    TempOptInfo *ti;
 
+    if (test_bit(idx, temps_used->l)) {
+        return;
+    }
+    set_bit(idx, temps_used->l);
+
+    ti = ts->state_ptr;
+    if (ti == NULL) {
+        ti = tcg_malloc(sizeof(TempOptInfo));
         ts->state_ptr = ti;
-        ti->next_copy = ts;
-        ti->prev_copy = ts;
-        if (ts->kind == TEMP_CONST) {
-            ti->is_const = true;
-            ti->val = ti->mask = ts->val;
-            if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
-                /* High bits of a 32-bit quantity are garbage.  */
-                ti->mask |= ~0xffffffffull;
-            }
-        } else {
-            ti->is_const = false;
-            ti->mask = -1;
+    }
+
+    ti->next_copy = ts;
+    ti->prev_copy = ts;
+    if (ts->kind == TEMP_CONST) {
+        ti->is_const = true;
+        ti->val = ts->val;
+        ti->mask = ts->val;
+        if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
+            /* High bits of a 32-bit quantity are garbage.  */
+            ti->mask |= ~0xffffffffull;
         }
-        set_bit(idx, temps_used->l);
+    } else {
+        ti->is_const = false;
+        ti->mask = -1;
     }
 }
 
-static void init_arg_info(TempOptInfo *infos,
-                          TCGTempSet *temps_used, TCGArg arg)
+static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
 {
-    init_ts_info(infos, temps_used, arg_temp(arg));
+    init_ts_info(temps_used, arg_temp(arg));
 }
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
@@ -604,9 +610,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
-    int nb_temps, nb_globals;
+    int nb_temps, nb_globals, i;
     TCGOp *op, *op_next, *prev_mb = NULL;
-    TempOptInfo *infos;
     TCGTempSet temps_used;
 
     /* Array VALS has an element for each temp.
@@ -616,12 +621,15 @@ void tcg_optimize(TCGContext *s)
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+
     bitmap_zero(temps_used.l, nb_temps);
-    infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
+    for (i = 0; i < nb_temps; ++i) {
+        s->temps[i].state_ptr = NULL;
+    }
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
         uint64_t mask, partmask, affected, tmp;
-        int nb_oargs, nb_iargs, i;
+        int nb_oargs, nb_iargs;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
 
@@ -633,14 +641,14 @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 TCGTemp *ts = arg_temp(op->args[i]);
                 if (ts) {
-                    init_ts_info(infos, &temps_used, ts);
+                    init_ts_info(&temps_used, ts);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_arg_info(infos, &temps_used, op->args[i]);
+                init_arg_info(&temps_used, op->args[i]);
             }
         }
 
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (9 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 10/24] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-15 23:03   ` Alistair Francis
  2021-02-01 20:45   ` Richard W.M. Jones
  2021-01-14  2:16 ` [PULL 12/24] tcg: Convert tcg_gen_dupi_vec to TCG_CONST Richard Henderson
                   ` (13 subsequent siblings)
  24 siblings, 2 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 108 ++++++++++++++++++++++---------------------------
 1 file changed, 49 insertions(+), 59 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 49bf1386c7..bda727d5ed 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -178,37 +178,6 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, uint64_t val)
-{
-    const TCGOpDef *def;
-    TCGOpcode new_op;
-    uint64_t mask;
-    TempOptInfo *di = arg_info(dst);
-
-    def = &tcg_op_defs[op->opc];
-    if (def->flags & TCG_OPF_VECTOR) {
-        new_op = INDEX_op_dupi_vec;
-    } else if (def->flags & TCG_OPF_64BIT) {
-        new_op = INDEX_op_movi_i64;
-    } else {
-        new_op = INDEX_op_movi_i32;
-    }
-    op->opc = new_op;
-    /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
-    op->args[0] = dst;
-    op->args[1] = val;
-
-    reset_temp(dst);
-    di->is_const = true;
-    di->val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    di->mask = mask;
-}
-
 static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 {
     TCGTemp *dst_ts = arg_temp(dst);
@@ -259,6 +228,27 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     }
 }
 
+static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
+                             TCGOp *op, TCGArg dst, uint64_t val)
+{
+    const TCGOpDef *def = &tcg_op_defs[op->opc];
+    TCGType type;
+    TCGTemp *tv;
+
+    if (def->flags & TCG_OPF_VECTOR) {
+        type = TCGOP_VECL(op) + TCG_TYPE_V64;
+    } else if (def->flags & TCG_OPF_64BIT) {
+        type = TCG_TYPE_I64;
+    } else {
+        type = TCG_TYPE_I32;
+    }
+
+    /* Convert movi to mov with constant temp. */
+    tv = tcg_constant_internal(type, val);
+    init_ts_info(temps_used, tv);
+    tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
+}
+
 static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
 {
     uint64_t l64, h64;
@@ -622,7 +612,7 @@ void tcg_optimize(TCGContext *s)
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
 
-    bitmap_zero(temps_used.l, nb_temps);
+    memset(&temps_used, 0, sizeof(temps_used));
     for (i = 0; i < nb_temps; ++i) {
         s->temps[i].state_ptr = NULL;
     }
@@ -727,7 +717,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(rotr):
             if (arg_is_const(op->args[1])
                 && arg_info(op->args[1])->val == 0) {
-                tcg_opt_gen_movi(s, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1054,7 +1044,7 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, op->args[0], 0);
+            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
             continue;
         }
         if (affected == 0) {
@@ -1071,7 +1061,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mulsh):
             if (arg_is_const(op->args[2])
                 && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_movi(s, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1098,7 +1088,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(sub):
         CASE_OP_32_64_VEC(xor):
             if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_movi(s, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1115,14 +1105,14 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(movi):
         case INDEX_op_dupi_vec:
-            tcg_opt_gen_movi(s, op, op->args[0], op->args[1]);
+            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
             break;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
                 tmp = arg_info(op->args[1])->val;
                 tmp = dup_const(TCGOP_VECE(op), tmp);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1132,7 +1122,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                 tmp = arg_info(op->args[1])->val;
                 if (tmp == arg_info(op->args[2])->val) {
-                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                     break;
                 }
             } else if (args_are_copies(op->args[1], op->args[2])) {
@@ -1160,7 +1150,7 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extrh_i64_i32:
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1190,7 +1180,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1201,7 +1191,7 @@ void tcg_optimize(TCGContext *s)
                 TCGArg v = arg_info(op->args[1])->val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 } else {
                     tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
                 }
@@ -1214,7 +1204,7 @@ void tcg_optimize(TCGContext *s)
                 tmp = deposit64(arg_info(op->args[1])->val,
                                 op->args[3], op->args[4],
                                 arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1223,7 +1213,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = extract64(arg_info(op->args[1])->val,
                                 op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1232,7 +1222,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = sextract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1249,7 +1239,7 @@ void tcg_optimize(TCGContext *s)
                     tmp = (int32_t)(((uint32_t)v1 >> shr) |
                                     ((uint32_t)v2 << (32 - shr)));
                 }
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1258,7 +1248,7 @@ void tcg_optimize(TCGContext *s)
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1268,7 +1258,7 @@ void tcg_optimize(TCGContext *s)
                                            op->args[1], op->args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    bitmap_zero(temps_used.l, nb_temps);
+                    memset(&temps_used, 0, sizeof(temps_used));
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[3];
                 } else {
@@ -1314,7 +1304,7 @@ void tcg_optimize(TCGContext *s)
                 uint64_t a = ((uint64_t)ah << 32) | al;
                 uint64_t b = ((uint64_t)bh << 32) | bl;
                 TCGArg rl, rh;
-                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
+                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
 
                 if (opc == INDEX_op_add2_i32) {
                     a += b;
@@ -1324,8 +1314,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, op, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(a >> 32));
                 break;
             }
             goto do_default;
@@ -1336,12 +1326,12 @@ void tcg_optimize(TCGContext *s)
                 uint32_t b = arg_info(op->args[3])->val;
                 uint64_t r = (uint64_t)a * b;
                 TCGArg rl, rh;
-                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
+                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, op, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(r >> 32));
                 break;
             }
             goto do_default;
@@ -1352,7 +1342,7 @@ void tcg_optimize(TCGContext *s)
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
-                    bitmap_zero(temps_used.l, nb_temps);
+                    memset(&temps_used, 0, sizeof(temps_used));
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[5];
                 } else {
@@ -1368,7 +1358,7 @@ void tcg_optimize(TCGContext *s)
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                bitmap_zero(temps_used.l, nb_temps);
+                memset(&temps_used, 0, sizeof(temps_used));
                 op->opc = INDEX_op_brcond_i32;
                 op->args[0] = op->args[1];
                 op->args[1] = op->args[3];
@@ -1394,7 +1384,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_brcond_low:
-                bitmap_zero(temps_used.l, nb_temps);
+                memset(&temps_used, 0, sizeof(temps_used));
                 op->opc = INDEX_op_brcond_i32;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
@@ -1429,7 +1419,7 @@ void tcg_optimize(TCGContext *s)
                                             op->args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
             } else if ((op->args[5] == TCG_COND_LT
                         || op->args[5] == TCG_COND_GE)
                        && arg_is_const(op->args[3])
@@ -1514,7 +1504,7 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                bitmap_zero(temps_used.l, nb_temps);
+                memset(&temps_used, 0, sizeof(temps_used));
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 12/24] tcg: Convert tcg_gen_dupi_vec to TCG_CONST
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (10 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 13/24] tcg: Use tcg_constant_i32 with icount expander Richard Henderson
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Because we now store uint64_t in TCGTemp, we can now always
store the full 64-bit duplicate immediate.  So remove the
difference between 32- and 64-bit hosts.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c   |  9 ++++-----
 tcg/tcg-op-vec.c | 39 ++++++++++-----------------------------
 tcg/tcg.c        |  7 +------
 3 files changed, 15 insertions(+), 40 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index bda727d5ed..dbb03ef96b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1120,11 +1120,10 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_dup2_vec:
             assert(TCG_TARGET_REG_BITS == 32);
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                tmp = arg_info(op->args[1])->val;
-                if (tmp == arg_info(op->args[2])->val) {
-                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
-                    break;
-                }
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0],
+                                 deposit64(arg_info(op->args[1])->val, 32, 32,
+                                           arg_info(op->args[2])->val));
+                break;
             } else if (args_are_copies(op->args[1], op->args[2])) {
                 op->opc = INDEX_op_dup_vec;
                 TCGOP_VECE(op) = MO_32;
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index cdbf11c573..9fbed1366c 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -216,25 +216,17 @@ void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
     }
 }
 
-#define MO_REG  (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32)
-
-static void do_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a)
-{
-    TCGTemp *rt = tcgv_vec_temp(r);
-    vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a);
-}
-
 TCGv_vec tcg_const_zeros_vec(TCGType type)
 {
     TCGv_vec ret = tcg_temp_new_vec(type);
-    do_dupi_vec(ret, MO_REG, 0);
+    tcg_gen_dupi_vec(MO_64, ret, 0);
     return ret;
 }
 
 TCGv_vec tcg_const_ones_vec(TCGType type)
 {
     TCGv_vec ret = tcg_temp_new_vec(type);
-    do_dupi_vec(ret, MO_REG, -1);
+    tcg_gen_dupi_vec(MO_64, ret, -1);
     return ret;
 }
 
@@ -252,39 +244,28 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m)
 
 void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a)
 {
-    if (TCG_TARGET_REG_BITS == 64) {
-        do_dupi_vec(r, MO_64, a);
-    } else if (a == dup_const(MO_32, a)) {
-        do_dupi_vec(r, MO_32, a);
-    } else {
-        TCGv_i64 c = tcg_const_i64(a);
-        tcg_gen_dup_i64_vec(MO_64, r, c);
-        tcg_temp_free_i64(c);
-    }
+    tcg_gen_dupi_vec(MO_64, r, a);
 }
 
 void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a)
 {
-    do_dupi_vec(r, MO_REG, dup_const(MO_32, a));
+    tcg_gen_dupi_vec(MO_32, r, a);
 }
 
 void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a)
 {
-    do_dupi_vec(r, MO_REG, dup_const(MO_16, a));
+    tcg_gen_dupi_vec(MO_16, r, a);
 }
 
 void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a)
 {
-    do_dupi_vec(r, MO_REG, dup_const(MO_8, a));
+    tcg_gen_dupi_vec(MO_8, r, a);
 }
 
 void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a)
 {
-    if (vece == MO_64) {
-        tcg_gen_dup64i_vec(r, a);
-    } else {
-        do_dupi_vec(r, MO_REG, dup_const(vece, a));
-    }
+    TCGTemp *rt = tcgv_vec_temp(r);
+    tcg_gen_mov_vec(r, tcg_constant_vec(rt->base_type, vece, a));
 }
 
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
@@ -489,8 +470,8 @@ void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
             if (tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0) {
                 tcg_gen_sari_vec(vece, t, a, (8 << vece) - 1);
             } else {
-                do_dupi_vec(t, MO_REG, 0);
-                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a, t);
+                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a,
+                                tcg_constant_vec(type, vece, 0));
             }
             tcg_gen_xor_vec(vece, r, a, t);
             tcg_gen_sub_vec(vece, r, r, t);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 802f0b8a32..ad1348d811 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3539,16 +3539,11 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
              * The targets will, in general, have to do this search anyway,
              * do this generically.
              */
-            if (TCG_TARGET_REG_BITS == 32) {
-                val = dup_const(MO_32, val);
-                vece = MO_32;
-            }
             if (val == dup_const(MO_8, val)) {
                 vece = MO_8;
             } else if (val == dup_const(MO_16, val)) {
                 vece = MO_16;
-            } else if (TCG_TARGET_REG_BITS == 64 &&
-                       val == dup_const(MO_32, val)) {
+            } else if (val == dup_const(MO_32, val)) {
                 vece = MO_32;
             }
 
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 13/24] tcg: Use tcg_constant_i32 with icount expander
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (11 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 12/24] tcg: Convert tcg_gen_dupi_vec to TCG_CONST Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 14/24] tcg: Use tcg_constant_{i32,i64} with tcg int expanders Richard Henderson
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
We must do this before we adjust tcg_out_movi_i32, lest the
under-the-hood poking that we do for icount be broken.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/gen-icount.h | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index aa4b44354a..298e01eef4 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -34,7 +34,7 @@ static inline void gen_io_end(void)
 
 static inline void gen_tb_start(const TranslationBlock *tb)
 {
-    TCGv_i32 count, imm;
+    TCGv_i32 count;
 
     tcg_ctx->exitreq_label = gen_new_label();
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
@@ -48,15 +48,13 @@ static inline void gen_tb_start(const TranslationBlock *tb)
                    offsetof(ArchCPU, env));
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        imm = tcg_temp_new_i32();
-        /* We emit a movi with a dummy immediate argument. Keep the insn index
-         * of the movi so that we later (when we know the actual insn count)
-         * can update the immediate argument with the actual insn count.  */
-        tcg_gen_movi_i32(imm, 0xdeadbeef);
+        /*
+         * We emit a sub with a dummy immediate argument. Keep the insn index
+         * of the sub so that we later (when we know the actual insn count)
+         * can update the argument with the actual insn count.
+         */
+        tcg_gen_sub_i32(count, count, tcg_constant_i32(0));
         icount_start_insn = tcg_last_op();
-
-        tcg_gen_sub_i32(count, count, imm);
-        tcg_temp_free_i32(imm);
     }
 
     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
@@ -74,9 +72,12 @@ static inline void gen_tb_start(const TranslationBlock *tb)
 static inline void gen_tb_end(const TranslationBlock *tb, int num_insns)
 {
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        /* Update the num_insn immediate parameter now that we know
-         * the actual insn count.  */
-        tcg_set_insn_param(icount_start_insn, 1, num_insns);
+        /*
+         * Update the num_insn immediate parameter now that we know
+         * the actual insn count.
+         */
+        tcg_set_insn_param(icount_start_insn, 2,
+                           tcgv_i32_arg(tcg_constant_i32(num_insns)));
     }
 
     gen_set_label(tcg_ctx->exitreq_label);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 14/24] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (12 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 13/24] tcg: Use tcg_constant_i32 with icount expander Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-02-10 10:38   ` Alex Bennée
  2021-01-14  2:16 ` [PULL 15/24] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op.h |  13 +--
 tcg/tcg-op.c         | 227 ++++++++++++++++++++-----------------------
 2 files changed, 109 insertions(+), 131 deletions(-)
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 901b19f32a..ed8de045e2 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -271,6 +271,7 @@ void tcg_gen_mb(TCGBar);
 
 /* 32 bit ops */
 
+void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg);
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2);
 void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
@@ -349,11 +350,6 @@ static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
     }
 }
 
-static inline void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
-{
-    tcg_gen_op2i_i32(INDEX_op_movi_i32, ret, arg);
-}
-
 static inline void tcg_gen_ld8u_i32(TCGv_i32 ret, TCGv_ptr arg2,
                                     tcg_target_long offset)
 {
@@ -467,6 +463,7 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 arg)
 
 /* 64 bit ops */
 
+void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2);
 void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
@@ -550,11 +547,6 @@ static inline void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg)
     }
 }
 
-static inline void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
-{
-    tcg_gen_op2i_i64(INDEX_op_movi_i64, ret, arg);
-}
-
 static inline void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2,
                                     tcg_target_long offset)
 {
@@ -698,7 +690,6 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 
 void tcg_gen_discard_i64(TCGv_i64 arg);
 void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg);
-void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
 void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
 void tcg_gen_ld8s_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
 void tcg_gen_ld16u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0374b5d56d..70475773f4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -104,15 +104,18 @@ void tcg_gen_mb(TCGBar mb_type)
 
 /* 32 bit ops */
 
+void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
+{
+    tcg_gen_mov_i32(ret, tcg_constant_i32(arg));
+}
+
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_add_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_add_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -122,9 +125,7 @@ void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2)
         /* Don't recurse with tcg_gen_neg_i32.  */
         tcg_gen_op2_i32(INDEX_op_neg_i32, ret, arg2);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg1);
-        tcg_gen_sub_i32(ret, t0, arg2);
-        tcg_temp_free_i32(t0);
+        tcg_gen_sub_i32(ret, tcg_constant_i32(arg1), arg2);
     }
 }
 
@@ -134,15 +135,12 @@ void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_sub_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_sub_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
 void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-    TCGv_i32 t0;
     /* Some cases can be optimized here.  */
     switch (arg2) {
     case 0:
@@ -165,9 +163,8 @@ void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
         }
         break;
     }
-    t0 = tcg_const_i32(arg2);
-    tcg_gen_and_i32(ret, arg1, t0);
-    tcg_temp_free_i32(t0);
+
+    tcg_gen_and_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
@@ -178,9 +175,7 @@ void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_or_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_or_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -193,9 +188,7 @@ void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
         /* Don't recurse with tcg_gen_not_i32.  */
         tcg_gen_op2_i32(INDEX_op_not_i32, ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_xor_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_xor_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -205,9 +198,7 @@ void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_shl_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_shl_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -217,9 +208,7 @@ void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_shr_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_shr_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -229,9 +218,7 @@ void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_sar_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_sar_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -250,9 +237,7 @@ void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *l)
     if (cond == TCG_COND_ALWAYS) {
         tcg_gen_br(l);
     } else if (cond != TCG_COND_NEVER) {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_brcond_i32(cond, arg1, t0, l);
-        tcg_temp_free_i32(t0);
+        tcg_gen_brcond_i32(cond, arg1, tcg_constant_i32(arg2), l);
     }
 }
 
@@ -271,9 +256,7 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
 void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
                           TCGv_i32 arg1, int32_t arg2)
 {
-    TCGv_i32 t0 = tcg_const_i32(arg2);
-    tcg_gen_setcond_i32(cond, ret, arg1, t0);
-    tcg_temp_free_i32(t0);
+    tcg_gen_setcond_i32(cond, ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
@@ -283,9 +266,7 @@ void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     } else if (is_power_of_2(arg2)) {
         tcg_gen_shli_i32(ret, arg1, ctz32(arg2));
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_mul_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_mul_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -433,9 +414,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 
 void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
 {
-    TCGv_i32 t = tcg_const_i32(arg2);
-    tcg_gen_clz_i32(ret, arg1, t);
-    tcg_temp_free_i32(t);
+    tcg_gen_clz_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
@@ -468,10 +447,9 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
             tcg_gen_clzi_i32(t, t, 32);
             tcg_gen_xori_i32(t, t, 31);
         }
-        z = tcg_const_i32(0);
+        z = tcg_constant_i32(0);
         tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
         tcg_temp_free_i32(t);
-        tcg_temp_free_i32(z);
     } else {
         gen_helper_ctz_i32(ret, arg1, arg2);
     }
@@ -487,9 +465,7 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
         tcg_gen_ctpop_i32(ret, t);
         tcg_temp_free_i32(t);
     } else {
-        TCGv_i32 t = tcg_const_i32(arg2);
-        tcg_gen_ctz_i32(ret, arg1, t);
-        tcg_temp_free_i32(t);
+        tcg_gen_ctz_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -547,9 +523,7 @@ void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else if (TCG_TARGET_HAS_rot_i32) {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_rotl_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_rotl_i32(ret, arg1, tcg_constant_i32(arg2));
     } else {
         TCGv_i32 t0, t1;
         t0 = tcg_temp_new_i32();
@@ -653,9 +627,8 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
         tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
     } else if (TCG_TARGET_HAS_deposit_i32
                && TCG_TARGET_deposit_i32_valid(ofs, len)) {
-        TCGv_i32 zero = tcg_const_i32(0);
+        TCGv_i32 zero = tcg_constant_i32(0);
         tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
-        tcg_temp_free_i32(zero);
     } else {
         /* To help two-operand hosts we prefer to zero-extend first,
            which allows ARG to stay live.  */
@@ -1052,7 +1025,7 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
     } else {
         TCGv_i32 t0 = tcg_temp_new_i32();
         TCGv_i32 t1 = tcg_temp_new_i32();
-        TCGv_i32 t2 = tcg_const_i32(0x00ff00ff);
+        TCGv_i32 t2 = tcg_constant_i32(0x00ff00ff);
 
                                         /* arg = abcd */
         tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = .abc */
@@ -1067,7 +1040,6 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
 
         tcg_temp_free_i32(t0);
         tcg_temp_free_i32(t1);
-        tcg_temp_free_i32(t2);
     }
 }
 
@@ -1114,8 +1086,15 @@ void tcg_gen_discard_i64(TCGv_i64 arg)
 
 void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg)
 {
-    tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
-    tcg_gen_mov_i32(TCGV_HIGH(ret), TCGV_HIGH(arg));
+    TCGTemp *ts = tcgv_i64_temp(arg);
+
+    /* Canonicalize TCGv_i64 TEMP_CONST into TCGv_i32 TEMP_CONST. */
+    if (ts->kind == TEMP_CONST) {
+        tcg_gen_movi_i64(ret, ts->val);
+    } else {
+        tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
+        tcg_gen_mov_i32(TCGV_HIGH(ret), TCGV_HIGH(arg));
+    }
 }
 
 void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
@@ -1237,6 +1216,14 @@ void tcg_gen_mul_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     tcg_temp_free_i64(t0);
     tcg_temp_free_i32(t1);
 }
+
+#else
+
+void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
+{
+    tcg_gen_mov_i64(ret, tcg_constant_i64(arg));
+}
+
 #endif /* TCG_TARGET_REG_SIZE == 32 */
 
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
@@ -1244,10 +1231,12 @@ void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_add_i64(ret, arg1, tcg_constant_i64(arg2));
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_add_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_add2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
+                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
     }
 }
 
@@ -1256,10 +1245,12 @@ void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2)
     if (arg1 == 0 && TCG_TARGET_HAS_neg_i64) {
         /* Don't recurse with tcg_gen_neg_i64.  */
         tcg_gen_op2_i64(INDEX_op_neg_i64, ret, arg2);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_sub_i64(ret, tcg_constant_i64(arg1), arg2);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg1);
-        tcg_gen_sub_i64(ret, t0, arg2);
-        tcg_temp_free_i64(t0);
+        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
+                         tcg_constant_i32(arg1), tcg_constant_i32(arg1 >> 32),
+                         TCGV_LOW(arg2), TCGV_HIGH(arg2));
     }
 }
 
@@ -1268,17 +1259,17 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_sub_i64(ret, arg1, tcg_constant_i64(arg2));
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_sub_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
+                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
     }
 }
 
 void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-    TCGv_i64 t0;
-
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
         tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
@@ -1313,9 +1304,8 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
         }
         break;
     }
-    t0 = tcg_const_i64(arg2);
-    tcg_gen_and_i64(ret, arg1, t0);
-    tcg_temp_free_i64(t0);
+
+    tcg_gen_and_i64(ret, arg1, tcg_constant_i64(arg2));
 }
 
 void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
@@ -1331,9 +1321,7 @@ void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_or_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_or_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1351,9 +1339,7 @@ void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
         /* Don't recurse with tcg_gen_not_i64.  */
         tcg_gen_op2_i64(INDEX_op_not_i64, ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_xor_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_xor_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1415,9 +1401,7 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_shl_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_shl_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1429,9 +1413,7 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_shr_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_shr_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1443,9 +1425,7 @@ void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_sar_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_sar_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1468,12 +1448,17 @@ void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *l)
 
 void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
 {
-    if (cond == TCG_COND_ALWAYS) {
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
+    } else if (cond == TCG_COND_ALWAYS) {
         tcg_gen_br(l);
     } else if (cond != TCG_COND_NEVER) {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_brcond_i64(cond, arg1, t0, l);
-        tcg_temp_free_i64(t0);
+        l->refs++;
+        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
+                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                          tcg_constant_i32(arg2),
+                          tcg_constant_i32(arg2 >> 32),
+                          cond, label_arg(l));
     }
 }
 
@@ -1499,9 +1484,19 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
 void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
                           TCGv_i64 arg1, int64_t arg2)
 {
-    TCGv_i64 t0 = tcg_const_i64(arg2);
-    tcg_gen_setcond_i64(cond, ret, arg1, t0);
-    tcg_temp_free_i64(t0);
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_setcond_i64(cond, ret, arg1, tcg_constant_i64(arg2));
+    } else if (cond == TCG_COND_ALWAYS) {
+        tcg_gen_movi_i64(ret, 1);
+    } else if (cond == TCG_COND_NEVER) {
+        tcg_gen_movi_i64(ret, 0);
+    } else {
+        tcg_gen_op6i_i32(INDEX_op_setcond2_i32, TCGV_LOW(ret),
+                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                         tcg_constant_i32(arg2),
+                         tcg_constant_i32(arg2 >> 32), cond);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+    }
 }
 
 void tcg_gen_muli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
@@ -1690,7 +1685,7 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i64 t1 = tcg_temp_new_i64();
-        TCGv_i64 t2 = tcg_const_i64(0x00ff00ff);
+        TCGv_i64 t2 = tcg_constant_i64(0x00ff00ff);
 
                                         /* arg = ....abcd */
         tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
@@ -1706,7 +1701,6 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
 
         tcg_temp_free_i64(t0);
         tcg_temp_free_i64(t1);
-        tcg_temp_free_i64(t2);
     }
 }
 
@@ -1850,16 +1844,16 @@ void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
     if (TCG_TARGET_REG_BITS == 32
         && TCG_TARGET_HAS_clz_i32
         && arg2 <= 0xffffffffu) {
-        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
-        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
+        TCGv_i32 t = tcg_temp_new_i32();
+        tcg_gen_clzi_i32(t, TCGV_LOW(arg1), arg2 - 32);
         tcg_gen_addi_i32(t, t, 32);
         tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
         tcg_temp_free_i32(t);
     } else {
-        TCGv_i64 t = tcg_const_i64(arg2);
-        tcg_gen_clz_i64(ret, arg1, t);
-        tcg_temp_free_i64(t);
+        TCGv_i64 t0 = tcg_const_i64(arg2);
+        tcg_gen_clz_i64(ret, arg1, t0);
+        tcg_temp_free_i64(t0);
     }
 }
 
@@ -1881,7 +1875,7 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
             tcg_gen_clzi_i64(t, t, 64);
             tcg_gen_xori_i64(t, t, 63);
         }
-        z = tcg_const_i64(0);
+        z = tcg_constant_i64(0);
         tcg_gen_movcond_i64(TCG_COND_EQ, ret, arg1, z, arg2, t);
         tcg_temp_free_i64(t);
         tcg_temp_free_i64(z);
@@ -1895,8 +1889,8 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
     if (TCG_TARGET_REG_BITS == 32
         && TCG_TARGET_HAS_ctz_i32
         && arg2 <= 0xffffffffu) {
-        TCGv_i32 t32 = tcg_const_i32((uint32_t)arg2 - 32);
-        tcg_gen_ctz_i32(t32, TCGV_HIGH(arg1), t32);
+        TCGv_i32 t32 = tcg_temp_new_i32();
+        tcg_gen_ctzi_i32(t32, TCGV_HIGH(arg1), arg2 - 32);
         tcg_gen_addi_i32(t32, t32, 32);
         tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
@@ -1911,9 +1905,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
         tcg_gen_ctpop_i64(ret, t);
         tcg_temp_free_i64(t);
     } else {
-        TCGv_i64 t64 = tcg_const_i64(arg2);
-        tcg_gen_ctz_i64(ret, arg1, t64);
-        tcg_temp_free_i64(t64);
+        TCGv_i64 t0 = tcg_const_i64(arg2);
+        tcg_gen_ctz_i64(ret, arg1, t0);
+        tcg_temp_free_i64(t0);
     }
 }
 
@@ -1969,9 +1963,7 @@ void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else if (TCG_TARGET_HAS_rot_i64) {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_rotl_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_rotl_i64(ret, arg1, tcg_constant_i64(arg2));
     } else {
         TCGv_i64 t0, t1;
         t0 = tcg_temp_new_i64();
@@ -2089,9 +2081,8 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
         tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
     } else if (TCG_TARGET_HAS_deposit_i64
                && TCG_TARGET_deposit_i64_valid(ofs, len)) {
-        TCGv_i64 zero = tcg_const_i64(0);
+        TCGv_i64 zero = tcg_constant_i64(0);
         tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
-        tcg_temp_free_i64(zero);
     } else {
         if (TCG_TARGET_REG_BITS == 32) {
             if (ofs >= 32) {
@@ -3117,9 +3108,8 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 
 #ifdef CONFIG_SOFTMMU
         {
-            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-            gen(retv, cpu_env, addr, cmpv, newv, oi);
-            tcg_temp_free_i32(oi);
+            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
         }
 #else
         gen(retv, cpu_env, addr, cmpv, newv);
@@ -3162,9 +3152,8 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 
 #ifdef CONFIG_SOFTMMU
         {
-            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
-            gen(retv, cpu_env, addr, cmpv, newv, oi);
-            tcg_temp_free_i32(oi);
+            TCGMemOpIdx oi = make_memop_idx(memop, idx);
+            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
         }
 #else
         gen(retv, cpu_env, addr, cmpv, newv);
@@ -3226,9 +3215,8 @@ static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
 
 #ifdef CONFIG_SOFTMMU
     {
-        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-        gen(ret, cpu_env, addr, val, oi);
-        tcg_temp_free_i32(oi);
+        TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+        gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
     }
 #else
     gen(ret, cpu_env, addr, val);
@@ -3272,9 +3260,8 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
 
 #ifdef CONFIG_SOFTMMU
         {
-            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-            gen(ret, cpu_env, addr, val, oi);
-            tcg_temp_free_i32(oi);
+            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+            gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
         }
 #else
         gen(ret, cpu_env, addr, val);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 15/24] tcg: Use tcg_constant_{i32,i64} with tcg plugins
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (13 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 14/24] tcg: Use tcg_constant_{i32,i64} with tcg int expanders Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 16/24] tcg: Use tcg_constant_{i32,i64,vec} with gvec expanders Richard Henderson
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/plugin-gen.c | 49 +++++++++++++++++++-----------------------
 1 file changed, 22 insertions(+), 27 deletions(-)
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 51580d51a0..e5dc9d0ca9 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -284,8 +284,8 @@ static TCGOp *copy_extu_i32_i64(TCGOp **begin_op, TCGOp *op)
     if (TCG_TARGET_REG_BITS == 32) {
         /* mov_i32 */
         op = copy_op(begin_op, op, INDEX_op_mov_i32);
-        /* movi_i32 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
+        /* mov_i32 w/ $0 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
     } else {
         /* extu_i32_i64 */
         op = copy_op(begin_op, op, INDEX_op_extu_i32_i64);
@@ -306,39 +306,34 @@ static TCGOp *copy_mov_i64(TCGOp **begin_op, TCGOp *op)
     return op;
 }
 
-static TCGOp *copy_movi_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        /* 2x movi_i32 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
-        op->args[1] = v;
-
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
-        op->args[1] = v >> 32;
-    } else {
-        /* movi_i64 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i64);
-        op->args[1] = v;
-    }
-    return op;
-}
-
 static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
 {
     if (UINTPTR_MAX == UINT32_MAX) {
-        /* movi_i32 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
-        op->args[1] = (uintptr_t)ptr;
+        /* mov_i32 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uintptr_t)ptr));
     } else {
-        /* movi_i64 */
-        op = copy_movi_i64(begin_op, op, (uint64_t)(uintptr_t)ptr);
+        /* mov_i64 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i64);
+        op->args[1] = tcgv_i64_arg(tcg_constant_i64((uintptr_t)ptr));
     }
     return op;
 }
 
 static TCGOp *copy_const_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
 {
-    return copy_movi_i64(begin_op, op, v);
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* 2x mov_i32 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v));
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
+    } else {
+        /* mov_i64 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i64);
+        op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
+    }
+    return op;
 }
 
 static TCGOp *copy_extu_tl_i64(TCGOp **begin_op, TCGOp *op)
@@ -486,8 +481,8 @@ static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
 
     tcg_debug_assert(type == PLUGIN_GEN_CB_MEM);
 
-    /* const_i32 == movi_i32 ("info", so it remains as is) */
-    op = copy_op(&begin_op, op, INDEX_op_movi_i32);
+    /* const_i32 == mov_i32 ("info", so it remains as is) */
+    op = copy_op(&begin_op, op, INDEX_op_mov_i32);
 
     /* const_ptr */
     op = copy_const_ptr(&begin_op, op, cb->userp);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 16/24] tcg: Use tcg_constant_{i32,i64,vec} with gvec expanders
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (14 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 15/24] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 17/24] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |   1 +
 tcg/tcg-op-gvec.c | 129 ++++++++++++++++++----------------------------
 tcg/tcg.c         |   8 +++
 3 files changed, 60 insertions(+), 78 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index eeeb70ad43..504c5e9bb0 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -1109,6 +1109,7 @@ static inline TCGv_i64 tcg_constant_i64(int64_t val)
 }
 
 TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val);
+TCGv_vec tcg_constant_vec_matching(TCGv_vec match, unsigned vece, int64_t val);
 
 #if UINTPTR_MAX == UINT32_MAX
 # define tcg_const_ptr(x)        ((TCGv_ptr)tcg_const_i32((intptr_t)(x)))
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 1a41dfa908..498a959839 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -115,7 +115,7 @@ void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
                         gen_helper_gvec_2 *fn)
 {
     TCGv_ptr a0, a1;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -127,7 +127,6 @@ void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
 
     tcg_temp_free_ptr(a0);
     tcg_temp_free_ptr(a1);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with two vector operands
@@ -137,7 +136,7 @@ void tcg_gen_gvec_2i_ool(uint32_t dofs, uint32_t aofs, TCGv_i64 c,
                          gen_helper_gvec_2i *fn)
 {
     TCGv_ptr a0, a1;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -149,7 +148,6 @@ void tcg_gen_gvec_2i_ool(uint32_t dofs, uint32_t aofs, TCGv_i64 c,
 
     tcg_temp_free_ptr(a0);
     tcg_temp_free_ptr(a1);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with three vector operands.  */
@@ -158,7 +156,7 @@ void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         gen_helper_gvec_3 *fn)
 {
     TCGv_ptr a0, a1, a2;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -173,7 +171,6 @@ void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_ptr(a0);
     tcg_temp_free_ptr(a1);
     tcg_temp_free_ptr(a2);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with four vector operands.  */
@@ -182,7 +179,7 @@ void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         int32_t data, gen_helper_gvec_4 *fn)
 {
     TCGv_ptr a0, a1, a2, a3;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -200,7 +197,6 @@ void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_ptr(a1);
     tcg_temp_free_ptr(a2);
     tcg_temp_free_ptr(a3);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with five vector operands.  */
@@ -209,7 +205,7 @@ void tcg_gen_gvec_5_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         uint32_t maxsz, int32_t data, gen_helper_gvec_5 *fn)
 {
     TCGv_ptr a0, a1, a2, a3, a4;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -230,7 +226,6 @@ void tcg_gen_gvec_5_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_ptr(a2);
     tcg_temp_free_ptr(a3);
     tcg_temp_free_ptr(a4);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with three vector operands
@@ -240,7 +235,7 @@ void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs,
                         int32_t data, gen_helper_gvec_2_ptr *fn)
 {
     TCGv_ptr a0, a1;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -252,7 +247,6 @@ void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs,
 
     tcg_temp_free_ptr(a0);
     tcg_temp_free_ptr(a1);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with three vector operands
@@ -262,7 +256,7 @@ void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         int32_t data, gen_helper_gvec_3_ptr *fn)
 {
     TCGv_ptr a0, a1, a2;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -277,7 +271,6 @@ void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_ptr(a0);
     tcg_temp_free_ptr(a1);
     tcg_temp_free_ptr(a2);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with four vector operands
@@ -288,7 +281,7 @@ void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         gen_helper_gvec_4_ptr *fn)
 {
     TCGv_ptr a0, a1, a2, a3;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -306,7 +299,6 @@ void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_ptr(a1);
     tcg_temp_free_ptr(a2);
     tcg_temp_free_ptr(a3);
-    tcg_temp_free_i32(desc);
 }
 
 /* Generate a call to a gvec-style helper with five vector operands
@@ -317,7 +309,7 @@ void tcg_gen_gvec_5_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                         gen_helper_gvec_5_ptr *fn)
 {
     TCGv_ptr a0, a1, a2, a3, a4;
-    TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data));
+    TCGv_i32 desc = tcg_constant_i32(simd_desc(oprsz, maxsz, data));
 
     a0 = tcg_temp_new_ptr();
     a1 = tcg_temp_new_ptr();
@@ -338,7 +330,6 @@ void tcg_gen_gvec_5_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_ptr(a2);
     tcg_temp_free_ptr(a3);
     tcg_temp_free_ptr(a4);
-    tcg_temp_free_i32(desc);
 }
 
 /* Return true if we want to implement something of OPRSZ bytes
@@ -605,9 +596,9 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
                 || (TCG_TARGET_REG_BITS == 64
                     && (in_c == 0 || in_c == -1
                         || !check_size_impl(oprsz, 4)))) {
-                t_64 = tcg_const_i64(in_c);
+                t_64 = tcg_constant_i64(in_c);
             } else {
-                t_32 = tcg_const_i32(in_c);
+                t_32 = tcg_constant_i32(in_c);
             }
         }
 
@@ -648,11 +639,11 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
             t_val = tcg_temp_new_i32();
             tcg_gen_extrl_i64_i32(t_val, in_64);
         } else {
-            t_val = tcg_const_i32(in_c);
+            t_val = tcg_constant_i32(in_c);
         }
         gen_helper_memset(t_ptr, t_ptr, t_val, t_size);
 
-        if (!in_32) {
+        if (in_64) {
             tcg_temp_free_i32(t_val);
         }
         tcg_temp_free_ptr(t_size);
@@ -660,15 +651,14 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
         return;
     }
 
-    t_desc = tcg_const_i32(simd_desc(oprsz, maxsz, 0));
+    t_desc = tcg_constant_i32(simd_desc(oprsz, maxsz, 0));
 
     if (vece == MO_64) {
         if (in_64) {
             gen_helper_gvec_dup64(t_ptr, t_desc, in_64);
         } else {
-            t_64 = tcg_const_i64(in_c);
+            t_64 = tcg_constant_i64(in_c);
             gen_helper_gvec_dup64(t_ptr, t_desc, t_64);
-            tcg_temp_free_i64(t_64);
         }
     } else {
         typedef void dup_fn(TCGv_ptr, TCGv_i32, TCGv_i32);
@@ -680,24 +670,23 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
 
         if (in_32) {
             fns[vece](t_ptr, t_desc, in_32);
-        } else {
+        } else if (in_64) {
             t_32 = tcg_temp_new_i32();
-            if (in_64) {
-                tcg_gen_extrl_i64_i32(t_32, in_64);
-            } else if (vece == MO_8) {
-                tcg_gen_movi_i32(t_32, in_c & 0xff);
-            } else if (vece == MO_16) {
-                tcg_gen_movi_i32(t_32, in_c & 0xffff);
-            } else {
-                tcg_gen_movi_i32(t_32, in_c);
-            }
+            tcg_gen_extrl_i64_i32(t_32, in_64);
             fns[vece](t_ptr, t_desc, t_32);
             tcg_temp_free_i32(t_32);
+        } else {
+            if (vece == MO_8) {
+                in_c &= 0xff;
+            } else if (vece == MO_16) {
+                in_c &= 0xffff;
+            }
+            t_32 = tcg_constant_i32(in_c);
+            fns[vece](t_ptr, t_desc, t_32);
         }
     }
 
     tcg_temp_free_ptr(t_ptr);
-    tcg_temp_free_i32(t_desc);
     return;
 
  done:
@@ -1247,10 +1236,9 @@ void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
             if (g->fno) {
                 tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, c, g->fno);
             } else {
-                TCGv_i64 tcg_c = tcg_const_i64(c);
+                TCGv_i64 tcg_c = tcg_constant_i64(c);
                 tcg_gen_gvec_2i_ool(dofs, aofs, tcg_c, oprsz,
                                     maxsz, c, g->fnoi);
-                tcg_temp_free_i64(tcg_c);
             }
             oprsz = maxsz;
         }
@@ -1744,16 +1732,14 @@ static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
 
 void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    TCGv_i64 m = tcg_const_i64(dup_const(MO_8, 0x80));
+    TCGv_i64 m = tcg_constant_i64(dup_const(MO_8, 0x80));
     gen_addv_mask(d, a, b, m);
-    tcg_temp_free_i64(m);
 }
 
 void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    TCGv_i64 m = tcg_const_i64(dup_const(MO_16, 0x8000));
+    TCGv_i64 m = tcg_constant_i64(dup_const(MO_16, 0x8000));
     gen_addv_mask(d, a, b, m);
-    tcg_temp_free_i64(m);
 }
 
 void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
@@ -1837,9 +1823,8 @@ void tcg_gen_gvec_adds(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_addi(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t c, uint32_t oprsz, uint32_t maxsz)
 {
-    TCGv_i64 tmp = tcg_const_i64(c);
+    TCGv_i64 tmp = tcg_constant_i64(c);
     tcg_gen_gvec_adds(vece, dofs, aofs, tmp, oprsz, maxsz);
-    tcg_temp_free_i64(tmp);
 }
 
 static const TCGOpcode vecop_list_sub[] = { INDEX_op_sub_vec, 0 };
@@ -1897,16 +1882,14 @@ static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
 
 void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    TCGv_i64 m = tcg_const_i64(dup_const(MO_8, 0x80));
+    TCGv_i64 m = tcg_constant_i64(dup_const(MO_8, 0x80));
     gen_subv_mask(d, a, b, m);
-    tcg_temp_free_i64(m);
 }
 
 void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    TCGv_i64 m = tcg_const_i64(dup_const(MO_16, 0x8000));
+    TCGv_i64 m = tcg_constant_i64(dup_const(MO_16, 0x8000));
     gen_subv_mask(d, a, b, m);
-    tcg_temp_free_i64(m);
 }
 
 void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
@@ -2017,9 +2000,8 @@ void tcg_gen_gvec_muls(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_muli(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t c, uint32_t oprsz, uint32_t maxsz)
 {
-    TCGv_i64 tmp = tcg_const_i64(c);
+    TCGv_i64 tmp = tcg_constant_i64(c);
     tcg_gen_gvec_muls(vece, dofs, aofs, tmp, oprsz, maxsz);
-    tcg_temp_free_i64(tmp);
 }
 
 void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -2076,18 +2058,16 @@ void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs,
 
 static void tcg_gen_usadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
-    TCGv_i32 max = tcg_const_i32(-1);
+    TCGv_i32 max = tcg_constant_i32(-1);
     tcg_gen_add_i32(d, a, b);
     tcg_gen_movcond_i32(TCG_COND_LTU, d, d, a, max, d);
-    tcg_temp_free_i32(max);
 }
 
 static void tcg_gen_usadd_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    TCGv_i64 max = tcg_const_i64(-1);
+    TCGv_i64 max = tcg_constant_i64(-1);
     tcg_gen_add_i64(d, a, b);
     tcg_gen_movcond_i64(TCG_COND_LTU, d, d, a, max, d);
-    tcg_temp_free_i64(max);
 }
 
 void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -2120,18 +2100,16 @@ void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs,
 
 static void tcg_gen_ussub_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
-    TCGv_i32 min = tcg_const_i32(0);
+    TCGv_i32 min = tcg_constant_i32(0);
     tcg_gen_sub_i32(d, a, b);
     tcg_gen_movcond_i32(TCG_COND_LTU, d, a, b, min, d);
-    tcg_temp_free_i32(min);
 }
 
 static void tcg_gen_ussub_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    TCGv_i64 min = tcg_const_i64(0);
+    TCGv_i64 min = tcg_constant_i64(0);
     tcg_gen_sub_i64(d, a, b);
     tcg_gen_movcond_i64(TCG_COND_LTU, d, a, b, min, d);
-    tcg_temp_free_i64(min);
 }
 
 void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -2292,16 +2270,14 @@ static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m)
 
 void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 b)
 {
-    TCGv_i64 m = tcg_const_i64(dup_const(MO_8, 0x80));
+    TCGv_i64 m = tcg_constant_i64(dup_const(MO_8, 0x80));
     gen_negv_mask(d, b, m);
-    tcg_temp_free_i64(m);
 }
 
 void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 b)
 {
-    TCGv_i64 m = tcg_const_i64(dup_const(MO_16, 0x8000));
+    TCGv_i64 m = tcg_constant_i64(dup_const(MO_16, 0x8000));
     gen_negv_mask(d, b, m);
-    tcg_temp_free_i64(m);
 }
 
 void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 b)
@@ -2570,9 +2546,8 @@ void tcg_gen_gvec_ands(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_andi(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t c, uint32_t oprsz, uint32_t maxsz)
 {
-    TCGv_i64 tmp = tcg_const_i64(dup_const(vece, c));
+    TCGv_i64 tmp = tcg_constant_i64(dup_const(vece, c));
     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_ands);
-    tcg_temp_free_i64(tmp);
 }
 
 static const GVecGen2s gop_xors = {
@@ -2595,9 +2570,8 @@ void tcg_gen_gvec_xors(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_xori(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t c, uint32_t oprsz, uint32_t maxsz)
 {
-    TCGv_i64 tmp = tcg_const_i64(dup_const(vece, c));
+    TCGv_i64 tmp = tcg_constant_i64(dup_const(vece, c));
     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_xors);
-    tcg_temp_free_i64(tmp);
 }
 
 static const GVecGen2s gop_ors = {
@@ -2620,9 +2594,8 @@ void tcg_gen_gvec_ors(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_ori(unsigned vece, uint32_t dofs, uint32_t aofs,
                       int64_t c, uint32_t oprsz, uint32_t maxsz)
 {
-    TCGv_i64 tmp = tcg_const_i64(dup_const(vece, c));
+    TCGv_i64 tmp = tcg_constant_i64(dup_const(vece, c));
     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_ors);
-    tcg_temp_free_i64(tmp);
 }
 
 void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
@@ -3110,9 +3083,9 @@ static void tcg_gen_shlv_mod_vec(unsigned vece, TCGv_vec d,
                                  TCGv_vec a, TCGv_vec b)
 {
     TCGv_vec t = tcg_temp_new_vec_matching(d);
+    TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);
 
-    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
-    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_and_vec(vece, t, b, m);
     tcg_gen_shlv_vec(vece, d, a, t);
     tcg_temp_free_vec(t);
 }
@@ -3173,9 +3146,9 @@ static void tcg_gen_shrv_mod_vec(unsigned vece, TCGv_vec d,
                                  TCGv_vec a, TCGv_vec b)
 {
     TCGv_vec t = tcg_temp_new_vec_matching(d);
+    TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);
 
-    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
-    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_and_vec(vece, t, b, m);
     tcg_gen_shrv_vec(vece, d, a, t);
     tcg_temp_free_vec(t);
 }
@@ -3236,9 +3209,9 @@ static void tcg_gen_sarv_mod_vec(unsigned vece, TCGv_vec d,
                                  TCGv_vec a, TCGv_vec b)
 {
     TCGv_vec t = tcg_temp_new_vec_matching(d);
+    TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);
 
-    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
-    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_and_vec(vece, t, b, m);
     tcg_gen_sarv_vec(vece, d, a, t);
     tcg_temp_free_vec(t);
 }
@@ -3299,9 +3272,9 @@ static void tcg_gen_rotlv_mod_vec(unsigned vece, TCGv_vec d,
                                   TCGv_vec a, TCGv_vec b)
 {
     TCGv_vec t = tcg_temp_new_vec_matching(d);
+    TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);
 
-    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
-    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_and_vec(vece, t, b, m);
     tcg_gen_rotlv_vec(vece, d, a, t);
     tcg_temp_free_vec(t);
 }
@@ -3358,9 +3331,9 @@ static void tcg_gen_rotrv_mod_vec(unsigned vece, TCGv_vec d,
                                   TCGv_vec a, TCGv_vec b)
 {
     TCGv_vec t = tcg_temp_new_vec_matching(d);
+    TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);
 
-    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
-    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_and_vec(vece, t, b, m);
     tcg_gen_rotrv_vec(vece, d, a, t);
     tcg_temp_free_vec(t);
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ad1348d811..1cadfe070c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1474,6 +1474,14 @@ TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val)
     return temp_tcgv_vec(tcg_constant_internal(type, val));
 }
 
+TCGv_vec tcg_constant_vec_matching(TCGv_vec match, unsigned vece, int64_t val)
+{
+    TCGTemp *t = tcgv_vec_temp(match);
+
+    tcg_debug_assert(t->temp_allocated != 0);
+    return tcg_constant_vec(t->base_type, vece, val);
+}
+
 TCGv_i32 tcg_const_i32(int32_t val)
 {
     TCGv_i32 t0;
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 17/24] tcg/tci: Add special tci_movi_{i32,i64} opcodes
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (15 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 16/24] tcg: Use tcg_constant_{i32,i64,vec} with gvec expanders Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 18/24] tcg: Remove movi and dupi opcodes Richard Henderson
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
The normal movi opcodes are going away.  We need something
for TCI to use internally.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-opc.h    | 8 ++++++++
 tcg/tci.c                | 4 ++--
 tcg/tci/tcg-target.c.inc | 4 ++--
 3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 70a76646c4..43711634f6 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -278,6 +278,14 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 #include "tcg-target.opc.h"
 #endif
 
+#ifdef TCG_TARGET_INTERPRETER
+/* These opcodes are only for use between the tci generator and interpreter. */
+DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+#if TCG_TARGET_REG_BITS == 64
+DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
+#endif
+#endif
+
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
 #undef IMPL
diff --git a/tcg/tci.c b/tcg/tci.c
index 2bcc4409be..2311aa7d3a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -593,7 +593,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t1 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg32(regs, t0, t1);
             break;
-        case INDEX_op_movi_i32:
+        case INDEX_op_tci_movi_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_i32(&tb_ptr);
             tci_write_reg32(regs, t0, t1);
@@ -864,7 +864,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t1 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg64(regs, t0, t1);
             break;
-        case INDEX_op_movi_i64:
+        case INDEX_op_tci_movi_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_i64(&tb_ptr);
             tci_write_reg64(regs, t0, t1);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index d5a4d9d37c..e9e5f94c04 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -529,13 +529,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     uint8_t *old_code_ptr = s->code_ptr;
     uint32_t arg32 = arg;
     if (type == TCG_TYPE_I32 || arg == arg32) {
-        tcg_out_op_t(s, INDEX_op_movi_i32);
+        tcg_out_op_t(s, INDEX_op_tci_movi_i32);
         tcg_out_r(s, t0);
         tcg_out32(s, arg32);
     } else {
         tcg_debug_assert(type == TCG_TYPE_I64);
 #if TCG_TARGET_REG_BITS == 64
-        tcg_out_op_t(s, INDEX_op_movi_i64);
+        tcg_out_op_t(s, INDEX_op_tci_movi_i64);
         tcg_out_r(s, t0);
         tcg_out64(s, arg);
 #else
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 18/24] tcg: Remove movi and dupi opcodes
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (16 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 17/24] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 19/24] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aleksandar Markovic, Alex Bennée
These are now completely covered by mov from a
TYPE_CONST temporary.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-opc.h        |  3 ---
 tcg/optimize.c               |  4 ----
 tcg/tcg-op-vec.c             |  1 -
 tcg/tcg.c                    | 18 +-----------------
 tcg/aarch64/tcg-target.c.inc |  3 ---
 tcg/arm/tcg-target.c.inc     |  1 -
 tcg/i386/tcg-target.c.inc    |  3 ---
 tcg/mips/tcg-target.c.inc    |  2 --
 tcg/ppc/tcg-target.c.inc     |  3 ---
 tcg/riscv/tcg-target.c.inc   |  2 --
 tcg/s390/tcg-target.c.inc    |  2 --
 tcg/sparc/tcg-target.c.inc   |  2 --
 tcg/tci/tcg-target.c.inc     |  2 --
 13 files changed, 1 insertion(+), 45 deletions(-)
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 43711634f6..900984c005 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -45,7 +45,6 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
-DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
@@ -111,7 +110,6 @@ DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
 DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
-DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
@@ -221,7 +219,6 @@ DEF(qemu_st8_i32, 0, TLADDR_ARGS + 1, 1,
 #define IMPLVEC  TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec)
 
 DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
-DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
 
 DEF(dup_vec, 1, 1, 0, IMPLVEC)
 DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
diff --git a/tcg/optimize.c b/tcg/optimize.c
index dbb03ef96b..37c902283e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1103,10 +1103,6 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(mov):
             tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
             break;
-        CASE_OP_32_64(movi):
-        case INDEX_op_dupi_vec:
-            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
-            break;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 9fbed1366c..ce0d2f6e0e 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -83,7 +83,6 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
         case INDEX_op_xor_vec:
         case INDEX_op_mov_vec:
         case INDEX_op_dup_vec:
-        case INDEX_op_dupi_vec:
         case INDEX_op_dup2_vec:
         case INDEX_op_ld_vec:
         case INDEX_op_st_vec:
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1cadfe070c..5b0e42be91 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1564,7 +1564,6 @@ bool tcg_op_supported(TCGOpcode op)
         return TCG_TARGET_HAS_goto_ptr;
 
     case INDEX_op_mov_i32:
-    case INDEX_op_movi_i32:
     case INDEX_op_setcond_i32:
     case INDEX_op_brcond_i32:
     case INDEX_op_ld8u_i32:
@@ -1658,7 +1657,6 @@ bool tcg_op_supported(TCGOpcode op)
         return TCG_TARGET_REG_BITS == 32;
 
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i64:
     case INDEX_op_setcond_i64:
     case INDEX_op_brcond_i64:
     case INDEX_op_ld8u_i64:
@@ -1764,7 +1762,6 @@ bool tcg_op_supported(TCGOpcode op)
 
     case INDEX_op_mov_vec:
     case INDEX_op_dup_vec:
-    case INDEX_op_dupi_vec:
     case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
@@ -3670,7 +3667,7 @@ static void tcg_reg_alloc_cbranch(TCGContext *s, TCGRegSet allocated_regs)
 }
 
 /*
- * Specialized code generation for INDEX_op_movi_*.
+ * Specialized code generation for INDEX_op_mov_* with a constant.
  */
 static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   tcg_target_ulong val, TCGLifeData arg_life,
@@ -3693,14 +3690,6 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
     }
 }
 
-static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
-{
-    TCGTemp *ots = arg_temp(op->args[0]);
-    tcg_target_ulong val = op->args[1];
-
-    tcg_reg_alloc_do_movi(s, ots, val, op->life, op->output_pref[0]);
-}
-
 /*
  * Specialized code generation for INDEX_op_mov_*.
  */
@@ -4481,11 +4470,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         case INDEX_op_mov_vec:
             tcg_reg_alloc_mov(s, op);
             break;
-        case INDEX_op_movi_i32:
-        case INDEX_op_movi_i64:
-        case INDEX_op_dupi_vec:
-            tcg_reg_alloc_movi(s, op);
-            break;
         case INDEX_op_dup_vec:
             tcg_reg_alloc_dup(s, op);
             break;
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index be6d3ea2a8..e370b7e61c 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2257,8 +2257,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
@@ -2466,7 +2464,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 0fd1126454..c2b26b3c45 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -2068,7 +2068,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 9f81e11773..1706b7c776 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -2666,8 +2666,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -2953,7 +2951,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index add157f6c3..7293169ab2 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2141,8 +2141,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index d00ec20203..1fbb1b6db0 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2977,8 +2977,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:      /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -3326,7 +3324,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         return;
 
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c60b91ba58..71c0badc02 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -1563,8 +1563,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index d7ef079055..8517e55232 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -2295,8 +2295,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 922ae96481..28b5b6559a 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -1586,8 +1586,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index e9e5f94c04..15981265db 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -814,8 +814,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 19/24] tcg: Add tcg_reg_alloc_dup2
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (17 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 18/24] tcg: Remove movi and dupi opcodes Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 20/24] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
There are several ways we can expand a vector dup of a 64-bit
element on a 32-bit host.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5b0e42be91..8f8badb61c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4084,6 +4084,98 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     }
 }
 
+static bool tcg_reg_alloc_dup2(TCGContext *s, const TCGOp *op)
+{
+    const TCGLifeData arg_life = op->life;
+    TCGTemp *ots, *itsl, *itsh;
+    TCGType vtype = TCGOP_VECL(op) + TCG_TYPE_V64;
+
+    /* This opcode is only valid for 32-bit hosts, for 64-bit elements. */
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+    tcg_debug_assert(TCGOP_VECE(op) == MO_64);
+
+    ots = arg_temp(op->args[0]);
+    itsl = arg_temp(op->args[1]);
+    itsh = arg_temp(op->args[2]);
+
+    /* ENV should not be modified.  */
+    tcg_debug_assert(!temp_readonly(ots));
+
+    /* Allocate the output register now.  */
+    if (ots->val_type != TEMP_VAL_REG) {
+        TCGRegSet allocated_regs = s->reserved_regs;
+        TCGRegSet dup_out_regs =
+            tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs;
+
+        /* Make sure to not spill the input registers. */
+        if (!IS_DEAD_ARG(1) && itsl->val_type == TEMP_VAL_REG) {
+            tcg_regset_set_reg(allocated_regs, itsl->reg);
+        }
+        if (!IS_DEAD_ARG(2) && itsh->val_type == TEMP_VAL_REG) {
+            tcg_regset_set_reg(allocated_regs, itsh->reg);
+        }
+
+        ots->reg = tcg_reg_alloc(s, dup_out_regs, allocated_regs,
+                                 op->output_pref[0], ots->indirect_base);
+        ots->val_type = TEMP_VAL_REG;
+        ots->mem_coherent = 0;
+        s->reg_to_temp[ots->reg] = ots;
+    }
+
+    /* Promote dup2 of immediates to dupi_vec. */
+    if (itsl->val_type == TEMP_VAL_CONST && itsh->val_type == TEMP_VAL_CONST) {
+        uint64_t val = deposit64(itsl->val, 32, 32, itsh->val);
+        MemOp vece = MO_64;
+
+        if (val == dup_const(MO_8, val)) {
+            vece = MO_8;
+        } else if (val == dup_const(MO_16, val)) {
+            vece = MO_16;
+        } else if (val == dup_const(MO_32, val)) {
+            vece = MO_32;
+        }
+
+        tcg_out_dupi_vec(s, vtype, vece, ots->reg, val);
+        goto done;
+    }
+
+    /* If the two inputs form one 64-bit value, try dupm_vec. */
+    if (itsl + 1 == itsh && itsl->base_type == TCG_TYPE_I64) {
+        if (!itsl->mem_coherent) {
+            temp_sync(s, itsl, s->reserved_regs, 0, 0);
+        }
+        if (!itsh->mem_coherent) {
+            temp_sync(s, itsh, s->reserved_regs, 0, 0);
+        }
+#ifdef HOST_WORDS_BIGENDIAN
+        TCGTemp *its = itsh;
+#else
+        TCGTemp *its = itsl;
+#endif
+        if (tcg_out_dupm_vec(s, vtype, MO_64, ots->reg,
+                             its->mem_base->reg, its->mem_offset)) {
+            goto done;
+        }
+    }
+
+    /* Fall back to generic expansion. */
+    return false;
+
+ done:
+    if (IS_DEAD_ARG(1)) {
+        temp_dead(s, itsl);
+    }
+    if (IS_DEAD_ARG(2)) {
+        temp_dead(s, itsh);
+    }
+    if (NEED_SYNC_ARG(0)) {
+        temp_sync(s, ots, s->reserved_regs, 0, IS_DEAD_ARG(0));
+    } else if (IS_DEAD_ARG(0)) {
+        temp_dead(s, ots);
+    }
+    return true;
+}
+
 #ifdef TCG_TARGET_STACK_GROWSUP
 #define STACK_DIR(x) (-(x))
 #else
@@ -4501,6 +4593,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         case INDEX_op_call:
             tcg_reg_alloc_call(s, op);
             break;
+        case INDEX_op_dup2_vec:
+            if (tcg_reg_alloc_dup2(s, op)) {
+                break;
+            }
+            /* fall through */
         default:
             /* Sanity check that we've not introduced any unhandled opcodes. */
             tcg_debug_assert(tcg_op_supported(opc));
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 20/24] tcg/i386: Use tcg_constant_vec with tcg vec expanders
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (18 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 19/24] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 21/24] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 1706b7c776..050f3cb0b1 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3486,7 +3486,7 @@ static void expand_vec_rotv(TCGType type, unsigned vece, TCGv_vec v0,
 static void expand_vec_mul(TCGType type, unsigned vece,
                            TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
 {
-    TCGv_vec t1, t2, t3, t4;
+    TCGv_vec t1, t2, t3, t4, zero;
 
     tcg_debug_assert(vece == MO_8);
 
@@ -3504,11 +3504,11 @@ static void expand_vec_mul(TCGType type, unsigned vece,
     case TCG_TYPE_V64:
         t1 = tcg_temp_new_vec(TCG_TYPE_V128);
         t2 = tcg_temp_new_vec(TCG_TYPE_V128);
-        tcg_gen_dup16i_vec(t2, 0);
+        zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
         vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
-                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t2));
+                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
         vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
-                  tcgv_vec_arg(t2), tcgv_vec_arg(t2), tcgv_vec_arg(v2));
+                  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
         tcg_gen_mul_vec(MO_16, t1, t1, t2);
         tcg_gen_shri_vec(MO_16, t1, t1, 8);
         vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8,
@@ -3523,15 +3523,15 @@ static void expand_vec_mul(TCGType type, unsigned vece,
         t2 = tcg_temp_new_vec(type);
         t3 = tcg_temp_new_vec(type);
         t4 = tcg_temp_new_vec(type);
-        tcg_gen_dup16i_vec(t4, 0);
+        zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
         vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
+                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
         vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-                  tcgv_vec_arg(t2), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
+                  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
         vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-                  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
+                  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
         vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-                  tcgv_vec_arg(t4), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
+                  tcgv_vec_arg(t4), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
         tcg_gen_mul_vec(MO_16, t1, t1, t2);
         tcg_gen_mul_vec(MO_16, t3, t3, t4);
         tcg_gen_shri_vec(MO_16, t1, t1, 8);
@@ -3559,7 +3559,7 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
         NEED_UMIN = 8,
         NEED_UMAX = 16,
     };
-    TCGv_vec t1, t2;
+    TCGv_vec t1, t2, t3;
     uint8_t fixup;
 
     switch (cond) {
@@ -3630,9 +3630,9 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
     } else if (fixup & NEED_BIAS) {
         t1 = tcg_temp_new_vec(type);
         t2 = tcg_temp_new_vec(type);
-        tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1));
-        tcg_gen_sub_vec(vece, t1, v1, t2);
-        tcg_gen_sub_vec(vece, t2, v2, t2);
+        t3 = tcg_constant_vec(type, vece, 1ull << ((8 << vece) - 1));
+        tcg_gen_sub_vec(vece, t1, v1, t3);
+        tcg_gen_sub_vec(vece, t2, v2, t3);
         v1 = t1;
         v2 = t2;
         cond = tcg_signed_cond(cond);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 21/24] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (19 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 20/24] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 22/24] tcg/ppc: Use tcg_constant_vec with tcg vec expanders Richard Henderson
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée
These interfaces have been replaced by tcg_gen_dupi_vec
and tcg_constant_vec.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op.h |  4 ----
 tcg/tcg-op-vec.c     | 20 --------------------
 2 files changed, 24 deletions(-)
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index ed8de045e2..2cd1faf9c4 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -959,10 +959,6 @@ void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
 void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
-void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup64i_vec(TCGv_vec, uint64_t);
 void tcg_gen_dupi_vec(unsigned vece, TCGv_vec, uint64_t);
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index ce0d2f6e0e..d19aa7373e 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -241,26 +241,6 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m)
     return tcg_const_ones_vec(t->base_type);
 }
 
-void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a)
-{
-    tcg_gen_dupi_vec(MO_64, r, a);
-}
-
-void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a)
-{
-    tcg_gen_dupi_vec(MO_32, r, a);
-}
-
-void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a)
-{
-    tcg_gen_dupi_vec(MO_16, r, a);
-}
-
-void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a)
-{
-    tcg_gen_dupi_vec(MO_8, r, a);
-}
-
 void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a)
 {
     TCGTemp *rt = tcgv_vec_temp(r);
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 22/24] tcg/ppc: Use tcg_constant_vec with tcg vec expanders
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (20 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 21/24] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 23/24] tcg/aarch64: " Richard Henderson
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Improve expand_vec_shi to use sign-extraction for MO_32.
This allows a single VSPLTISB instruction to load all of
the valid shift constants.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 44 ++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 17 deletions(-)
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 1fbb1b6db0..cf64892295 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -3336,13 +3336,22 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static void expand_vec_shi(TCGType type, unsigned vece, TCGv_vec v0,
                            TCGv_vec v1, TCGArg imm, TCGOpcode opci)
 {
-    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t1;
 
-    /* Splat w/bytes for xxspltib.  */
-    tcg_gen_dupi_vec(MO_8, t1, imm & ((8 << vece) - 1));
+    if (vece == MO_32) {
+        /*
+         * Only 5 bits are significant, and VSPLTISB can represent -16..15.
+         * So using negative numbers gets us the 4th bit easily.
+         */
+        imm = sextract32(imm, 0, 5);
+    } else {
+        imm &= (8 << vece) - 1;
+    }
+
+    /* Splat w/bytes for xxspltib when 2.07 allows MO_64. */
+    t1 = tcg_constant_vec(type, MO_8, imm);
     vec_gen_3(opci, type, vece, tcgv_vec_arg(v0),
               tcgv_vec_arg(v1), tcgv_vec_arg(t1));
-    tcg_temp_free_vec(t1);
 }
 
 static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
@@ -3400,7 +3409,7 @@ static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
 {
     TCGv_vec t1 = tcg_temp_new_vec(type);
     TCGv_vec t2 = tcg_temp_new_vec(type);
-    TCGv_vec t3, t4;
+    TCGv_vec c0, c16;
 
     switch (vece) {
     case MO_8:
@@ -3419,21 +3428,22 @@ static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
 
     case MO_32:
         tcg_debug_assert(!have_isa_2_07);
-        t3 = tcg_temp_new_vec(type);
-        t4 = tcg_temp_new_vec(type);
-        tcg_gen_dupi_vec(MO_8, t4, -16);
+        /*
+         * Only 5 bits are significant, and VSPLTISB can represent -16..15.
+         * So using -16 is a quick way to represent 16.
+         */
+        c16 = tcg_constant_vec(type, MO_8, -16);
+        c0 = tcg_constant_vec(type, MO_8, 0);
+
         vec_gen_3(INDEX_op_rotlv_vec, type, MO_32, tcgv_vec_arg(t1),
-                  tcgv_vec_arg(v2), tcgv_vec_arg(t4));
+                  tcgv_vec_arg(v2), tcgv_vec_arg(c16));
         vec_gen_3(INDEX_op_ppc_mulou_vec, type, MO_16, tcgv_vec_arg(t2),
                   tcgv_vec_arg(v1), tcgv_vec_arg(v2));
-        tcg_gen_dupi_vec(MO_8, t3, 0);
-        vec_gen_4(INDEX_op_ppc_msum_vec, type, MO_16, tcgv_vec_arg(t3),
-                  tcgv_vec_arg(v1), tcgv_vec_arg(t1), tcgv_vec_arg(t3));
-        vec_gen_3(INDEX_op_shlv_vec, type, MO_32, tcgv_vec_arg(t3),
-                  tcgv_vec_arg(t3), tcgv_vec_arg(t4));
-        tcg_gen_add_vec(MO_32, v0, t2, t3);
-        tcg_temp_free_vec(t3);
-        tcg_temp_free_vec(t4);
+        vec_gen_4(INDEX_op_ppc_msum_vec, type, MO_16, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(t1), tcgv_vec_arg(c0));
+        vec_gen_3(INDEX_op_shlv_vec, type, MO_32, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(t1), tcgv_vec_arg(c16));
+        tcg_gen_add_vec(MO_32, v0, t1, t2);
         break;
 
     default:
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 23/24] tcg/aarch64: Use tcg_constant_vec with tcg vec expanders
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (21 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 22/24] tcg/ppc: Use tcg_constant_vec with tcg vec expanders Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14  2:16 ` [PULL 24/24] decodetree: Open files with encoding='utf-8' Richard Henderson
  2021-01-14 13:13 ` [PULL 00/24] tcg patch queue Peter Maydell
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell
Improve rotrv_vec to reduce "t1 = -v2, t2 = t1 + c" to
"t1 = -v2, t2 = c - v2".  This avoids a serial dependency
between t1 and t2.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index e370b7e61c..23954ec7cf 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2516,7 +2516,7 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
     va_list va;
-    TCGv_vec v0, v1, v2, t1, t2;
+    TCGv_vec v0, v1, v2, t1, t2, c1;
     TCGArg a2;
 
     va_start(va, a0);
@@ -2548,8 +2548,8 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 
     case INDEX_op_rotlv_vec:
         t1 = tcg_temp_new_vec(type);
-        tcg_gen_dupi_vec(vece, t1, 8 << vece);
-        tcg_gen_sub_vec(vece, t1, v2, t1);
+        c1 = tcg_constant_vec(type, vece, 8 << vece);
+        tcg_gen_sub_vec(vece, t1, v2, c1);
         /* Right shifts are negative left shifts for AArch64.  */
         vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
                   tcgv_vec_arg(v1), tcgv_vec_arg(t1));
@@ -2562,9 +2562,9 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
     case INDEX_op_rotrv_vec:
         t1 = tcg_temp_new_vec(type);
         t2 = tcg_temp_new_vec(type);
+        c1 = tcg_constant_vec(type, vece, 8 << vece);
         tcg_gen_neg_vec(vece, t1, v2);
-        tcg_gen_dupi_vec(vece, t2, 8 << vece);
-        tcg_gen_add_vec(vece, t2, t1, t2);
+        tcg_gen_sub_vec(vece, t2, c1, v2);
         /* Right shifts are negative left shifts for AArch64.  */
         vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
                   tcgv_vec_arg(v1), tcgv_vec_arg(t1));
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* [PULL 24/24] decodetree: Open files with encoding='utf-8'
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (22 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 23/24] tcg/aarch64: " Richard Henderson
@ 2021-01-14  2:16 ` Richard Henderson
  2021-01-14 13:13 ` [PULL 00/24] tcg patch queue Peter Maydell
  24 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-14  2:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Yonggang Luo, Philippe Mathieu-Daudé,
	Eduardo Habkost
From: Philippe Mathieu-Daudé <f4bug@amsat.org>
When decodetree.py was added in commit 568ae7efae7, QEMU was
using Python 2 which happily reads UTF-8 files in text mode.
Python 3 requires either UTF-8 locale or an explicit encoding
passed to open(). Now that Python 3 is required, explicit
UTF-8 encoding for decodetree source files.
To avoid further problems with the user locale, also explicit
UTF-8 encoding for the generated C files.
Explicit both input/output are plain text by using the 't' mode.
This fixes:
  $ /usr/bin/python3 scripts/decodetree.py test.decode
  Traceback (most recent call last):
    File "scripts/decodetree.py", line 1397, in <module>
      main()
    File "scripts/decodetree.py", line 1308, in main
      parse_file(f, toppat)
    File "scripts/decodetree.py", line 994, in parse_file
      for line in f:
    File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
  ordinal not in range(128)
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210110000240.761122-1-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 scripts/decodetree.py | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d..4637b633e7 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -20,6 +20,7 @@
 # See the syntax and semantics in docs/devel/decodetree.rst.
 #
 
+import io
 import os
 import re
 import sys
@@ -1304,7 +1305,7 @@ def main():
 
     for filename in args:
         input_file = filename
-        f = open(filename, 'r')
+        f = open(filename, 'rt', encoding='utf-8')
         parse_file(f, toppat)
         f.close()
 
@@ -1324,9 +1325,11 @@ def main():
         prop_size(stree)
 
     if output_file:
-        output_fd = open(output_file, 'w')
+        output_fd = open(output_file, 'wt', encoding='utf-8')
     else:
-        output_fd = sys.stdout
+        output_fd = io.TextIOWrapper(sys.stdout.buffer,
+                                     encoding=sys.stdout.encoding,
+                                     errors="ignore")
 
     output_autogen()
     for n in sorted(arguments.keys()):
-- 
2.25.1
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* Re: [PULL 00/24] tcg patch queue
  2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
                   ` (23 preceding siblings ...)
  2021-01-14  2:16 ` [PULL 24/24] decodetree: Open files with encoding='utf-8' Richard Henderson
@ 2021-01-14 13:13 ` Peter Maydell
  24 siblings, 0 replies; 50+ messages in thread
From: Peter Maydell @ 2021-01-14 13:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers
On Thu, 14 Jan 2021 at 02:16, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit 45240eed4f064576d589ea60ebadf3c11d7ab891:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging (2021-01-13 14:19:24 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210113
>
> for you to fetch changes up to 4cacecaaa2bbf8af0967bd3eee43297fada475a9:
>
>   decodetree: Open files with encoding='utf-8' (2021-01-13 08:39:08 -1000)
>
> ----------------------------------------------------------------
> Improvements to tcg constant handling.
> Force utf8 for decodetree.
>
> ----------------------------------------------------------------
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
for any user-visible changes.
-- PMM
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-01-14  2:16 ` [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
@ 2021-01-15 23:03   ` Alistair Francis
  2021-01-16 17:24     ` Richard Henderson
  2021-02-01 20:45   ` Richard W.M. Jones
  1 sibling, 1 reply; 50+ messages in thread
From: Alistair Francis @ 2021-01-15 23:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel@nongnu.org Developers
On Wed, Jan 13, 2021 at 6:32 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This patch results in a QEMU seg fault when starting userspace on RISC-V 32-bit.
This is the full backtrace:
```
#0  0x0000555555a67c4d in ts_are_copies (ts2=0x7fffa8008008,
ts1=0x7fffa8001e40) at ../tcg/optimize.c:163
#1  tcg_opt_gen_mov (s=s@entry=0x7fffa8000b60,
op=op@entry=0x7fffa81ac778, dst=140736011968064, src=140736011993096)
at ../tcg/optimize.c:191
#2  0x0000555555a67dcb in tcg_opt_gen_movi (s=s@entry=0x7fffa8000b60,
temps_used=temps_used@entry=0x7ffff1cb92c0,
op=op@entry=0x7fffa81ac778, dst=<optimized out>, val=<optimized out>)
at ../tcg/optimize.c:249
#3  0x0000555555a6914d in tcg_optimize (s=s@entry=0x7fffa8000b60) at
../tcg/optimize.c:1242
#4  0x0000555555abb248 in tcg_gen_code (s=0x7fffa8000b60,
tb=tb@entry=0x7fffae84edc0 <code_gen_buffer+42266003>) at
../tcg/tcg.c:4406
#5  0x0000555555a7f4d5 in tb_gen_code (cpu=cpu@entry=0x7ffff7fac930,
pc=160234, cs_base=0, flags=16640, cflags=-16252928) at
../accel/tcg/translate-all.c:1952
#6  0x0000555555ae4fe4 in tb_find (cf_mask=<optimized out>, tb_exit=0,
last_tb=0x0, cpu=0x7ffff7fac930) at ../accel/tcg/cpu-exec.c:454
#7  cpu_exec (cpu=cpu@entry=0x7ffff7fac930) at ../accel/tcg/cpu-exec.c:810
#8  0x0000555555aa6513 in tcg_cpus_exec (cpu=cpu@entry=0x7ffff7fac930)
at ../accel/tcg/tcg-cpus.c:57
#9  0x0000555555a8c7a3 in mttcg_cpu_thread_fn
(arg=arg@entry=0x7ffff7fac930) at ../accel/tcg/tcg-cpus-mttcg.c:69
#10 0x0000555555c94209 in qemu_thread_start (args=0x7ffff1cb96d0) at
../util/qemu-thread-posix.c:521
#11 0x00007ffff673a3e9 in start_thread () at /usr/lib/libpthread.so.0
#12 0x00007ffff6395293 in clone () at /usr/lib/libc.so.6
```
I run QEMU with these arguments:
./build/riscv32-softmmu/qemu-system-riscv32 \
    -machine virt -serial mon:stdio -serial null -nographic \
    -append "root=/dev/vda rw highres=off  console=ttyS0 ip=dhcp earlycon=sbi" \
    -device virtio-net-device,netdev=net0,mac=52:54:00:12:34:02
-netdev user,id=net0 \
    -object rng-random,filename=/dev/urandom,id=rng0 -device
virtio-rng-device,rng=rng0 \
    -smp 4 -d guest_errors -m 256M \
    -kernel ./Image \
    -drive id=disk0,file=./core-image-minimal-qemuriscv32.ext4,if=none,format=raw
\
    -device virtio-blk-device,drive=disk0 \
    -bios default
I am uploading the images to:
https://nextcloud.alistair23.me/index.php/s/MQFyGGNLPZjLZPH
Although apparently it will take a few hours to upload the 2GB rootFS.
Alistair
> ---
>  tcg/optimize.c | 108 ++++++++++++++++++++++---------------------------
>  1 file changed, 49 insertions(+), 59 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 49bf1386c7..bda727d5ed 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -178,37 +178,6 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
>      return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
>  }
>
> -static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, uint64_t val)
> -{
> -    const TCGOpDef *def;
> -    TCGOpcode new_op;
> -    uint64_t mask;
> -    TempOptInfo *di = arg_info(dst);
> -
> -    def = &tcg_op_defs[op->opc];
> -    if (def->flags & TCG_OPF_VECTOR) {
> -        new_op = INDEX_op_dupi_vec;
> -    } else if (def->flags & TCG_OPF_64BIT) {
> -        new_op = INDEX_op_movi_i64;
> -    } else {
> -        new_op = INDEX_op_movi_i32;
> -    }
> -    op->opc = new_op;
> -    /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
> -    op->args[0] = dst;
> -    op->args[1] = val;
> -
> -    reset_temp(dst);
> -    di->is_const = true;
> -    di->val = val;
> -    mask = val;
> -    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
> -        /* High bits of the destination are now garbage.  */
> -        mask |= ~0xffffffffull;
> -    }
> -    di->mask = mask;
> -}
> -
>  static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>  {
>      TCGTemp *dst_ts = arg_temp(dst);
> @@ -259,6 +228,27 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>      }
>  }
>
> +static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
> +                             TCGOp *op, TCGArg dst, uint64_t val)
> +{
> +    const TCGOpDef *def = &tcg_op_defs[op->opc];
> +    TCGType type;
> +    TCGTemp *tv;
> +
> +    if (def->flags & TCG_OPF_VECTOR) {
> +        type = TCGOP_VECL(op) + TCG_TYPE_V64;
> +    } else if (def->flags & TCG_OPF_64BIT) {
> +        type = TCG_TYPE_I64;
> +    } else {
> +        type = TCG_TYPE_I32;
> +    }
> +
> +    /* Convert movi to mov with constant temp. */
> +    tv = tcg_constant_internal(type, val);
> +    init_ts_info(temps_used, tv);
> +    tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
> +}
> +
>  static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
>  {
>      uint64_t l64, h64;
> @@ -622,7 +612,7 @@ void tcg_optimize(TCGContext *s)
>      nb_temps = s->nb_temps;
>      nb_globals = s->nb_globals;
>
> -    bitmap_zero(temps_used.l, nb_temps);
> +    memset(&temps_used, 0, sizeof(temps_used));
>      for (i = 0; i < nb_temps; ++i) {
>          s->temps[i].state_ptr = NULL;
>      }
> @@ -727,7 +717,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(rotr):
>              if (arg_is_const(op->args[1])
>                  && arg_info(op->args[1])->val == 0) {
> -                tcg_opt_gen_movi(s, op, op->args[0], 0);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
>                  continue;
>              }
>              break;
> @@ -1054,7 +1044,7 @@ void tcg_optimize(TCGContext *s)
>
>          if (partmask == 0) {
>              tcg_debug_assert(nb_oargs == 1);
> -            tcg_opt_gen_movi(s, op, op->args[0], 0);
> +            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
>              continue;
>          }
>          if (affected == 0) {
> @@ -1071,7 +1061,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(mulsh):
>              if (arg_is_const(op->args[2])
>                  && arg_info(op->args[2])->val == 0) {
> -                tcg_opt_gen_movi(s, op, op->args[0], 0);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
>                  continue;
>              }
>              break;
> @@ -1098,7 +1088,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64_VEC(sub):
>          CASE_OP_32_64_VEC(xor):
>              if (args_are_copies(op->args[1], op->args[2])) {
> -                tcg_opt_gen_movi(s, op, op->args[0], 0);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
>                  continue;
>              }
>              break;
> @@ -1115,14 +1105,14 @@ void tcg_optimize(TCGContext *s)
>              break;
>          CASE_OP_32_64(movi):
>          case INDEX_op_dupi_vec:
> -            tcg_opt_gen_movi(s, op, op->args[0], op->args[1]);
> +            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
>              break;
>
>          case INDEX_op_dup_vec:
>              if (arg_is_const(op->args[1])) {
>                  tmp = arg_info(op->args[1])->val;
>                  tmp = dup_const(TCGOP_VECE(op), tmp);
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1132,7 +1122,7 @@ void tcg_optimize(TCGContext *s)
>              if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
>                  tmp = arg_info(op->args[1])->val;
>                  if (tmp == arg_info(op->args[2])->val) {
> -                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                      break;
>                  }
>              } else if (args_are_copies(op->args[1], op->args[2])) {
> @@ -1160,7 +1150,7 @@ void tcg_optimize(TCGContext *s)
>          case INDEX_op_extrh_i64_i32:
>              if (arg_is_const(op->args[1])) {
>                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1190,7 +1180,7 @@ void tcg_optimize(TCGContext *s)
>              if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
>                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
>                                            arg_info(op->args[2])->val);
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1201,7 +1191,7 @@ void tcg_optimize(TCGContext *s)
>                  TCGArg v = arg_info(op->args[1])->val;
>                  if (v != 0) {
>                      tmp = do_constant_folding(opc, v, 0);
> -                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  } else {
>                      tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
>                  }
> @@ -1214,7 +1204,7 @@ void tcg_optimize(TCGContext *s)
>                  tmp = deposit64(arg_info(op->args[1])->val,
>                                  op->args[3], op->args[4],
>                                  arg_info(op->args[2])->val);
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1223,7 +1213,7 @@ void tcg_optimize(TCGContext *s)
>              if (arg_is_const(op->args[1])) {
>                  tmp = extract64(arg_info(op->args[1])->val,
>                                  op->args[2], op->args[3]);
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1232,7 +1222,7 @@ void tcg_optimize(TCGContext *s)
>              if (arg_is_const(op->args[1])) {
>                  tmp = sextract64(arg_info(op->args[1])->val,
>                                   op->args[2], op->args[3]);
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1249,7 +1239,7 @@ void tcg_optimize(TCGContext *s)
>                      tmp = (int32_t)(((uint32_t)v1 >> shr) |
>                                      ((uint32_t)v2 << (32 - shr)));
>                  }
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1258,7 +1248,7 @@ void tcg_optimize(TCGContext *s)
>              tmp = do_constant_folding_cond(opc, op->args[1],
>                                             op->args[2], op->args[3]);
>              if (tmp != 2) {
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1268,7 +1258,7 @@ void tcg_optimize(TCGContext *s)
>                                             op->args[1], op->args[2]);
>              if (tmp != 2) {
>                  if (tmp) {
> -                    bitmap_zero(temps_used.l, nb_temps);
> +                    memset(&temps_used, 0, sizeof(temps_used));
>                      op->opc = INDEX_op_br;
>                      op->args[0] = op->args[3];
>                  } else {
> @@ -1314,7 +1304,7 @@ void tcg_optimize(TCGContext *s)
>                  uint64_t a = ((uint64_t)ah << 32) | al;
>                  uint64_t b = ((uint64_t)bh << 32) | bl;
>                  TCGArg rl, rh;
> -                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
> +                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
>
>                  if (opc == INDEX_op_add2_i32) {
>                      a += b;
> @@ -1324,8 +1314,8 @@ void tcg_optimize(TCGContext *s)
>
>                  rl = op->args[0];
>                  rh = op->args[1];
> -                tcg_opt_gen_movi(s, op, rl, (int32_t)a);
> -                tcg_opt_gen_movi(s, op2, rh, (int32_t)(a >> 32));
> +                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)a);
> +                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(a >> 32));
>                  break;
>              }
>              goto do_default;
> @@ -1336,12 +1326,12 @@ void tcg_optimize(TCGContext *s)
>                  uint32_t b = arg_info(op->args[3])->val;
>                  uint64_t r = (uint64_t)a * b;
>                  TCGArg rl, rh;
> -                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
> +                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
>
>                  rl = op->args[0];
>                  rh = op->args[1];
> -                tcg_opt_gen_movi(s, op, rl, (int32_t)r);
> -                tcg_opt_gen_movi(s, op2, rh, (int32_t)(r >> 32));
> +                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)r);
> +                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(r >> 32));
>                  break;
>              }
>              goto do_default;
> @@ -1352,7 +1342,7 @@ void tcg_optimize(TCGContext *s)
>              if (tmp != 2) {
>                  if (tmp) {
>              do_brcond_true:
> -                    bitmap_zero(temps_used.l, nb_temps);
> +                    memset(&temps_used, 0, sizeof(temps_used));
>                      op->opc = INDEX_op_br;
>                      op->args[0] = op->args[5];
>                  } else {
> @@ -1368,7 +1358,7 @@ void tcg_optimize(TCGContext *s)
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_brcond_high:
> -                bitmap_zero(temps_used.l, nb_temps);
> +                memset(&temps_used, 0, sizeof(temps_used));
>                  op->opc = INDEX_op_brcond_i32;
>                  op->args[0] = op->args[1];
>                  op->args[1] = op->args[3];
> @@ -1394,7 +1384,7 @@ void tcg_optimize(TCGContext *s)
>                      goto do_default;
>                  }
>              do_brcond_low:
> -                bitmap_zero(temps_used.l, nb_temps);
> +                memset(&temps_used, 0, sizeof(temps_used));
>                  op->opc = INDEX_op_brcond_i32;
>                  op->args[1] = op->args[2];
>                  op->args[2] = op->args[4];
> @@ -1429,7 +1419,7 @@ void tcg_optimize(TCGContext *s)
>                                              op->args[5]);
>              if (tmp != 2) {
>              do_setcond_const:
> -                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
>              } else if ((op->args[5] == TCG_COND_LT
>                          || op->args[5] == TCG_COND_GE)
>                         && arg_is_const(op->args[3])
> @@ -1514,7 +1504,7 @@ void tcg_optimize(TCGContext *s)
>                 block, otherwise we only trash the output args.  "mask" is
>                 the non-zero bits mask for the first output arg.  */
>              if (def->flags & TCG_OPF_BB_END) {
> -                bitmap_zero(temps_used.l, nb_temps);
> +                memset(&temps_used, 0, sizeof(temps_used));
>              } else {
>          do_reset_output:
>                  for (i = 0; i < nb_oargs; i++) {
> --
> 2.25.1
>
>
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-01-15 23:03   ` Alistair Francis
@ 2021-01-16 17:24     ` Richard Henderson
  2021-01-18 20:17       ` Laurent Vivier
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Henderson @ 2021-01-16 17:24 UTC (permalink / raw)
  To: Alistair Francis; +Cc: Peter Maydell, qemu-devel@nongnu.org Developers
On 1/15/21 1:03 PM, Alistair Francis wrote:
> I run QEMU with these arguments:
> 
> ./build/riscv32-softmmu/qemu-system-riscv32 \
>     -machine virt -serial mon:stdio -serial null -nographic \
>     -append "root=/dev/vda rw highres=off  console=ttyS0 ip=dhcp earlycon=sbi" \
>     -device virtio-net-device,netdev=net0,mac=52:54:00:12:34:02
> -netdev user,id=net0 \
>     -object rng-random,filename=/dev/urandom,id=rng0 -device
> virtio-rng-device,rng=rng0 \
>     -smp 4 -d guest_errors -m 256M \
>     -kernel ./Image \
>     -drive id=disk0,file=./core-image-minimal-qemuriscv32.ext4,if=none,format=raw
> \
>     -device virtio-blk-device,drive=disk0 \
>     -bios default
> 
> I am uploading the images to:
> https://nextcloud.alistair23.me/index.php/s/MQFyGGNLPZjLZPH
I don't replicate the assertion failure, I get to
/sbin/init: error while loading shared libraries: libkmod.so.2: cannot open
shared object file: Error 74
[    0.819845] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00007f00
[    0.820430] CPU: 1 PID: 1 Comm: init Not tainted 5.11.0-rc3 #1
r~
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-01-16 17:24     ` Richard Henderson
@ 2021-01-18 20:17       ` Laurent Vivier
  2021-01-18 21:03         ` Richard Henderson
  0 siblings, 1 reply; 50+ messages in thread
From: Laurent Vivier @ 2021-01-18 20:17 UTC (permalink / raw)
  To: Richard Henderson, Alistair Francis
  Cc: Peter Maydell, qemu-devel@nongnu.org Developers
On 16/01/2021 18:24, Richard Henderson wrote:
> On 1/15/21 1:03 PM, Alistair Francis wrote:
>> I run QEMU with these arguments:
>>
>> ./build/riscv32-softmmu/qemu-system-riscv32 \
>>     -machine virt -serial mon:stdio -serial null -nographic \
>>     -append "root=/dev/vda rw highres=off  console=ttyS0 ip=dhcp earlycon=sbi" \
>>     -device virtio-net-device,netdev=net0,mac=52:54:00:12:34:02
>> -netdev user,id=net0 \
>>     -object rng-random,filename=/dev/urandom,id=rng0 -device
>> virtio-rng-device,rng=rng0 \
>>     -smp 4 -d guest_errors -m 256M \
>>     -kernel ./Image \
>>     -drive id=disk0,file=./core-image-minimal-qemuriscv32.ext4,if=none,format=raw
>> \
>>     -device virtio-blk-device,drive=disk0 \
>>     -bios default
>>
>> I am uploading the images to:
>> https://nextcloud.alistair23.me/index.php/s/MQFyGGNLPZjLZPH
> 
> I don't replicate the assertion failure, I get to
> 
> /sbin/init: error while loading shared libraries: libkmod.so.2: cannot open
> shared object file: Error 74
> [    0.819845] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00007f00
> [    0.820430] CPU: 1 PID: 1 Comm: init Not tainted 5.11.0-rc3 #1
This commit breaks the build of my hello world test program with mips64el/stretch guest
(and I guess some others too).
cat > $CHROOT/tmp/hello.c <<EOF
#include <stdio.h>
int main(void)
{
    printf("Hello World!\n");
    return 0;
}
EOF
unshare --time --ipc --uts --pid --fork --kill-child --mount --mount-proc --root \
        $CHROOT gcc /tmp/hello.c -o /tmp/hello
/tmp/hello.c:1:0: internal compiler error: Segmentation fault
 #include <stdio.h>
executable file is not ELF
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
# gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Any idea?
Thanks,
Laurent
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-01-18 20:17       ` Laurent Vivier
@ 2021-01-18 21:03         ` Richard Henderson
  0 siblings, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-01-18 21:03 UTC (permalink / raw)
  To: Laurent Vivier, Alistair Francis
  Cc: Peter Maydell, qemu-devel@nongnu.org Developers
On 1/18/21 10:17 AM, Laurent Vivier wrote:
> This commit breaks the build of my hello world test program with mips64el/stretch guest
> (and I guess some others too).
> 
> cat > $CHROOT/tmp/hello.c <<EOF
> #include <stdio.h>
> int main(void)
> {
>     printf("Hello World!\n");
>     return 0;
> }
> EOF
> 
> unshare --time --ipc --uts --pid --fork --kill-child --mount --mount-proc --root \
>         $CHROOT gcc /tmp/hello.c -o /tmp/hello
> /tmp/hello.c:1:0: internal compiler error: Segmentation fault
>  #include <stdio.h>
> 
> executable file is not ELF
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
> 
> # gcc --version
> gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
> Copyright (C) 2016 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> Any idea?
Working on it:
https://bugs.launchpad.net/bugs/1912065
There's a temp hack in there that may work for you.  With no change, you'll see
an assert instead of a segv if you --enable-debug-tcg.
r~
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-01-14  2:16 ` [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
  2021-01-15 23:03   ` Alistair Francis
@ 2021-02-01 20:45   ` Richard W.M. Jones
  2021-02-01 20:53     ` Richard W.M. Jones
  2021-02-04  2:22     ` Richard Henderson
  1 sibling, 2 replies; 50+ messages in thread
From: Richard W.M. Jones @ 2021-02-01 20:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: peter.maydell, qemu-devel
This commit breaks running certain s390x binaries, at least
the "mount" command (or a library it uses) breaks.
More details in this BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1922248
Could we revert this change since it seems to have caused other
problems as well?
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-01 20:45   ` Richard W.M. Jones
@ 2021-02-01 20:53     ` Richard W.M. Jones
  2021-02-04  2:22     ` Richard Henderson
  1 sibling, 0 replies; 50+ messages in thread
From: Richard W.M. Jones @ 2021-02-01 20:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: peter.maydell, qemu-devel
On Mon, Feb 01, 2021 at 08:45:47PM +0000, Richard W.M. Jones wrote:
> This commit breaks running certain s390x binaries, at least
> the "mount" command (or a library it uses) breaks.
> 
> More details in this BZ:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
> 
> Could we revert this change since it seems to have caused other
> problems as well?
BTW I think the problem I am seeing is a bit different from the other
ones that were reported.  This commit causes the guest binary to
malfunction.  qemu itself does not emit any warning, nor crash.
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-01 20:45   ` Richard W.M. Jones
  2021-02-01 20:53     ` Richard W.M. Jones
@ 2021-02-04  2:22     ` Richard Henderson
  2021-02-04  6:41       ` David Hildenbrand
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Henderson @ 2021-02-04  2:22 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: qemu-devel, David Hildenbrand
On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
> This commit breaks running certain s390x binaries, at least
> the "mount" command (or a library it uses) breaks.
> 
> More details in this BZ:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
> 
> Could we revert this change since it seems to have caused other
> problems as well?
Well, the other problems have been fixed (which were in fact latent, and could
have been produced by other means).  I would not like to sideline this patch
set indefinitely.
Could you give me some help extracting the relevant binaries?  "Begin with an
s390x host" is a non-starter.
FWIW, with qemu-system-s390x, booting debian, building qemu-s390x, and running
"/bin/mount -t proc proc /mnt" under double-emulation does not show the bug.
I suspect that's because debian targets a relatively old s390x cpu, and that
yours is using the relatively new vector instructions.  But I don't know.
What I do know is that current qemu doesn't seem to boot current fedora:
 $ ../bld/qemu-system-s390x -nographic -m 4G -cpu max -drive
file=Fedora-Server-netinst-s390x-33-1.2.iso,format=raw,if=virtio
 qemu-system-s390x: warning: 'msa5-base' requires 'kimd-sha-512'.
 qemu-system-s390x: warning: 'msa5-base' requires 'klmd-sha-512'.
 LOADPARM=[        ]
 Using virtio-blk.
 ISO boot image size verified
 KASLR disabled: CPU has no PRNG
 Linux version 5.8.15-301.fc33.s390x
(mockbuild@buildvm-s390x-07.s390.fedoraproject.org) 1 SMP Thu Oct 15 15:55:57
UTC 2020Kernel fault: interruption code 0005 ilc:2
 PSW : 0000200180000000 00000000000124c4
       R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
 GPRS: 0000000000000000 0000000b806a2da6 7aa19c5cbb980703 95f62d65812b83ab
       d5e42882af203615 0000000b806a2da0 0000000000000000 0000000000000000
       00000000000230e8 0000000001438500 0000000001720320 0000000000000000
       0000000001442718 0000000000010070 0000000000012482 000000000000bf20
Which makes me thing that fedora 33 is now targeting a cpu that is too new and
not actually supported by tcg.
r~
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  2:22     ` Richard Henderson
@ 2021-02-04  6:41       ` David Hildenbrand
  2021-02-04  7:55         ` David Hildenbrand
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04  6:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: David Hildenbrand, Richard W.M. Jones, qemu-devel
> Am 04.02.2021 um 03:22 schrieb Richard Henderson <richard.henderson@linaro.org>:
> 
> On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
>> This commit breaks running certain s390x binaries, at least
>> the "mount" command (or a library it uses) breaks.
>> 
>> More details in this BZ:
>> 
>> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
>> 
>> Could we revert this change since it seems to have caused other
>> problems as well?
> 
> Well, the other problems have been fixed (which were in fact latent, and could
> have been produced by other means).  I would not like to sideline this patch
> set indefinitely.
> 
> Could you give me some help extracting the relevant binaries?  "Begin with an
> s390x host" is a non-starter.
> 
Hi,
I‘m planning on reproducing it today or tomorrow. Especially, finding a reproducer and trying reproducing on x86-64 host.
> FWIW, with qemu-system-s390x, booting debian, building qemu-s390x, and running
> "/bin/mount -t proc proc /mnt" under double-emulation does not show the bug.
> 
> I suspect that's because debian targets a relatively old s390x cpu, and that
> yours is using the relatively new vector instructions.  But I don't know.
> 
> What I do know is that current qemu doesn't seem to boot current fedora:
> 
> $ ../bld/qemu-system-s390x -nographic -m 4G -cpu max -drive
> file=Fedora-Server-netinst-s390x-33-1.2.iso,format=raw,if=virtio
> qemu-system-s390x: warning: 'msa5-base' requires 'kimd-sha-512'.
> qemu-system-s390x: warning: 'msa5-base' requires 'klmd-sha-512'.
> LOADPARM=[        ]
> Using virtio-blk.
> ISO boot image size verified
> 
> KASLR disabled: CPU has no PRNG
> Linux version 5.8.15-301.fc33.s390x
> (mockbuild@buildvm-s390x-07.s390.fedoraproject.org) 1 SMP Thu Oct 15 15:55:57
> UTC 2020Kernel fault: interruption code 0005 ilc:2
> PSW : 0000200180000000 00000000000124c4
>       R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
> GPRS: 0000000000000000 0000000b806a2da6 7aa19c5cbb980703 95f62d65812b83ab
>       d5e42882af203615 0000000b806a2da0 0000000000000000 0000000000000000
>       00000000000230e8 0000000001438500 0000000001720320 0000000000000000
>       0000000001442718 0000000000010070 0000000000012482 000000000000bf20
> 
> Which makes me thing that fedora 33 is now targeting a cpu that is too new and
> not actually supported by tcg.
> 
Try rawhide instead, that worked when testing the clang build fixes. Alternatively, boot F33 via kernel and initrd.
The Fedora 33 iso is broken and cannot boot under KVM as well (the combined kernel+initrd file is messed up).
Cheers!
> 
> r~
> 
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  6:41       ` David Hildenbrand
@ 2021-02-04  7:55         ` David Hildenbrand
  2021-02-04  8:38           ` David Hildenbrand
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04  7:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Richard W.M. Jones, qemu-devel
On 04.02.21 07:41, David Hildenbrand wrote:
> 
>> Am 04.02.2021 um 03:22 schrieb Richard Henderson <richard.henderson@linaro.org>:
>>
>> On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
>>> This commit breaks running certain s390x binaries, at least
>>> the "mount" command (or a library it uses) breaks.
>>>
>>> More details in this BZ:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
>>>
>>> Could we revert this change since it seems to have caused other
>>> problems as well?
>>
>> Well, the other problems have been fixed (which were in fact latent, and could
>> have been produced by other means).  I would not like to sideline this patch
>> set indefinitely.
>>
>> Could you give me some help extracting the relevant binaries?  "Begin with an
>> s390x host" is a non-starter.
>>
> 
> Hi,
> 
> I‘m planning on reproducing it today or tomorrow. Especially, finding a reproducer and trying reproducing on x86-64 host.
FWIW, on an x86-64 host, I can boot F32, Fedora rawhide, and RHEL8.X 
just fine from qcow2 (so "mount" seems to work in that environment as 
expected). Maybe it's really s390x-host specific? I'll give it a try.
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  7:55         ` David Hildenbrand
@ 2021-02-04  8:38           ` David Hildenbrand
  2021-02-04  9:03             ` Richard W.M. Jones
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04  8:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Richard W.M. Jones, qemu-devel
On 04.02.21 08:55, David Hildenbrand wrote:
> On 04.02.21 07:41, David Hildenbrand wrote:
>>
>>> Am 04.02.2021 um 03:22 schrieb Richard Henderson <richard.henderson@linaro.org>:
>>>
>>> On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
>>>> This commit breaks running certain s390x binaries, at least
>>>> the "mount" command (or a library it uses) breaks.
>>>>
>>>> More details in this BZ:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
>>>>
>>>> Could we revert this change since it seems to have caused other
>>>> problems as well?
>>>
>>> Well, the other problems have been fixed (which were in fact latent, and could
>>> have been produced by other means).  I would not like to sideline this patch
>>> set indefinitely.
>>>
>>> Could you give me some help extracting the relevant binaries?  "Begin with an
>>> s390x host" is a non-starter.
>>>
>>
>> Hi,
>>
>> I‘m planning on reproducing it today or tomorrow. Especially, finding a reproducer and trying reproducing on x86-64 host.
> 
> FWIW, on an x86-64 host, I can boot F32, Fedora rawhide, and RHEL8.X
> just fine from qcow2 (so "mount" seems to work in that environment as
> expected). Maybe it's really s390x-host specific? I'll give it a try.
> 
F33 qcow2 [1] fails booting on an s390x/TCG host.
I tried "-cpu qemu" and "-qemu qemu=vx=off". The same image boots on 
x86-64/TCG host just fine.
With
commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Mon Mar 30 19:52:02 2020 -0700
     tcg/optimize: Adjust TempOptInfo allocation
The image boots just fine on s390x/TCG as well.
[1] 
https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  8:38           ` David Hildenbrand
@ 2021-02-04  9:03             ` Richard W.M. Jones
  2021-02-04  9:12               ` David Hildenbrand
  2021-02-04  9:29               ` Richard W.M. Jones
  0 siblings, 2 replies; 50+ messages in thread
From: Richard W.M. Jones @ 2021-02-04  9:03 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Richard Henderson, qemu-devel
On Thu, Feb 04, 2021 at 09:38:45AM +0100, David Hildenbrand wrote:
> On 04.02.21 08:55, David Hildenbrand wrote:
> >On 04.02.21 07:41, David Hildenbrand wrote:
> >>
> >>>Am 04.02.2021 um 03:22 schrieb Richard Henderson <richard.henderson@linaro.org>:
> >>>
> >>>On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
> >>>>This commit breaks running certain s390x binaries, at least
> >>>>the "mount" command (or a library it uses) breaks.
> >>>>
> >>>>More details in this BZ:
> >>>>
> >>>>https://bugzilla.redhat.com/show_bug.cgi?id=1922248
> >>>>
> >>>>Could we revert this change since it seems to have caused other
> >>>>problems as well?
> >>>
> >>>Well, the other problems have been fixed (which were in fact latent, and could
> >>>have been produced by other means).  I would not like to sideline this patch
> >>>set indefinitely.
> >>>
> >>>Could you give me some help extracting the relevant binaries?  "Begin with an
> >>>s390x host" is a non-starter.
> >>>
> >>
> >>Hi,
> >>
> >>I‘m planning on reproducing it today or tomorrow. Especially, finding a reproducer and trying reproducing on x86-64 host.
> >
> >FWIW, on an x86-64 host, I can boot F32, Fedora rawhide, and RHEL8.X
> >just fine from qcow2 (so "mount" seems to work in that environment as
> >expected). Maybe it's really s390x-host specific? I'll give it a try.
> >
> 
> F33 qcow2 [1] fails booting on an s390x/TCG host.
What did the failure look like?
> I tried "-cpu qemu" and "-qemu qemu=vx=off". The same image boots on
> x86-64/TCG host just fine.
> 
> 
> With
> 
> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date:   Mon Mar 30 19:52:02 2020 -0700
> 
>     tcg/optimize: Adjust TempOptInfo allocation
> 
> The image boots just fine on s390x/TCG as well.
Let me try this in a minute on my original test machine.
Rich.
> 
> [1] https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
> 
> -- 
> Thanks,
> 
> David / dhildenb
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  9:03             ` Richard W.M. Jones
@ 2021-02-04  9:12               ` David Hildenbrand
  2021-02-04  9:29               ` Richard W.M. Jones
  1 sibling, 0 replies; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04  9:12 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Richard Henderson, qemu-devel
On 04.02.21 10:03, Richard W.M. Jones wrote:
> On Thu, Feb 04, 2021 at 09:38:45AM +0100, David Hildenbrand wrote:
>> On 04.02.21 08:55, David Hildenbrand wrote:
>>> On 04.02.21 07:41, David Hildenbrand wrote:
>>>>
>>>>> Am 04.02.2021 um 03:22 schrieb Richard Henderson <richard.henderson@linaro.org>:
>>>>>
>>>>> On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
>>>>>> This commit breaks running certain s390x binaries, at least
>>>>>> the "mount" command (or a library it uses) breaks.
>>>>>>
>>>>>> More details in this BZ:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
>>>>>>
>>>>>> Could we revert this change since it seems to have caused other
>>>>>> problems as well?
>>>>>
>>>>> Well, the other problems have been fixed (which were in fact latent, and could
>>>>> have been produced by other means).  I would not like to sideline this patch
>>>>> set indefinitely.
>>>>>
>>>>> Could you give me some help extracting the relevant binaries?  "Begin with an
>>>>> s390x host" is a non-starter.
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I‘m planning on reproducing it today or tomorrow. Especially, finding a reproducer and trying reproducing on x86-64 host.
>>>
>>> FWIW, on an x86-64 host, I can boot F32, Fedora rawhide, and RHEL8.X
>>> just fine from qcow2 (so "mount" seems to work in that environment as
>>> expected). Maybe it's really s390x-host specific? I'll give it a try.
>>>
>>
>> F33 qcow2 [1] fails booting on an s390x/TCG host.
> 
> What did the failure look like?
It starts booting just fine until
[   10.869011] Core dump to |/bin/false pipe failed
[   10.915968] systemd[1]: Finished Create list of static device nodes for the current kernel.
[   10.946424] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=31/SYS
[   10.966677] systemd[1]: systemd-journald.service: Failed with result 'signal'.
[   11.017545] systemd[1]: Failed to start Journal Service.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
which repeats a couple of times. Then things go nuts
[   32.488899] systemd[1]: Failed to start Rule-based Manager for Device Events and Files.
[FAILED] Failed to start Rule-based…r for Device Events and Files.
See 'systemctl status systemd-udevd.service' for details.
[   32.501449] systemd[1]: systemd-udevd.service: Scheduled restart job, restart counter is at 1.
[   32.502134] systemd[1]: Stopped Rule-based Manager for Device Events and Files.
Looks also related to /dev / udev.
> 
>> I tried "-cpu qemu" and "-qemu qemu=vx=off". The same image boots on
>> x86-64/TCG host just fine.
>>
>>
>> With
>>
>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
>> Author: Richard Henderson <richard.henderson@linaro.org>
>> Date:   Mon Mar 30 19:52:02 2020 -0700
>>
>>      tcg/optimize: Adjust TempOptInfo allocation
>>
>> The image boots just fine on s390x/TCG as well.
> 
> Let me try this in a minute on my original test machine.
That's the commit exactly before the problematic one (didn't want to mess with reverts for now).
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  9:03             ` Richard W.M. Jones
  2021-02-04  9:12               ` David Hildenbrand
@ 2021-02-04  9:29               ` Richard W.M. Jones
  2021-02-04  9:30                 ` Richard W.M. Jones
  2021-02-04  9:37                 ` David Hildenbrand
  1 sibling, 2 replies; 50+ messages in thread
From: Richard W.M. Jones @ 2021-02-04  9:29 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Richard Henderson, qemu-devel
> > commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
> > Author: Richard Henderson <richard.henderson@linaro.org>
> > Date:   Mon Mar 30 19:52:02 2020 -0700
> > 
> >     tcg/optimize: Adjust TempOptInfo allocation
> > 
> > The image boots just fine on s390x/TCG as well.
> 
> Let me try this in a minute on my original test machine.
I got the wrong end of the stick as David pointed out in the other email.
However I did test things again this morning (all on s390 host), and
current head (1ed9228f63ea4b) fails same as before ("mount" command
fails).
Also I downloaded:
  https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
and booted it on 1ed9228f63ea4b using this command:
  $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg -m 2048 -drive file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio -serial stdio
Lots of core dumps inside the guest, same as David saw.
I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
TempOptInfo allocation"), rebuilt qemu, tested the same command and
cloud image, and that booted up much happier with no failures or core
dumps.
Isn't it kind of weird that this would only affect an s390 host?  I
don't understand why the host would make a difference if we're doing
TCG.
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  9:29               ` Richard W.M. Jones
@ 2021-02-04  9:30                 ` Richard W.M. Jones
  2021-02-04  9:37                 ` David Hildenbrand
  1 sibling, 0 replies; 50+ messages in thread
From: Richard W.M. Jones @ 2021-02-04  9:30 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Richard Henderson, qemu-devel
I have this s390 machine for another 99 hours now, so if you want me
to test patches then send them my way.
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  9:29               ` Richard W.M. Jones
  2021-02-04  9:30                 ` Richard W.M. Jones
@ 2021-02-04  9:37                 ` David Hildenbrand
  2021-02-04 16:04                   ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04  9:37 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Richard Henderson, qemu-devel
On 04.02.21 10:29, Richard W.M. Jones wrote:
>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>> Date:   Mon Mar 30 19:52:02 2020 -0700
>>>
>>>      tcg/optimize: Adjust TempOptInfo allocation
>>>
>>> The image boots just fine on s390x/TCG as well.
>>
>> Let me try this in a minute on my original test machine.
> 
> I got the wrong end of the stick as David pointed out in the other email.
> 
> However I did test things again this morning (all on s390 host), and
> current head (1ed9228f63ea4b) fails same as before ("mount" command
> fails).
> 
> Also I downloaded:
> 
>    https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
> 
> and booted it on 1ed9228f63ea4b using this command:
> 
>    $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg -m 2048 -drive file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio -serial stdio
> 
> Lots of core dumps inside the guest, same as David saw.
> 
> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
> TempOptInfo allocation"), rebuilt qemu, tested the same command and
> cloud image, and that booted up much happier with no failures or core
> dumps.
> 
> Isn't it kind of weird that this would only affect an s390 host?  I
> don't understand why the host would make a difference if we're doing
> TCG.
I assume an existing BUG in the s390x TCG backend ... which makes it 
harder to debug :)
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04  9:37                 ` David Hildenbrand
@ 2021-02-04 16:04                   ` Philippe Mathieu-Daudé
  2021-02-04 16:08                     ` Philippe Mathieu-Daudé
                                       ` (4 more replies)
  0 siblings, 5 replies; 50+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-04 16:04 UTC (permalink / raw)
  To: David Hildenbrand, Richard W.M. Jones, Richard Henderson; +Cc: qemu-devel
On 2/4/21 10:37 AM, David Hildenbrand wrote:
> On 04.02.21 10:29, Richard W.M. Jones wrote:
>>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
>>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>>> Date:   Mon Mar 30 19:52:02 2020 -0700
>>>>
>>>>      tcg/optimize: Adjust TempOptInfo allocation
>>>>
>>>> The image boots just fine on s390x/TCG as well.
>>>
>>> Let me try this in a minute on my original test machine.
>>
>> I got the wrong end of the stick as David pointed out in the other email.
>>
>> However I did test things again this morning (all on s390 host), and
>> current head (1ed9228f63ea4b) fails same as before ("mount" command
>> fails).
>>
>> Also I downloaded:
>>
>>   
>> https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
>>
>>
>> and booted it on 1ed9228f63ea4b using this command:
>>
>>    $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg
>> -m 2048 -drive
>> file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio
>> -serial stdio
>>
>> Lots of core dumps inside the guest, same as David saw.
>>
>> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
>> TempOptInfo allocation"), rebuilt qemu, tested the same command and
>> cloud image, and that booted up much happier with no failures or core
>> dumps.
>>
>> Isn't it kind of weird that this would only affect an s390 host?  I
>> don't understand why the host would make a difference if we're doing
>> TCG.
> 
> I assume an existing BUG in the s390x TCG backend ... which makes it
> harder to debug :)
This seems to fix it:
-- >8 --
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index 8517e552323..32d03dbfbaf 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
TCGCond c, TCGReg r1,
                 op = (is_unsigned ? RIL_CLFI : RIL_CFI);
                 tcg_out_insn_RIL(s, op, r1, c2);
                 goto exit;
-            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
-                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
-                tcg_out_insn_RIL(s, op, r1, c2);
-                goto exit;
+            } else if (is_unsigned) {
+                if (c2 == (uint32_t)c2) {
+                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
+                    goto exit;
+                }
+            } else {
+                if (c2 == (int32_t)c2) {
+                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
+                    goto exit;
+                }
             }
         }
---
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:04                   ` Philippe Mathieu-Daudé
@ 2021-02-04 16:08                     ` Philippe Mathieu-Daudé
  2021-02-04 16:12                     ` David Hildenbrand
                                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-04 16:08 UTC (permalink / raw)
  To: David Hildenbrand, Richard W.M. Jones, Richard Henderson
  Cc: qemu-devel@nongnu.org Developers
On Thu, Feb 4, 2021 at 5:04 PM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> On 2/4/21 10:37 AM, David Hildenbrand wrote:
> > On 04.02.21 10:29, Richard W.M. Jones wrote:
> >>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
> >>>> Author: Richard Henderson <richard.henderson@linaro.org>
> >>>> Date:   Mon Mar 30 19:52:02 2020 -0700
> >>>>
> >>>>      tcg/optimize: Adjust TempOptInfo allocation
> >>>>
> >>>> The image boots just fine on s390x/TCG as well.
> >>>
> >>> Let me try this in a minute on my original test machine.
> >>
> >> I got the wrong end of the stick as David pointed out in the other email.
> >>
> >> However I did test things again this morning (all on s390 host), and
> >> current head (1ed9228f63ea4b) fails same as before ("mount" command
> >> fails).
> >>
> >> Also I downloaded:
> >>
> >>
> >> https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
> >>
> >>
> >> and booted it on 1ed9228f63ea4b using this command:
> >>
> >>    $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg
> >> -m 2048 -drive
> >> file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio
> >> -serial stdio
> >>
> >> Lots of core dumps inside the guest, same as David saw.
> >>
> >> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
> >> TempOptInfo allocation"), rebuilt qemu, tested the same command and
> >> cloud image, and that booted up much happier with no failures or core
> >> dumps.
> >>
> >> Isn't it kind of weird that this would only affect an s390 host?  I
> >> don't understand why the host would make a difference if we're doing
> >> TCG.
> >
> > I assume an existing BUG in the s390x TCG backend ... which makes it
> > harder to debug :)
>
> This seems to fix it:
>
> -- >8 --
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e552323..32d03dbfbaf 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
> TCGCond c, TCGReg r1,
>                  op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                  tcg_out_insn_RIL(s, op, r1, c2);
>                  goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> -                tcg_out_insn_RIL(s, op, r1, c2);
> -                goto exit;
> +            } else if (is_unsigned) {
> +                if (c2 == (uint32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
This path was working,
> +                    goto exit;
> +                }
> +            } else {
> +                if (c2 == (int32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
but this one not ¯\_(ツ)_/¯
> +                    goto exit;
> +                }
>              }
>          }
> ---
>
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:04                   ` Philippe Mathieu-Daudé
  2021-02-04 16:08                     ` Philippe Mathieu-Daudé
@ 2021-02-04 16:12                     ` David Hildenbrand
  2021-02-04 16:29                     ` David Hildenbrand
                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04 16:12 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Richard W.M. Jones,
	Richard Henderson
  Cc: qemu-devel
On 04.02.21 17:04, Philippe Mathieu-Daudé wrote:
> On 2/4/21 10:37 AM, David Hildenbrand wrote:
>> On 04.02.21 10:29, Richard W.M. Jones wrote:
>>>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
>>>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>>>> Date:   Mon Mar 30 19:52:02 2020 -0700
>>>>>
>>>>>       tcg/optimize: Adjust TempOptInfo allocation
>>>>>
>>>>> The image boots just fine on s390x/TCG as well.
>>>>
>>>> Let me try this in a minute on my original test machine.
>>>
>>> I got the wrong end of the stick as David pointed out in the other email.
>>>
>>> However I did test things again this morning (all on s390 host), and
>>> current head (1ed9228f63ea4b) fails same as before ("mount" command
>>> fails).
>>>
>>> Also I downloaded:
>>>
>>>    
>>> https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
>>>
>>>
>>> and booted it on 1ed9228f63ea4b using this command:
>>>
>>>     $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg
>>> -m 2048 -drive
>>> file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio
>>> -serial stdio
>>>
>>> Lots of core dumps inside the guest, same as David saw.
>>>
>>> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
>>> TempOptInfo allocation"), rebuilt qemu, tested the same command and
>>> cloud image, and that booted up much happier with no failures or core
>>> dumps.
>>>
>>> Isn't it kind of weird that this would only affect an s390 host?  I
>>> don't understand why the host would make a difference if we're doing
>>> TCG.
>>
>> I assume an existing BUG in the s390x TCG backend ... which makes it
>> harder to debug :)
> 
> This seems to fix it:
> 
> -- >8 --
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e552323..32d03dbfbaf 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
> TCGCond c, TCGReg r1,
>                   op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                   tcg_out_insn_RIL(s, op, r1, c2);
>                   goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> -                tcg_out_insn_RIL(s, op, r1, c2);
> -                goto exit;
> +            } else if (is_unsigned) {
> +                if (c2 == (uint32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
> +                    goto exit;
> +                }
> +            } else {
> +                if (c2 == (int32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
> +                    goto exit;
> +                }
>               }
>           }
Indeed, seems to work. Thanks!
... will try to review :)
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:04                   ` Philippe Mathieu-Daudé
  2021-02-04 16:08                     ` Philippe Mathieu-Daudé
  2021-02-04 16:12                     ` David Hildenbrand
@ 2021-02-04 16:29                     ` David Hildenbrand
  2021-02-04 16:57                       ` Eric Blake
  2021-02-04 17:33                       ` Richard Henderson
  2021-02-04 16:48                     ` Richard W.M. Jones
  2021-02-04 16:52                     ` Eric Blake
  4 siblings, 2 replies; 50+ messages in thread
From: David Hildenbrand @ 2021-02-04 16:29 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Richard W.M. Jones,
	Richard Henderson
  Cc: qemu-devel
On 04.02.21 17:04, Philippe Mathieu-Daudé wrote:
> On 2/4/21 10:37 AM, David Hildenbrand wrote:
>> On 04.02.21 10:29, Richard W.M. Jones wrote:
>>>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
>>>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>>>> Date:   Mon Mar 30 19:52:02 2020 -0700
>>>>>
>>>>>       tcg/optimize: Adjust TempOptInfo allocation
>>>>>
>>>>> The image boots just fine on s390x/TCG as well.
>>>>
>>>> Let me try this in a minute on my original test machine.
>>>
>>> I got the wrong end of the stick as David pointed out in the other email.
>>>
>>> However I did test things again this morning (all on s390 host), and
>>> current head (1ed9228f63ea4b) fails same as before ("mount" command
>>> fails).
>>>
>>> Also I downloaded:
>>>
>>>    
>>> https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
>>>
>>>
>>> and booted it on 1ed9228f63ea4b using this command:
>>>
>>>     $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg
>>> -m 2048 -drive
>>> file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio
>>> -serial stdio
>>>
>>> Lots of core dumps inside the guest, same as David saw.
>>>
>>> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
>>> TempOptInfo allocation"), rebuilt qemu, tested the same command and
>>> cloud image, and that booted up much happier with no failures or core
>>> dumps.
>>>
>>> Isn't it kind of weird that this would only affect an s390 host?  I
>>> don't understand why the host would make a difference if we're doing
>>> TCG.
>>
>> I assume an existing BUG in the s390x TCG backend ... which makes it
>> harder to debug :)
> 
> This seems to fix it:
> 
> -- >8 --
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e552323..32d03dbfbaf 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
> TCGCond c, TCGReg r1,
>                   op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                   tcg_out_insn_RIL(s, op, r1, c2);
>                   goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> -                tcg_out_insn_RIL(s, op, r1, c2);
> -                goto exit;
> +            } else if (is_unsigned) {
> +                if (c2 == (uint32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
> +                    goto exit;
> +                }
> +            } else {
> +                if (c2 == (int32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
> +                    goto exit;
> +                }
>               }
>           }
> ---
> 
This works as well I think. Broken casting.
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index 8517e55232..f41ca02492 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -1094,7 +1094,7 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
                  op = (is_unsigned ? RIL_CLFI : RIL_CFI);
                  tcg_out_insn_RIL(s, op, r1, c2);
                  goto exit;
-            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
+            } else if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
                  op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
(int32_t)c2 will be converted to (uint32_t) first, then to (TCGArg).
Which is different than casting directly from (int32_t)c2 to (TCGArg).
Nasty.
-- 
Thanks,
David / dhildenb
^ permalink raw reply related	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:04                   ` Philippe Mathieu-Daudé
                                       ` (2 preceding siblings ...)
  2021-02-04 16:29                     ` David Hildenbrand
@ 2021-02-04 16:48                     ` Richard W.M. Jones
  2021-02-04 16:52                     ` Eric Blake
  4 siblings, 0 replies; 50+ messages in thread
From: Richard W.M. Jones @ 2021-02-04 16:48 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Richard Henderson, qemu-devel, David Hildenbrand
On Thu, Feb 04, 2021 at 05:04:22PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/4/21 10:37 AM, David Hildenbrand wrote:
> > On 04.02.21 10:29, Richard W.M. Jones wrote:
> >>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
> >>>> Author: Richard Henderson <richard.henderson@linaro.org>
> >>>> Date:   Mon Mar 30 19:52:02 2020 -0700
> >>>>
> >>>>      tcg/optimize: Adjust TempOptInfo allocation
> >>>>
> >>>> The image boots just fine on s390x/TCG as well.
> >>>
> >>> Let me try this in a minute on my original test machine.
> >>
> >> I got the wrong end of the stick as David pointed out in the other email.
> >>
> >> However I did test things again this morning (all on s390 host), and
> >> current head (1ed9228f63ea4b) fails same as before ("mount" command
> >> fails).
> >>
> >> Also I downloaded:
> >>
> >>   
> >> https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
> >>
> >>
> >> and booted it on 1ed9228f63ea4b using this command:
> >>
> >>    $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg
> >> -m 2048 -drive
> >> file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio
> >> -serial stdio
> >>
> >> Lots of core dumps inside the guest, same as David saw.
> >>
> >> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
> >> TempOptInfo allocation"), rebuilt qemu, tested the same command and
> >> cloud image, and that booted up much happier with no failures or core
> >> dumps.
> >>
> >> Isn't it kind of weird that this would only affect an s390 host?  I
> >> don't understand why the host would make a difference if we're doing
> >> TCG.
> > 
> > I assume an existing BUG in the s390x TCG backend ... which makes it
> > harder to debug :)
> 
> This seems to fix it:
> 
> -- >8 --
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e552323..32d03dbfbaf 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
> TCGCond c, TCGReg r1,
>                  op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                  tcg_out_insn_RIL(s, op, r1, c2);
>                  goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> -                tcg_out_insn_RIL(s, op, r1, c2);
> -                goto exit;
> +            } else if (is_unsigned) {
> +                if (c2 == (uint32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
> +                    goto exit;
> +                }
> +            } else {
> +                if (c2 == (int32_t)c2) {
> +                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
> +                    goto exit;
> +                }
>              }
>          }
Thanks - can confirm this fixes both observed problems here.
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:04                   ` Philippe Mathieu-Daudé
                                       ` (3 preceding siblings ...)
  2021-02-04 16:48                     ` Richard W.M. Jones
@ 2021-02-04 16:52                     ` Eric Blake
  4 siblings, 0 replies; 50+ messages in thread
From: Eric Blake @ 2021-02-04 16:52 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, David Hildenbrand,
	Richard W.M. Jones, Richard Henderson
  Cc: qemu-devel
On 2/4/21 10:04 AM, Philippe Mathieu-Daudé wrote:
>>> Isn't it kind of weird that this would only affect an s390 host?  I
>>> don't understand why the host would make a difference if we're doing
>>> TCG.
>>
>> I assume an existing BUG in the s390x TCG backend ... which makes it
>> harder to debug :)
> 
> This seems to fix it:
> 
> -- >8 --
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e552323..32d03dbfbaf 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
> TCGCond c, TCGReg r1,
>                  op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                  tcg_out_insn_RIL(s, op, r1, c2);
>                  goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
In this code, you are comparing c2 to the type promotion of uint32_t and
int32_t.  That is, the conversion rules are as if you had done:
(common_type) c2 == (common_type) (uint32_t) (is_unsigned ? (uint32_t)c2
: (uint32_t)(int32_t)c2)
> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> -                tcg_out_insn_RIL(s, op, r1, c2);
> -                goto exit;
> +            } else if (is_unsigned) {
> +                if (c2 == (uint32_t)c2) {
whereas this is:
(common_type_unsigned)c2 == (common_type_unsigned)(uint32_t)c2
> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
> +                    goto exit;
> +                }
> +            } else {
> +                if (c2 == (int32_t)c2) {
and this is:
(common_type_signed)c2 == (common_type_signed)(int32_t)c2
and the two may indeed use a different type.
I'm not sure (from the context shown here) what the type of c2 was, to
determine what the common types are, but as an example, on my 64-bit system,
(long)-1 == (int32_t)-1
is true (the 64-bit value (long)-1 is equal to the sign-extended 64-bit
value created from the signed 32-bit value (int32_t)-1), while
(unsigned long)-1 == (uint32_t)(int32_t)-1
is false (the 32-bit unsigned value (uint32_t) -1 promotes without sign
extension to the 64-bit (unsigned long) type, and 0xffffffffffffffff is
not equal to 0x00000000ffffffff).
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:29                     ` David Hildenbrand
@ 2021-02-04 16:57                       ` Eric Blake
  2021-02-04 17:33                       ` Richard Henderson
  1 sibling, 0 replies; 50+ messages in thread
From: Eric Blake @ 2021-02-04 16:57 UTC (permalink / raw)
  To: David Hildenbrand, Philippe Mathieu-Daudé,
	Richard W.M. Jones, Richard Henderson
  Cc: qemu-devel
On 2/4/21 10:29 AM, David Hildenbrand wrote:
>> +++ b/tcg/s390/tcg-target.c.inc
>> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
>> TCGCond c, TCGReg r1,
>>                   op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>>                   tcg_out_insn_RIL(s, op, r1, c2);
>>                   goto exit;
>> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 :
>> (int32_t)c2)) {
>> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
>> -                tcg_out_insn_RIL(s, op, r1, c2);
>> -                goto exit;
>> +            } else if (is_unsigned) {
>> +                if (c2 == (uint32_t)c2) {
>> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
>> +                    goto exit;
>> +                }
>> +            } else {
>> +                if (c2 == (int32_t)c2) {
>> +                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
>> +                    goto exit;
>> +                }
>>               }
>>           }
>> ---
>>
> 
> This works as well I think. Broken casting.
> 
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e55232..f41ca02492 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,7 +1094,7 @@ static int tgen_cmp(TCGContext *s, TCGType type,
> TCGCond c, TCGReg r1,
>                  op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                  tcg_out_insn_RIL(s, op, r1, c2);
>                  goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
> +            } else if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 :
> (TCGArg)(int32_t)c2)) {
Yes, that also solves your problem in fewer lines of code, by doing the
round-trip parsing through the intermediate type and back to the desired
common type all at one expression, rather than separated at two
different points where intermediate type promotion rules interfere with
the work.
>                  op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> 
> (int32_t)c2 will be converted to (uint32_t) first, then to (TCGArg).
> Which is different than casting directly from (int32_t)c2 to (TCGArg).
Yep, the broken version was losing out on the desired sign extensions
because of the argument promotion rules of ?:
> 
> Nasty.
> 
> 
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding
  2021-02-04 16:29                     ` David Hildenbrand
  2021-02-04 16:57                       ` Eric Blake
@ 2021-02-04 17:33                       ` Richard Henderson
  1 sibling, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2021-02-04 17:33 UTC (permalink / raw)
  To: David Hildenbrand, Philippe Mathieu-Daudé,
	Richard W.M. Jones
  Cc: qemu-devel
On 2/4/21 6:29 AM, David Hildenbrand wrote:
> On 04.02.21 17:04, Philippe Mathieu-Daudé wrote:
>> On 2/4/21 10:37 AM, David Hildenbrand wrote:
>>> On 04.02.21 10:29, Richard W.M. Jones wrote:
>>>>>> commit 8f17a975e60b773d7c366a81c0d9bbe304f30859
>>>>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>>>>> Date:   Mon Mar 30 19:52:02 2020 -0700
>>>>>>
>>>>>>       tcg/optimize: Adjust TempOptInfo allocation
>>>>>>
>>>>>> The image boots just fine on s390x/TCG as well.
>>>>>
>>>>> Let me try this in a minute on my original test machine.
>>>>
>>>> I got the wrong end of the stick as David pointed out in the other email.
>>>>
>>>> However I did test things again this morning (all on s390 host), and
>>>> current head (1ed9228f63ea4b) fails same as before ("mount" command
>>>> fails).
>>>>
>>>> Also I downloaded:
>>>>
>>>>   
>>>> https://dl.fedoraproject.org/pub/fedora-secondary/releases/33/Cloud/s390x/images/Fedora-Cloud-Base-33-1.2.s390x.qcow2
>>>>
>>>>
>>>>
>>>> and booted it on 1ed9228f63ea4b using this command:
>>>>
>>>>     $ ~/d/qemu/build/s390x-softmmu/qemu-system-s390x -machine accel=tcg
>>>> -m 2048 -drive
>>>> file=Fedora-Cloud-Base-33-1.2.s390x.qcow2,format=qcow2,if=virtio
>>>> -serial stdio
>>>>
>>>> Lots of core dumps inside the guest, same as David saw.
>>>>
>>>> I then reset qemu back to 8f17a975e60b773d ("tcg/optimize: Adjust
>>>> TempOptInfo allocation"), rebuilt qemu, tested the same command and
>>>> cloud image, and that booted up much happier with no failures or core
>>>> dumps.
>>>>
>>>> Isn't it kind of weird that this would only affect an s390 host?  I
>>>> don't understand why the host would make a difference if we're doing
>>>> TCG.
>>>
>>> I assume an existing BUG in the s390x TCG backend ... which makes it
>>> harder to debug :)
>>
>> This seems to fix it:
>>
>> -- >8 --
>> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
>> index 8517e552323..32d03dbfbaf 100644
>> --- a/tcg/s390/tcg-target.c.inc
>> +++ b/tcg/s390/tcg-target.c.inc
>> @@ -1094,10 +1094,16 @@ static int tgen_cmp(TCGContext *s, TCGType type,
>> TCGCond c, TCGReg r1,
>>                   op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>>                   tcg_out_insn_RIL(s, op, r1, c2);
>>                   goto exit;
>> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
>> -                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
>> -                tcg_out_insn_RIL(s, op, r1, c2);
>> -                goto exit;
>> +            } else if (is_unsigned) {
>> +                if (c2 == (uint32_t)c2) {
>> +                    tcg_out_insn_RIL(s, RIL_CLGFI, r1, c2);
>> +                    goto exit;
>> +                }
>> +            } else {
>> +                if (c2 == (int32_t)c2) {
>> +                    tcg_out_insn_RIL(s, RIL_CGFI, r1, c2);
>> +                    goto exit;
>> +                }
>>               }
>>           }
>> ---
>>
> 
> This works as well I think. Broken casting.
> 
> diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
> index 8517e55232..f41ca02492 100644
> --- a/tcg/s390/tcg-target.c.inc
> +++ b/tcg/s390/tcg-target.c.inc
> @@ -1094,7 +1094,7 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond
> c, TCGReg r1,
>                  op = (is_unsigned ? RIL_CLFI : RIL_CFI);
>                  tcg_out_insn_RIL(s, op, r1, c2);
>                  goto exit;
> -            } else if (c2 == (is_unsigned ? (uint32_t)c2 : (int32_t)c2)) {
> +            } else if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 :
> (TCGArg)(int32_t)c2)) {
>                  op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> 
> (int32_t)c2 will be converted to (uint32_t) first, then to (TCGArg).
> Which is different than casting directly from (int32_t)c2 to (TCGArg).
> 
> Nasty.
Oh excellent.  Thanks for the find and analysis, guys.
I totally agree that this is a bug.
And it makes sense that extra constant propagation from the optimizer from the
patch in question might tickle this problem.
r~
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PULL 14/24] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2021-01-14  2:16 ` [PULL 14/24] tcg: Use tcg_constant_{i32,i64} with tcg int expanders Richard Henderson
@ 2021-02-10 10:38   ` Alex Bennée
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Bennée @ 2021-02-10 10:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: peter.maydell, cota, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Hi Richard,
I've just tracked this commit down to breaking the plugin inline support
(qemu_plugin_register_vcpu_insn_exec_inline). It wasn't picked up by CI
because inline isn't the default (normal callbacks are). I was just
adding inline support into check-tcg to workaround the rep issue for x86
and ran into the following:
  ./qemu-i386 -plugin tests/plugin/libinsn.so,arg=inline -d plugin sha1
Gives:
  Thread 1 "qemu-i386" received signal SIGSEGV, Segmentation fault.
  0x000055555565bbad in test_bit (addr=<optimized out>, nr=<optimized out>) at /home/alex.bennee/lsrc/qemu.git/include/qemu/bitops.h:135
  135         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
  (gdb) bt
  #0  0x000055555565bbad in test_bit (addr=<optimized out>, nr=<optimized out>) at /home/alex.bennee/lsrc/qemu.git/include/qemu/bitops.h:135
  #1  init_ts_info (temps_used=temps_used@entry=0x7fffffffd9b0, ts=0x7ffff4521018 <insn_count>) at ../../tcg/optimize.c:97
  #2  0x000055555565bde0 in init_arg_info (arg=<optimized out>, temps_used=0x7fffffffd9b0) at ../../tcg/optimize.c:126
  #3  tcg_optimize (s=s@entry=0x555555bd48c0 <tcg_init_ctx>) at ../../tcg/optimize.c:641
  #4  0x000055555562e729 in tcg_gen_code (s=0x555555bd48c0 <tcg_init_ctx>, tb=tb@entry=0x7fffe8000040 <code_gen_buffer+19>) at ../../tcg/tcg.c:4401
  #5  0x0000555555691e5b in tb_gen_code (cpu=cpu@entry=0x555555c46c00, pc=pc@entry=134519520, cs_base=cs_base@entry=0, flags=flags@entry=4194483, cflags=-16777216, cflags@entry=0) at ../../accel/tcg/translate-all.c:1952
  #6  0x000055555565ed72 in tb_find (cf_mask=0, tb_exit=0, last_tb=0x0, cpu=0x555555c46c00) at ../../accel/tcg/cpu-exec.c:454
  #7  cpu_exec (cpu=cpu@entry=0x555555c46c00) at ../../accel/tcg/cpu-exec.c:810
  #8  0x00005555555db848 in cpu_loop (env=0x555555c4eef0) at ../../linux-user/i386/cpu_loop.c:207
  #9  0x00005555555d28da in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../linux-user/main.c:859
  (gdb)
> ---
>  include/tcg/tcg-op.h |  13 +--
>  tcg/tcg-op.c         | 227 ++++++++++++++++++++-----------------------
>  2 files changed, 109 insertions(+), 131 deletions(-)
>
> diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
> index 901b19f32a..ed8de045e2 100644
> --- a/include/tcg/tcg-op.h
> +++ b/include/tcg/tcg-op.h
> @@ -271,6 +271,7 @@ void tcg_gen_mb(TCGBar);
>  
>  /* 32 bit ops */
>  
> +void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg);
>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
>  void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2);
>  void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
> @@ -349,11 +350,6 @@ static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
>      }
>  }
>  
> -static inline void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
> -{
> -    tcg_gen_op2i_i32(INDEX_op_movi_i32, ret, arg);
> -}
> -
>  static inline void tcg_gen_ld8u_i32(TCGv_i32 ret, TCGv_ptr arg2,
>                                      tcg_target_long offset)
>  {
> @@ -467,6 +463,7 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 arg)
>  
>  /* 64 bit ops */
>  
> +void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
>  void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
>  void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2);
>  void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
> @@ -550,11 +547,6 @@ static inline void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg)
>      }
>  }
>  
> -static inline void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
> -{
> -    tcg_gen_op2i_i64(INDEX_op_movi_i64, ret, arg);
> -}
> -
>  static inline void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2,
>                                      tcg_target_long offset)
>  {
> @@ -698,7 +690,6 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  
>  void tcg_gen_discard_i64(TCGv_i64 arg);
>  void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg);
> -void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
>  void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
>  void tcg_gen_ld8s_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
>  void tcg_gen_ld16u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 0374b5d56d..70475773f4 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -104,15 +104,18 @@ void tcg_gen_mb(TCGBar mb_type)
>  
>  /* 32 bit ops */
>  
> +void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
> +{
> +    tcg_gen_mov_i32(ret, tcg_constant_i32(arg));
> +}
> +
>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>  {
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_add_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_add_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -122,9 +125,7 @@ void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2)
>          /* Don't recurse with tcg_gen_neg_i32.  */
>          tcg_gen_op2_i32(INDEX_op_neg_i32, ret, arg2);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg1);
> -        tcg_gen_sub_i32(ret, t0, arg2);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_sub_i32(ret, tcg_constant_i32(arg1), arg2);
>      }
>  }
>  
> @@ -134,15 +135,12 @@ void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_sub_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_sub_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
>  void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>  {
> -    TCGv_i32 t0;
>      /* Some cases can be optimized here.  */
>      switch (arg2) {
>      case 0:
> @@ -165,9 +163,8 @@ void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>          }
>          break;
>      }
> -    t0 = tcg_const_i32(arg2);
> -    tcg_gen_and_i32(ret, arg1, t0);
> -    tcg_temp_free_i32(t0);
> +
> +    tcg_gen_and_i32(ret, arg1, tcg_constant_i32(arg2));
>  }
>  
>  void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
> @@ -178,9 +175,7 @@ void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_or_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_or_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -193,9 +188,7 @@ void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>          /* Don't recurse with tcg_gen_not_i32.  */
>          tcg_gen_op2_i32(INDEX_op_not_i32, ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_xor_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_xor_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -205,9 +198,7 @@ void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_shl_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_shl_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -217,9 +208,7 @@ void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_shr_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_shr_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -229,9 +218,7 @@ void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_sar_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_sar_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -250,9 +237,7 @@ void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *l)
>      if (cond == TCG_COND_ALWAYS) {
>          tcg_gen_br(l);
>      } else if (cond != TCG_COND_NEVER) {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_brcond_i32(cond, arg1, t0, l);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_brcond_i32(cond, arg1, tcg_constant_i32(arg2), l);
>      }
>  }
>  
> @@ -271,9 +256,7 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
>  void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
>                            TCGv_i32 arg1, int32_t arg2)
>  {
> -    TCGv_i32 t0 = tcg_const_i32(arg2);
> -    tcg_gen_setcond_i32(cond, ret, arg1, t0);
> -    tcg_temp_free_i32(t0);
> +    tcg_gen_setcond_i32(cond, ret, arg1, tcg_constant_i32(arg2));
>  }
>  
>  void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
> @@ -283,9 +266,7 @@ void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      } else if (is_power_of_2(arg2)) {
>          tcg_gen_shli_i32(ret, arg1, ctz32(arg2));
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_mul_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_mul_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -433,9 +414,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>  
>  void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
>  {
> -    TCGv_i32 t = tcg_const_i32(arg2);
> -    tcg_gen_clz_i32(ret, arg1, t);
> -    tcg_temp_free_i32(t);
> +    tcg_gen_clz_i32(ret, arg1, tcg_constant_i32(arg2));
>  }
>  
>  void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> @@ -468,10 +447,9 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>              tcg_gen_clzi_i32(t, t, 32);
>              tcg_gen_xori_i32(t, t, 31);
>          }
> -        z = tcg_const_i32(0);
> +        z = tcg_constant_i32(0);
>          tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
>          tcg_temp_free_i32(t);
> -        tcg_temp_free_i32(z);
>      } else {
>          gen_helper_ctz_i32(ret, arg1, arg2);
>      }
> @@ -487,9 +465,7 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
>          tcg_gen_ctpop_i32(ret, t);
>          tcg_temp_free_i32(t);
>      } else {
> -        TCGv_i32 t = tcg_const_i32(arg2);
> -        tcg_gen_ctz_i32(ret, arg1, t);
> -        tcg_temp_free_i32(t);
> +        tcg_gen_ctz_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -547,9 +523,7 @@ void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else if (TCG_TARGET_HAS_rot_i32) {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_rotl_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_rotl_i32(ret, arg1, tcg_constant_i32(arg2));
>      } else {
>          TCGv_i32 t0, t1;
>          t0 = tcg_temp_new_i32();
> @@ -653,9 +627,8 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
>          tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
>      } else if (TCG_TARGET_HAS_deposit_i32
>                 && TCG_TARGET_deposit_i32_valid(ofs, len)) {
> -        TCGv_i32 zero = tcg_const_i32(0);
> +        TCGv_i32 zero = tcg_constant_i32(0);
>          tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
> -        tcg_temp_free_i32(zero);
>      } else {
>          /* To help two-operand hosts we prefer to zero-extend first,
>             which allows ARG to stay live.  */
> @@ -1052,7 +1025,7 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();
>          TCGv_i32 t1 = tcg_temp_new_i32();
> -        TCGv_i32 t2 = tcg_const_i32(0x00ff00ff);
> +        TCGv_i32 t2 = tcg_constant_i32(0x00ff00ff);
>  
>                                          /* arg = abcd */
>          tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = .abc */
> @@ -1067,7 +1040,6 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
>  
>          tcg_temp_free_i32(t0);
>          tcg_temp_free_i32(t1);
> -        tcg_temp_free_i32(t2);
>      }
>  }
>  
> @@ -1114,8 +1086,15 @@ void tcg_gen_discard_i64(TCGv_i64 arg)
>  
>  void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg)
>  {
> -    tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
> -    tcg_gen_mov_i32(TCGV_HIGH(ret), TCGV_HIGH(arg));
> +    TCGTemp *ts = tcgv_i64_temp(arg);
> +
> +    /* Canonicalize TCGv_i64 TEMP_CONST into TCGv_i32 TEMP_CONST. */
> +    if (ts->kind == TEMP_CONST) {
> +        tcg_gen_movi_i64(ret, ts->val);
> +    } else {
> +        tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
> +        tcg_gen_mov_i32(TCGV_HIGH(ret), TCGV_HIGH(arg));
> +    }
>  }
>  
>  void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
> @@ -1237,6 +1216,14 @@ void tcg_gen_mul_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      tcg_temp_free_i64(t0);
>      tcg_temp_free_i32(t1);
>  }
> +
> +#else
> +
> +void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
> +{
> +    tcg_gen_mov_i64(ret, tcg_constant_i64(arg));
> +}
> +
>  #endif /* TCG_TARGET_REG_SIZE == 32 */
>  
>  void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
> @@ -1244,10 +1231,12 @@ void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
> +    } else if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_add_i64(ret, arg1, tcg_constant_i64(arg2));
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_add_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_add2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
> +                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
>      }
>  }
>  
> @@ -1256,10 +1245,12 @@ void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2)
>      if (arg1 == 0 && TCG_TARGET_HAS_neg_i64) {
>          /* Don't recurse with tcg_gen_neg_i64.  */
>          tcg_gen_op2_i64(INDEX_op_neg_i64, ret, arg2);
> +    } else if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_sub_i64(ret, tcg_constant_i64(arg1), arg2);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg1);
> -        tcg_gen_sub_i64(ret, t0, arg2);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
> +                         tcg_constant_i32(arg1), tcg_constant_i32(arg1 >> 32),
> +                         TCGV_LOW(arg2), TCGV_HIGH(arg2));
>      }
>  }
>  
> @@ -1268,17 +1259,17 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
> +    } else if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_sub_i64(ret, arg1, tcg_constant_i64(arg2));
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_sub_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
> +                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
>      }
>  }
>  
>  void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>  {
> -    TCGv_i64 t0;
> -
>      if (TCG_TARGET_REG_BITS == 32) {
>          tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
>          tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
> @@ -1313,9 +1304,8 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>          }
>          break;
>      }
> -    t0 = tcg_const_i64(arg2);
> -    tcg_gen_and_i64(ret, arg1, t0);
> -    tcg_temp_free_i64(t0);
> +
> +    tcg_gen_and_i64(ret, arg1, tcg_constant_i64(arg2));
>  }
>  
>  void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
> @@ -1331,9 +1321,7 @@ void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_or_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_or_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1351,9 +1339,7 @@ void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>          /* Don't recurse with tcg_gen_not_i64.  */
>          tcg_gen_op2_i64(INDEX_op_not_i64, ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_xor_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_xor_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1415,9 +1401,7 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_shl_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_shl_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1429,9 +1413,7 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_shr_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_shr_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1443,9 +1425,7 @@ void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_sar_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_sar_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1468,12 +1448,17 @@ void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *l)
>  
>  void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
>  {
> -    if (cond == TCG_COND_ALWAYS) {
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
> +    } else if (cond == TCG_COND_ALWAYS) {
>          tcg_gen_br(l);
>      } else if (cond != TCG_COND_NEVER) {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_brcond_i64(cond, arg1, t0, l);
> -        tcg_temp_free_i64(t0);
> +        l->refs++;
> +        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
> +                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                          tcg_constant_i32(arg2),
> +                          tcg_constant_i32(arg2 >> 32),
> +                          cond, label_arg(l));
>      }
>  }
>  
> @@ -1499,9 +1484,19 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
>  void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
>                            TCGv_i64 arg1, int64_t arg2)
>  {
> -    TCGv_i64 t0 = tcg_const_i64(arg2);
> -    tcg_gen_setcond_i64(cond, ret, arg1, t0);
> -    tcg_temp_free_i64(t0);
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_setcond_i64(cond, ret, arg1, tcg_constant_i64(arg2));
> +    } else if (cond == TCG_COND_ALWAYS) {
> +        tcg_gen_movi_i64(ret, 1);
> +    } else if (cond == TCG_COND_NEVER) {
> +        tcg_gen_movi_i64(ret, 0);
> +    } else {
> +        tcg_gen_op6i_i32(INDEX_op_setcond2_i32, TCGV_LOW(ret),
> +                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                         tcg_constant_i32(arg2),
> +                         tcg_constant_i32(arg2 >> 32), cond);
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +    }
>  }
>  
>  void tcg_gen_muli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
> @@ -1690,7 +1685,7 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i64 t1 = tcg_temp_new_i64();
> -        TCGv_i64 t2 = tcg_const_i64(0x00ff00ff);
> +        TCGv_i64 t2 = tcg_constant_i64(0x00ff00ff);
>  
>                                          /* arg = ....abcd */
>          tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
> @@ -1706,7 +1701,6 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
>  
>          tcg_temp_free_i64(t0);
>          tcg_temp_free_i64(t1);
> -        tcg_temp_free_i64(t2);
>      }
>  }
>  
> @@ -1850,16 +1844,16 @@ void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>      if (TCG_TARGET_REG_BITS == 32
>          && TCG_TARGET_HAS_clz_i32
>          && arg2 <= 0xffffffffu) {
> -        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
> -        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
> +        TCGv_i32 t = tcg_temp_new_i32();
> +        tcg_gen_clzi_i32(t, TCGV_LOW(arg1), arg2 - 32);
>          tcg_gen_addi_i32(t, t, 32);
>          tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>          tcg_temp_free_i32(t);
>      } else {
> -        TCGv_i64 t = tcg_const_i64(arg2);
> -        tcg_gen_clz_i64(ret, arg1, t);
> -        tcg_temp_free_i64(t);
> +        TCGv_i64 t0 = tcg_const_i64(arg2);
> +        tcg_gen_clz_i64(ret, arg1, t0);
> +        tcg_temp_free_i64(t0);
>      }
>  }
>  
> @@ -1881,7 +1875,7 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>              tcg_gen_clzi_i64(t, t, 64);
>              tcg_gen_xori_i64(t, t, 63);
>          }
> -        z = tcg_const_i64(0);
> +        z = tcg_constant_i64(0);
>          tcg_gen_movcond_i64(TCG_COND_EQ, ret, arg1, z, arg2, t);
>          tcg_temp_free_i64(t);
>          tcg_temp_free_i64(z);
> @@ -1895,8 +1889,8 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>      if (TCG_TARGET_REG_BITS == 32
>          && TCG_TARGET_HAS_ctz_i32
>          && arg2 <= 0xffffffffu) {
> -        TCGv_i32 t32 = tcg_const_i32((uint32_t)arg2 - 32);
> -        tcg_gen_ctz_i32(t32, TCGV_HIGH(arg1), t32);
> +        TCGv_i32 t32 = tcg_temp_new_i32();
> +        tcg_gen_ctzi_i32(t32, TCGV_HIGH(arg1), arg2 - 32);
>          tcg_gen_addi_i32(t32, t32, 32);
>          tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> @@ -1911,9 +1905,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>          tcg_gen_ctpop_i64(ret, t);
>          tcg_temp_free_i64(t);
>      } else {
> -        TCGv_i64 t64 = tcg_const_i64(arg2);
> -        tcg_gen_ctz_i64(ret, arg1, t64);
> -        tcg_temp_free_i64(t64);
> +        TCGv_i64 t0 = tcg_const_i64(arg2);
> +        tcg_gen_ctz_i64(ret, arg1, t0);
> +        tcg_temp_free_i64(t0);
>      }
>  }
>  
> @@ -1969,9 +1963,7 @@ void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else if (TCG_TARGET_HAS_rot_i64) {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_rotl_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_rotl_i64(ret, arg1, tcg_constant_i64(arg2));
>      } else {
>          TCGv_i64 t0, t1;
>          t0 = tcg_temp_new_i64();
> @@ -2089,9 +2081,8 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
>          tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
>      } else if (TCG_TARGET_HAS_deposit_i64
>                 && TCG_TARGET_deposit_i64_valid(ofs, len)) {
> -        TCGv_i64 zero = tcg_const_i64(0);
> +        TCGv_i64 zero = tcg_constant_i64(0);
>          tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
> -        tcg_temp_free_i64(zero);
>      } else {
>          if (TCG_TARGET_REG_BITS == 32) {
>              if (ofs >= 32) {
> @@ -3117,9 +3108,8 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
>  
>  #ifdef CONFIG_SOFTMMU
>          {
> -            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> -            gen(retv, cpu_env, addr, cmpv, newv, oi);
> -            tcg_temp_free_i32(oi);
> +            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
>          }
>  #else
>          gen(retv, cpu_env, addr, cmpv, newv);
> @@ -3162,9 +3152,8 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
>  
>  #ifdef CONFIG_SOFTMMU
>          {
> -            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
> -            gen(retv, cpu_env, addr, cmpv, newv, oi);
> -            tcg_temp_free_i32(oi);
> +            TCGMemOpIdx oi = make_memop_idx(memop, idx);
> +            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
>          }
>  #else
>          gen(retv, cpu_env, addr, cmpv, newv);
> @@ -3226,9 +3215,8 @@ static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
>  
>  #ifdef CONFIG_SOFTMMU
>      {
> -        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> -        gen(ret, cpu_env, addr, val, oi);
> -        tcg_temp_free_i32(oi);
> +        TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +        gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
>      }
>  #else
>      gen(ret, cpu_env, addr, val);
> @@ -3272,9 +3260,8 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
>  
>  #ifdef CONFIG_SOFTMMU
>          {
> -            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> -            gen(ret, cpu_env, addr, val, oi);
> -            tcg_temp_free_i32(oi);
> +            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +            gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
>          }
>  #else
>          gen(ret, cpu_env, addr, val);
-- 
Alex Bennée
^ permalink raw reply	[flat|nested] 50+ messages in thread
end of thread, other threads:[~2021-02-10 10:50 UTC | newest]
Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-01-14  2:16 [PULL 00/24] tcg patch queue Richard Henderson
2021-01-14  2:16 ` [PULL 01/24] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
2021-01-14  2:16 ` [PULL 02/24] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
2021-01-14  2:16 ` [PULL 03/24] tcg: Consolidate 3 bits into enum TCGTempKind Richard Henderson
2021-01-14  2:16 ` [PULL 04/24] tcg: Add temp_readonly Richard Henderson
2021-01-14  2:16 ` [PULL 05/24] tcg: Expand TCGTemp.val to 64-bits Richard Henderson
2021-01-14  2:16 ` [PULL 06/24] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
2021-01-14  2:16 ` [PULL 07/24] tcg: Expand TempOptInfo to 64-bits Richard Henderson
2021-01-14  2:16 ` [PULL 08/24] tcg: Introduce TYPE_CONST temporaries Richard Henderson
2021-01-14  2:16 ` [PULL 09/24] tcg/optimize: Improve find_better_copy Richard Henderson
2021-01-14  2:16 ` [PULL 10/24] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
2021-01-14  2:16 ` [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
2021-01-15 23:03   ` Alistair Francis
2021-01-16 17:24     ` Richard Henderson
2021-01-18 20:17       ` Laurent Vivier
2021-01-18 21:03         ` Richard Henderson
2021-02-01 20:45   ` Richard W.M. Jones
2021-02-01 20:53     ` Richard W.M. Jones
2021-02-04  2:22     ` Richard Henderson
2021-02-04  6:41       ` David Hildenbrand
2021-02-04  7:55         ` David Hildenbrand
2021-02-04  8:38           ` David Hildenbrand
2021-02-04  9:03             ` Richard W.M. Jones
2021-02-04  9:12               ` David Hildenbrand
2021-02-04  9:29               ` Richard W.M. Jones
2021-02-04  9:30                 ` Richard W.M. Jones
2021-02-04  9:37                 ` David Hildenbrand
2021-02-04 16:04                   ` Philippe Mathieu-Daudé
2021-02-04 16:08                     ` Philippe Mathieu-Daudé
2021-02-04 16:12                     ` David Hildenbrand
2021-02-04 16:29                     ` David Hildenbrand
2021-02-04 16:57                       ` Eric Blake
2021-02-04 17:33                       ` Richard Henderson
2021-02-04 16:48                     ` Richard W.M. Jones
2021-02-04 16:52                     ` Eric Blake
2021-01-14  2:16 ` [PULL 12/24] tcg: Convert tcg_gen_dupi_vec to TCG_CONST Richard Henderson
2021-01-14  2:16 ` [PULL 13/24] tcg: Use tcg_constant_i32 with icount expander Richard Henderson
2021-01-14  2:16 ` [PULL 14/24] tcg: Use tcg_constant_{i32,i64} with tcg int expanders Richard Henderson
2021-02-10 10:38   ` Alex Bennée
2021-01-14  2:16 ` [PULL 15/24] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
2021-01-14  2:16 ` [PULL 16/24] tcg: Use tcg_constant_{i32,i64,vec} with gvec expanders Richard Henderson
2021-01-14  2:16 ` [PULL 17/24] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
2021-01-14  2:16 ` [PULL 18/24] tcg: Remove movi and dupi opcodes Richard Henderson
2021-01-14  2:16 ` [PULL 19/24] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
2021-01-14  2:16 ` [PULL 20/24] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
2021-01-14  2:16 ` [PULL 21/24] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
2021-01-14  2:16 ` [PULL 22/24] tcg/ppc: Use tcg_constant_vec with tcg vec expanders Richard Henderson
2021-01-14  2:16 ` [PULL 23/24] tcg/aarch64: " Richard Henderson
2021-01-14  2:16 ` [PULL 24/24] decodetree: Open files with encoding='utf-8' Richard Henderson
2021-01-14 13:13 ` [PULL 00/24] tcg patch queue Peter Maydell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).