qemu-devel.nongnu.org archive mirror
* [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize
@ 2024-03-12 14:38 Richard Henderson
  2024-03-12 14:38 ` [PATCH 01/15] tcg/optimize: Fold andc with immediate to and Richard Henderson
                   ` (14 more replies)
  0 siblings, 15 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

This is a follow-on to 6334a968eec3 ("tcg/optimize: Canonicalize
subi to addi during optimization"), which I wrote at the end of
the previous devel cycle and then forgot about during the current one.

In addition to sub->add, canonicalize andc->and, orc->or and
eqv->xor when the second operand is a constant.
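
For clarity, the bitwise identities being relied on are the usual
ones; a throw-away C check (purely illustrative, not part of the
series) is:

#include <assert.h>
#include <stdint.h>

/* Toy models of the TCG ops being canonicalized. */
static uint64_t andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t orc (uint64_t a, uint64_t b) { return a | ~b; }
static uint64_t eqv (uint64_t a, uint64_t b) { return ~(a ^ b); }

int main(void)
{
    uint64_t x = 0xdeadbeefcafe1234ull;
    uint64_t i = 0x00000000ffff0000ull;   /* the constant operand */

    assert(andc(x, i) == (x & ~i));   /* andc r,x,i -> and r,x,~i */
    assert(orc (x, i) == (x | ~i));   /* orc  r,x,i -> or  r,x,~i */
    assert(eqv (x, i) == (x ^ ~i));   /* eqv  r,x,i -> xor r,x,~i */
    return 0;
}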

The early expansion that we produce for deposit does not fold
constants well, so lower unsupported deposit during optimize instead.
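
For reference, the generic lowering being moved is just the usual
mask/shift/or sequence; a minimal C model of deposit (illustrative
only, not the TCG code) looks like:

#include <assert.h>
#include <stdint.h>

/* Reference model: insert the low LEN bits of VAL into BASE at bit OFS. */
static uint64_t deposit64_ref(uint64_t base, unsigned ofs, unsigned len,
                              uint64_t val)
{
    uint64_t mask = (len == 64 ? ~0ull : (1ull << len) - 1) << ofs;
    return (base & ~mask) | ((val << ofs) & mask);
}

int main(void)
{
    /* Insert 0xab into bits [8, 16) of the base value. */
    assert(deposit64_ref(0x1122334455667788ull, 8, 8, 0xab)
           == 0x112233445566ab88ull);
    return 0;
}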


r~


Richard Henderson (15):
  tcg/optimize: Fold andc with immediate to and
  tcg/optimize: Fold orc with immediate to or
  tcg/optimize: Fold eqv with immediate to xor
  tcg/i386: Do not accept immediate operand for andc
  tcg/aarch64: Do not accept immediate operand for andc, orc, eqv
  tcg/arm: Do not accept immediate operand for andc
  tcg/ppc: Do not accept immediate operand for andc, orc, eqv
  tcg/loongarch64: Do not accept immediate operand for andc, orc
  tcg/s390x: Do not accept immediate operand for andc, orc
  tcg/riscv: Do not accept immediate operand for andc, orc, eqv
  tcg/riscv: Do not accept immediate operands for sub
  tcg/riscv: Do not accept zero operands for logicals, multiply or
    divide
  tcg/optimize: Fold and to extu during optimize
  tcg: Use arg_is_const_val in fold_sub_to_neg
  tcg/optimize: Lower unsupported deposit during optimize

 tcg/i386/tcg-target-con-set.h        |   3 +-
 tcg/i386/tcg-target-con-str.h        |   1 -
 tcg/loongarch64/tcg-target-con-set.h |   2 +-
 tcg/loongarch64/tcg-target-con-str.h |   1 -
 tcg/riscv/tcg-target-con-set.h       |   4 +-
 tcg/riscv/tcg-target-con-str.h       |   2 -
 tcg/optimize.c                       | 318 +++++++++++++++++++++++----
 tcg/tcg-op.c                         | 244 +++++---------------
 tcg/aarch64/tcg-target.c.inc         |  50 ++---
 tcg/arm/tcg-target.c.inc             |   6 +-
 tcg/i386/tcg-target.c.inc            |  20 +-
 tcg/loongarch64/tcg-target.c.inc     |  31 +--
 tcg/ppc/tcg-target.c.inc             |  32 +--
 tcg/riscv/tcg-target.c.inc           |  58 +----
 tcg/s390x/tcg-target.c.inc           |  56 +----
 15 files changed, 393 insertions(+), 435 deletions(-)

-- 
2.34.1




* [PATCH 01/15] tcg/optimize: Fold andc with immediate to and
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-13  1:29   ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 02/15] tcg/optimize: Fold orc with immediate to or Richard Henderson
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 752cc5c56b..2ec52df368 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1324,17 +1324,23 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 
     z1 = arg_info(op->args[1])->z_mask;
 
-    /*
-     * Known-zeros does not imply known-ones.  Therefore unless
-     * arg2 is constant, we can't infer anything from it.
-     */
     if (arg_is_const(op->args[2])) {
-        uint64_t z2 = ~arg_info(op->args[2])->z_mask;
-        ctx->a_mask = z1 & ~z2;
-        z1 &= z2;
-    }
-    ctx->z_mask = z1;
+        uint64_t val = ~arg_info(op->args[2])->val;
 
+        /* Fold andc r,x,i to and r,x,~i. */
+        op->opc = (ctx->type == TCG_TYPE_I32
+                   ? INDEX_op_and_i32 : INDEX_op_and_i64);
+        op->args[2] = arg_new_constant(ctx, val);
+
+        /*
+         * Known-zeros does not imply known-ones.  Therefore unless
+         * arg2 is constant, we can't infer anything from it.
+         */
+        ctx->a_mask = z1 & ~val;
+        z1 &= val;
+    }
+
+    ctx->z_mask = z1;
     ctx->s_mask = arg_info(op->args[1])->s_mask
                 & arg_info(op->args[2])->s_mask;
     return fold_masks(ctx, op);
-- 
2.34.1




* [PATCH 02/15] tcg/optimize: Fold orc with immediate to or
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
  2024-03-12 14:38 ` [PATCH 01/15] tcg/optimize: Fold andc with immediate to and Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 03/15] tcg/optimize: Fold eqv with immediate to xor Richard Henderson
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2ec52df368..5729433548 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2065,6 +2065,15 @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
         return true;
     }
 
+    /* Fold orc r,x,i to or r,x,~i. */
+    if (arg_is_const(op->args[2])) {
+        uint64_t val = ~arg_info(op->args[2])->val;
+
+        op->opc = (ctx->type == TCG_TYPE_I32
+                   ? INDEX_op_or_i32 : INDEX_op_or_i64);
+        op->args[2] = arg_new_constant(ctx, val);
+    }
+
     ctx->s_mask = arg_info(op->args[1])->s_mask
                 & arg_info(op->args[2])->s_mask;
     return false;
-- 
2.34.1




* [PATCH 03/15] tcg/optimize: Fold eqv with immediate to xor
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
  2024-03-12 14:38 ` [PATCH 01/15] tcg/optimize: Fold andc with immediate to and Richard Henderson
  2024-03-12 14:38 ` [PATCH 02/15] tcg/optimize: Fold orc with immediate to or Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 04/15] tcg/i386: Do not accept immediate operand for andc Richard Henderson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5729433548..c6b0ab35c8 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1687,6 +1687,15 @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
         return true;
     }
 
+    /* Fold eqv r,x,i to xor r,x,~i. */
+    if (arg_is_const(op->args[2])) {
+        uint64_t val = ~arg_info(op->args[2])->val;
+
+        op->opc = (ctx->type == TCG_TYPE_I32
+                   ? INDEX_op_xor_i32 : INDEX_op_xor_i64);
+        op->args[2] = arg_new_constant(ctx, val);
+    }
+
     ctx->s_mask = arg_info(op->args[1])->s_mask
                 & arg_info(op->args[2])->s_mask;
     return false;
-- 
2.34.1




* [PATCH 04/15] tcg/i386: Do not accept immediate operand for andc
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (2 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 03/15] tcg/optimize: Fold eqv with immediate to xor Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 05/15] tcg/aarch64: Do not accept immediate operand for andc, orc, eqv Richard Henderson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformation of andc with immediate to and is now
done generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target-con-set.h |  3 +--
 tcg/i386/tcg-target-con-str.h |  1 -
 tcg/i386/tcg-target.c.inc     | 20 +++++---------------
 3 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index e24241cfa2..69d2d38570 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -40,11 +40,10 @@ C_O1_I2(r, 0, r)
 C_O1_I2(r, 0, re)
 C_O1_I2(r, 0, reZ)
 C_O1_I2(r, 0, ri)
-C_O1_I2(r, 0, rI)
 C_O1_I2(r, L, L)
+C_O1_I2(r, r, r)
 C_O1_I2(r, r, re)
 C_O1_I2(r, r, ri)
-C_O1_I2(r, r, rI)
 C_O1_I2(x, x, x)
 C_N1_I2(r, r, r)
 C_N1_I2(r, r, rW)
diff --git a/tcg/i386/tcg-target-con-str.h b/tcg/i386/tcg-target-con-str.h
index cc22db227b..0c766eac7e 100644
--- a/tcg/i386/tcg-target-con-str.h
+++ b/tcg/i386/tcg-target-con-str.h
@@ -27,7 +27,6 @@ REGS('s', ALL_BYTEL_REGS & ~SOFTMMU_RESERVE_REGS)    /* qemu_st8_i32 data */
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
 CONST('e', TCG_CT_CONST_S32)
-CONST('I', TCG_CT_CONST_I32)
 CONST('T', TCG_CT_CONST_TST)
 CONST('W', TCG_CT_CONST_WSZ)
 CONST('Z', TCG_CT_CONST_U32)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index c6ba498623..ed70524864 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -130,9 +130,8 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 /* Constants we accept.  */
 #define TCG_CT_CONST_S32 0x100
 #define TCG_CT_CONST_U32 0x200
-#define TCG_CT_CONST_I32 0x400
-#define TCG_CT_CONST_WSZ 0x800
-#define TCG_CT_CONST_TST 0x1000
+#define TCG_CT_CONST_WSZ 0x400
+#define TCG_CT_CONST_TST 0x800
 
 /* Registers used with L constraint, which are the first argument
    registers on x86_64, and two random call clobbered registers on
@@ -203,8 +202,7 @@ static bool tcg_target_const_match(int64_t val, int ct,
         return 1;
     }
     if (type == TCG_TYPE_I32) {
-        if (ct & (TCG_CT_CONST_S32 | TCG_CT_CONST_U32 |
-                  TCG_CT_CONST_I32 | TCG_CT_CONST_TST)) {
+        if (ct & (TCG_CT_CONST_S32 | TCG_CT_CONST_U32 | TCG_CT_CONST_TST)) {
             return 1;
         }
     } else {
@@ -214,9 +212,6 @@ static bool tcg_target_const_match(int64_t val, int ct,
         if ((ct & TCG_CT_CONST_U32) && val == (uint32_t)val) {
             return 1;
         }
-        if ((ct & TCG_CT_CONST_I32) && ~val == (int32_t)~val) {
-            return 1;
-        }
         /*
          * This will be used in combination with TCG_CT_CONST_S32,
          * so "normal" TESTQ is already matched.  Also accept:
@@ -2666,12 +2661,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(andc):
-        if (const_a2) {
-            tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, a0, a1);
-            tgen_arithi(s, ARITH_AND + rexw, a0, ~a2, 0);
-        } else {
-            tcg_out_vex_modrm(s, OPC_ANDN + rexw, a0, a2, a1);
-        }
+        tcg_out_vex_modrm(s, OPC_ANDN + rexw, a0, a2, a1);
         break;
 
     OP_32_64(mul):
@@ -3442,7 +3432,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_andc_i32:
     case INDEX_op_andc_i64:
-        return C_O1_I2(r, r, rI);
+        return C_O1_I2(r, r, r);
 
     case INDEX_op_shl_i32:
     case INDEX_op_shl_i64:
-- 
2.34.1




* [PATCH 05/15] tcg/aarch64: Do not accept immediate operand for andc, orc, eqv
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (3 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 04/15] tcg/i386: Do not accept immediate operand for andc Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 06/15] tcg/arm: Do not accept immediate operand for andc Richard Henderson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformations with inverted immediate are now done
generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 50 +++++++++++-------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index dec8ecc1b6..68a381e4af 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2216,17 +2216,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
-    case INDEX_op_andc_i32:
-        a2 = (int32_t)a2;
-        /* FALLTHRU */
-    case INDEX_op_andc_i64:
-        if (c2) {
-            tcg_out_logicali(s, I3404_ANDI, ext, a0, a1, ~a2);
-        } else {
-            tcg_out_insn(s, 3510, BIC, ext, a0, a1, a2);
-        }
-        break;
-
     case INDEX_op_or_i32:
         a2 = (int32_t)a2;
         /* FALLTHRU */
@@ -2238,17 +2227,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
-    case INDEX_op_orc_i32:
-        a2 = (int32_t)a2;
-        /* FALLTHRU */
-    case INDEX_op_orc_i64:
-        if (c2) {
-            tcg_out_logicali(s, I3404_ORRI, ext, a0, a1, ~a2);
-        } else {
-            tcg_out_insn(s, 3510, ORN, ext, a0, a1, a2);
-        }
-        break;
-
     case INDEX_op_xor_i32:
         a2 = (int32_t)a2;
         /* FALLTHRU */
@@ -2260,15 +2238,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_andc_i32:
+    case INDEX_op_andc_i64:
+        tcg_out_insn(s, 3510, BIC, ext, a0, a1, a2);
+        break;
+    case INDEX_op_orc_i32:
+    case INDEX_op_orc_i64:
+        tcg_out_insn(s, 3510, ORN, ext, a0, a1, a2);
+        break;
     case INDEX_op_eqv_i32:
-        a2 = (int32_t)a2;
-        /* FALLTHRU */
     case INDEX_op_eqv_i64:
-        if (c2) {
-            tcg_out_logicali(s, I3404_EORI, ext, a0, a1, ~a2);
-        } else {
-            tcg_out_insn(s, 3510, EON, ext, a0, a1, a2);
-        }
+        tcg_out_insn(s, 3510, EON, ext, a0, a1, a2);
         break;
 
     case INDEX_op_not_i64:
@@ -2995,6 +2975,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_negsetcond_i64:
         return C_O1_I2(r, r, rC);
 
+    case INDEX_op_andc_i32:
+    case INDEX_op_andc_i64:
+    case INDEX_op_orc_i32:
+    case INDEX_op_orc_i64:
+    case INDEX_op_eqv_i32:
+    case INDEX_op_eqv_i64:
     case INDEX_op_mul_i32:
     case INDEX_op_mul_i64:
     case INDEX_op_div_i32:
@@ -3015,12 +3001,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_or_i64:
     case INDEX_op_xor_i32:
     case INDEX_op_xor_i64:
-    case INDEX_op_andc_i32:
-    case INDEX_op_andc_i64:
-    case INDEX_op_orc_i32:
-    case INDEX_op_orc_i64:
-    case INDEX_op_eqv_i32:
-    case INDEX_op_eqv_i64:
         return C_O1_I2(r, r, rL);
 
     case INDEX_op_shl_i32:
-- 
2.34.1




* [PATCH 06/15] tcg/arm: Do not accept immediate operand for andc
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (4 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 05/15] tcg/aarch64: Do not accept immediate operand for andc, orc, eqv Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 07/15] tcg/ppc: Do not accept immediate operand for andc, orc, eqv Richard Henderson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformation of andc with immediate to and is now
done generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.c.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 6a04c73c76..a0c5887579 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1869,8 +1869,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         args[0], args[1], args[2], const_args[2]);
         break;
     case INDEX_op_andc_i32:
-        tcg_out_dat_rIK(s, COND_AL, ARITH_BIC, ARITH_AND,
-                        args[0], args[1], args[2], const_args[2]);
+        tcg_out_dat_reg(s, COND_AL, ARITH_BIC, args[0], args[1],
+                        args[2], SHIFT_IMM_LSL(0));
         break;
     case INDEX_op_or_i32:
         c = ARITH_ORR;
@@ -2152,11 +2152,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I2(r, r, rIN);
 
     case INDEX_op_and_i32:
-    case INDEX_op_andc_i32:
     case INDEX_op_clz_i32:
     case INDEX_op_ctz_i32:
         return C_O1_I2(r, r, rIK);
 
+    case INDEX_op_andc_i32:
     case INDEX_op_mul_i32:
     case INDEX_op_div_i32:
     case INDEX_op_divu_i32:
-- 
2.34.1




* [PATCH 07/15] tcg/ppc: Do not accept immediate operand for andc, orc, eqv
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (5 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 06/15] tcg/arm: Do not accept immediate operand for andc Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 08/15] tcg/loongarch64: Do not accept immediate operand for andc, orc Richard Henderson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformations with inverted immediate are now done
generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 32 +++++---------------------------
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 7f3829beeb..336b8a28ba 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -3070,36 +3070,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
     case INDEX_op_andc_i32:
-        a0 = args[0], a1 = args[1], a2 = args[2];
-        if (const_args[2]) {
-            tcg_out_andi32(s, a0, a1, ~a2);
-        } else {
-            tcg_out32(s, ANDC | SAB(a1, a0, a2));
-        }
-        break;
     case INDEX_op_andc_i64:
-        a0 = args[0], a1 = args[1], a2 = args[2];
-        if (const_args[2]) {
-            tcg_out_andi64(s, a0, a1, ~a2);
-        } else {
-            tcg_out32(s, ANDC | SAB(a1, a0, a2));
-        }
+        tcg_out32(s, ANDC | SAB(args[1], args[0], args[2]));
         break;
     case INDEX_op_orc_i32:
-        if (const_args[2]) {
-            tcg_out_ori32(s, args[0], args[1], ~args[2]);
-            break;
-        }
-        /* FALLTHRU */
     case INDEX_op_orc_i64:
         tcg_out32(s, ORC | SAB(args[1], args[0], args[2]));
         break;
     case INDEX_op_eqv_i32:
-        if (const_args[2]) {
-            tcg_out_xori32(s, args[0], args[1], ~args[2]);
-            break;
-        }
-        /* FALLTHRU */
     case INDEX_op_eqv_i64:
         tcg_out32(s, EQV | SAB(args[1], args[0], args[2]));
         break;
@@ -4120,16 +4098,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
-    case INDEX_op_andc_i32:
-    case INDEX_op_orc_i32:
-    case INDEX_op_eqv_i32:
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
     case INDEX_op_sar_i32:
     case INDEX_op_rotl_i32:
     case INDEX_op_rotr_i32:
     case INDEX_op_and_i64:
-    case INDEX_op_andc_i64:
     case INDEX_op_shl_i64:
     case INDEX_op_shr_i64:
     case INDEX_op_sar_i64:
@@ -4145,10 +4119,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_divu_i32:
     case INDEX_op_rem_i32:
     case INDEX_op_remu_i32:
+    case INDEX_op_andc_i32:
+    case INDEX_op_orc_i32:
+    case INDEX_op_eqv_i32:
     case INDEX_op_nand_i32:
     case INDEX_op_nor_i32:
     case INDEX_op_muluh_i32:
     case INDEX_op_mulsh_i32:
+    case INDEX_op_andc_i64:
     case INDEX_op_orc_i64:
     case INDEX_op_eqv_i64:
     case INDEX_op_nand_i64:
-- 
2.34.1




* [PATCH 08/15] tcg/loongarch64: Do not accept immediate operand for andc, orc
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (6 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 07/15] tcg/ppc: Do not accept immediate operand for andc, orc, eqv Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 09/15] tcg/s390x: " Richard Henderson
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformations with inverted immediate are now done
generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/loongarch64/tcg-target-con-set.h |  2 +-
 tcg/loongarch64/tcg-target-con-str.h |  1 -
 tcg/loongarch64/tcg-target.c.inc     | 31 ++++++----------------------
 3 files changed, 7 insertions(+), 27 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index cae6c2aad6..272f33c1e4 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -22,7 +22,7 @@ C_O0_I3(r, r, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
 C_O1_I1(w, w)
-C_O1_I2(r, r, rC)
+C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rJ)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 2ba9c135ac..e7d2686db3 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -24,7 +24,6 @@ CONST('I', TCG_CT_CONST_S12)
 CONST('J', TCG_CT_CONST_S32)
 CONST('U', TCG_CT_CONST_U12)
 CONST('Z', TCG_CT_CONST_ZERO)
-CONST('C', TCG_CT_CONST_C12)
 CONST('W', TCG_CT_CONST_WSZ)
 CONST('M', TCG_CT_CONST_VCMP)
 CONST('A', TCG_CT_CONST_VADD)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 69c5b8ac4f..e343d33dba 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -169,10 +169,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_S12   0x200
 #define TCG_CT_CONST_S32   0x400
 #define TCG_CT_CONST_U12   0x800
-#define TCG_CT_CONST_C12   0x1000
-#define TCG_CT_CONST_WSZ   0x2000
-#define TCG_CT_CONST_VCMP  0x4000
-#define TCG_CT_CONST_VADD  0x8000
+#define TCG_CT_CONST_WSZ   0x1000
+#define TCG_CT_CONST_VCMP  0x2000
+#define TCG_CT_CONST_VADD  0x4000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 #define ALL_VECTOR_REGS    MAKE_64BIT_MASK(32, 32)
@@ -201,9 +200,6 @@ static bool tcg_target_const_match(int64_t val, int ct,
     if ((ct & TCG_CT_CONST_U12) && val >= 0 && val <= 0xfff) {
         return true;
     }
-    if ((ct & TCG_CT_CONST_C12) && ~val >= 0 && ~val <= 0xfff) {
-        return true;
-    }
     if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
         return true;
     }
@@ -1236,22 +1232,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_andc_i32:
     case INDEX_op_andc_i64:
-        if (c2) {
-            /* guaranteed to fit due to constraint */
-            tcg_out_opc_andi(s, a0, a1, ~a2);
-        } else {
-            tcg_out_opc_andn(s, a0, a1, a2);
-        }
+        tcg_out_opc_andn(s, a0, a1, a2);
         break;
 
     case INDEX_op_orc_i32:
     case INDEX_op_orc_i64:
-        if (c2) {
-            /* guaranteed to fit due to constraint */
-            tcg_out_opc_ori(s, a0, a1, ~a2);
-        } else {
-            tcg_out_opc_orn(s, a0, a1, a2);
-        }
+        tcg_out_opc_orn(s, a0, a1, a2);
         break;
 
     case INDEX_op_and_i32:
@@ -2120,12 +2106,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_andc_i64:
     case INDEX_op_orc_i32:
     case INDEX_op_orc_i64:
-        /*
-         * LoongArch insns for these ops don't have reg-imm forms, but we
-         * can express using andi/ori if ~constant satisfies
-         * TCG_CT_CONST_U12.
-         */
-        return C_O1_I2(r, r, rC);
+        return C_O1_I2(r, r, r);
 
     case INDEX_op_shl_i32:
     case INDEX_op_shl_i64:
-- 
2.34.1




* [PATCH 09/15] tcg/s390x: Do not accept immediate operand for andc, orc
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (7 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 08/15] tcg/loongarch64: Do not accept immediate operand for andc, orc Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 10/15] tcg/riscv: Do not accept immediate operand for andc, orc, eqv Richard Henderson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformations with inverted immediate are now done
generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.c.inc | 56 ++++++--------------------------------
 1 file changed, 8 insertions(+), 48 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index ad587325fc..b9a3e6e56a 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2216,31 +2216,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_andc_i32:
-        a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
-        if (const_args[2]) {
-            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tgen_andi(s, TCG_TYPE_I32, a0, (uint32_t)~a2);
-	} else {
-            tcg_out_insn(s, RRFa, NCRK, a0, a1, a2);
-	}
+        tcg_out_insn(s, RRFa, NCRK, args[0], args[1], args[2]);
         break;
     case INDEX_op_orc_i32:
-        a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
-        if (const_args[2]) {
-            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tgen_ori(s, a0, (uint32_t)~a2);
-        } else {
-            tcg_out_insn(s, RRFa, OCRK, a0, a1, a2);
-        }
+        tcg_out_insn(s, RRFa, OCRK, args[0], args[1], args[2]);
         break;
     case INDEX_op_eqv_i32:
-        a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
-        if (const_args[2]) {
-            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tcg_out_insn(s, RIL, XILF, a0, ~a2);
-        } else {
-            tcg_out_insn(s, RRFa, NXRK, a0, a1, a2);
-        }
+        tcg_out_insn(s, RRFa, NXRK, args[0], args[1], args[2]);
         break;
     case INDEX_op_nand_i32:
         tcg_out_insn(s, RRFa, NNRK, args[0], args[1], args[2]);
@@ -2517,31 +2499,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_andc_i64:
-        a0 = args[0], a1 = args[1], a2 = args[2];
-        if (const_args[2]) {
-            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen_andi(s, TCG_TYPE_I64, a0, ~a2);
-        } else {
-            tcg_out_insn(s, RRFa, NCGRK, a0, a1, a2);
-        }
+        tcg_out_insn(s, RRFa, NCGRK, args[0], args[1], args[2]);
         break;
     case INDEX_op_orc_i64:
-        a0 = args[0], a1 = args[1], a2 = args[2];
-        if (const_args[2]) {
-            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen_ori(s, a0, ~a2);
-        } else {
-            tcg_out_insn(s, RRFa, OCGRK, a0, a1, a2);
-        }
+        tcg_out_insn(s, RRFa, OCGRK, args[0], args[1], args[2]);
         break;
     case INDEX_op_eqv_i64:
-        a0 = args[0], a1 = args[1], a2 = args[2];
-        if (const_args[2]) {
-            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen_xori(s, a0, ~a2);
-        } else {
-            tcg_out_insn(s, RRFa, NXGRK, a0, a1, a2);
-        }
+        tcg_out_insn(s, RRFa, NXGRK, args[0], args[1], args[2]);
         break;
     case INDEX_op_nand_i64:
         tcg_out_insn(s, RRFa, NNGRK, args[0], args[1], args[2]);
@@ -3244,15 +3208,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I2(r, r, rK);
 
     case INDEX_op_andc_i32:
-    case INDEX_op_orc_i32:
-    case INDEX_op_eqv_i32:
-        return C_O1_I2(r, r, ri);
     case INDEX_op_andc_i64:
-        return C_O1_I2(r, r, rKR);
+    case INDEX_op_orc_i32:
     case INDEX_op_orc_i64:
+    case INDEX_op_eqv_i32:
     case INDEX_op_eqv_i64:
-        return C_O1_I2(r, r, rNK);
-
     case INDEX_op_nand_i32:
     case INDEX_op_nand_i64:
     case INDEX_op_nor_i32:
-- 
2.34.1




* [PATCH 10/15] tcg/riscv: Do not accept immediate operand for andc, orc, eqv
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (8 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 09/15] tcg/s390x: " Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 11/15] tcg/riscv: Do not accept immediate operands for sub Richard Henderson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformations with inverted immediate are now done
generically and need not be handled by the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/riscv/tcg-target-con-set.h |  1 -
 tcg/riscv/tcg-target-con-str.h |  1 -
 tcg/riscv/tcg-target.c.inc     | 36 +++++++---------------------------
 3 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index aac5ceee2b..0f72281a08 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -15,7 +15,6 @@ C_O0_I2(rZ, rZ)
 C_O1_I1(r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
-C_O1_I2(r, r, rJ)
 C_O1_I2(r, rZ, rN)
 C_O1_I2(r, rZ, rZ)
 C_N1_I2(r, r, rM)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index d5c419dff1..6f1cfb976c 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -15,7 +15,6 @@ REGS('r', ALL_GENERAL_REGS)
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
 CONST('I', TCG_CT_CONST_S12)
-CONST('J', TCG_CT_CONST_J12)
 CONST('N', TCG_CT_CONST_N12)
 CONST('M', TCG_CT_CONST_M12)
 CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 639363039b..2b889486e4 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -138,7 +138,6 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_S12   0x200
 #define TCG_CT_CONST_N12   0x400
 #define TCG_CT_CONST_M12   0x800
-#define TCG_CT_CONST_J12  0x1000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 
@@ -176,13 +175,6 @@ static bool tcg_target_const_match(int64_t val, int ct,
     if ((ct & TCG_CT_CONST_M12) && val >= -0x7ff && val <= 0x7ff) {
         return 1;
     }
-    /*
-     * Inverse of sign extended from 12 bits: ~[-0x800, 0x7ff].
-     * Used to map ANDN back to ANDI, etc.
-     */
-    if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
-        return 1;
-    }
     return 0;
 }
 
@@ -1610,27 +1602,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_andc_i32:
     case INDEX_op_andc_i64:
-        if (c2) {
-            tcg_out_opc_imm(s, OPC_ANDI, a0, a1, ~a2);
-        } else {
-            tcg_out_opc_reg(s, OPC_ANDN, a0, a1, a2);
-        }
+        tcg_out_opc_reg(s, OPC_ANDN, a0, a1, a2);
         break;
     case INDEX_op_orc_i32:
     case INDEX_op_orc_i64:
-        if (c2) {
-            tcg_out_opc_imm(s, OPC_ORI, a0, a1, ~a2);
-        } else {
-            tcg_out_opc_reg(s, OPC_ORN, a0, a1, a2);
-        }
+        tcg_out_opc_reg(s, OPC_ORN, a0, a1, a2);
         break;
     case INDEX_op_eqv_i32:
     case INDEX_op_eqv_i64:
-        if (c2) {
-            tcg_out_opc_imm(s, OPC_XORI, a0, a1, ~a2);
-        } else {
-            tcg_out_opc_reg(s, OPC_XNOR, a0, a1, a2);
-        }
+        tcg_out_opc_reg(s, OPC_XNOR, a0, a1, a2);
         break;
 
     case INDEX_op_not_i32:
@@ -1963,18 +1943,16 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_negsetcond_i64:
         return C_O1_I2(r, r, rI);
 
+    case INDEX_op_sub_i32:
+    case INDEX_op_sub_i64:
+        return C_O1_I2(r, rZ, rN);
+
     case INDEX_op_andc_i32:
     case INDEX_op_andc_i64:
     case INDEX_op_orc_i32:
     case INDEX_op_orc_i64:
     case INDEX_op_eqv_i32:
     case INDEX_op_eqv_i64:
-        return C_O1_I2(r, r, rJ);
-
-    case INDEX_op_sub_i32:
-    case INDEX_op_sub_i64:
-        return C_O1_I2(r, rZ, rN);
-
     case INDEX_op_mul_i32:
     case INDEX_op_mulsh_i32:
     case INDEX_op_muluh_i32:
-- 
2.34.1




* [PATCH 11/15] tcg/riscv: Do not accept immediate operands for sub
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (9 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 10/15] tcg/riscv: Do not accept immediate operand for andc, orc, eqv Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 12/15] tcg/riscv: Do not accept zero operands for logicals, multiply or divide Richard Henderson
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The transformations to neg and add immediate are now done
generically and need not be handled by the backend.
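
As a quick sanity check of the identities involved (illustrative C,
not part of this patch):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t x = 0x123456789abcdef0ull;
    uint64_t c = 0x7f0;                     /* small immediate */

    assert(x - c == x + (uint64_t)-c);      /* subi -> addi with -c */
    assert(0 - x == (uint64_t)-x);          /* sub from zero -> neg */
    return 0;
}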

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/riscv/tcg-target-con-set.h |  2 +-
 tcg/riscv/tcg-target-con-str.h |  1 -
 tcg/riscv/tcg-target.c.inc     | 24 ++++--------------------
 3 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 0f72281a08..13a383aeb1 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -13,9 +13,9 @@ C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
 C_O1_I1(r, r)
+C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
-C_O1_I2(r, rZ, rN)
 C_O1_I2(r, rZ, rZ)
 C_N1_I2(r, r, rM)
 C_O1_I4(r, r, rI, rM, rM)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index 6f1cfb976c..a8d57c0e37 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -15,6 +15,5 @@ REGS('r', ALL_GENERAL_REGS)
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
 CONST('I', TCG_CT_CONST_S12)
-CONST('N', TCG_CT_CONST_N12)
 CONST('M', TCG_CT_CONST_M12)
 CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 2b889486e4..6b28f2f85d 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -136,8 +136,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 
 #define TCG_CT_CONST_ZERO  0x100
 #define TCG_CT_CONST_S12   0x200
-#define TCG_CT_CONST_N12   0x400
-#define TCG_CT_CONST_M12   0x800
+#define TCG_CT_CONST_M12   0x400
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 
@@ -160,13 +159,6 @@ static bool tcg_target_const_match(int64_t val, int ct,
     if ((ct & TCG_CT_CONST_S12) && val >= -0x800 && val <= 0x7ff) {
         return 1;
     }
-    /*
-     * Sign extended from 12 bits, negated: [-0x7ff, 0x800].
-     * Used for subtraction, where a constant must be handled by ADDI.
-     */
-    if ((ct & TCG_CT_CONST_N12) && val >= -0x7ff && val <= 0x800) {
-        return 1;
-    }
     /*
      * Sign extended from 12 bits, +/- matching: [-0x7ff, 0x7ff].
      * Used by addsub2 and movcond, which may need the negative value,
@@ -1559,18 +1551,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_sub_i32:
-        if (c2) {
-            tcg_out_opc_imm(s, OPC_ADDIW, a0, a1, -a2);
-        } else {
-            tcg_out_opc_reg(s, OPC_SUBW, a0, a1, a2);
-        }
+        tcg_out_opc_reg(s, OPC_SUBW, a0, a1, a2);
         break;
     case INDEX_op_sub_i64:
-        if (c2) {
-            tcg_out_opc_imm(s, OPC_ADDI, a0, a1, -a2);
-        } else {
-            tcg_out_opc_reg(s, OPC_SUB, a0, a1, a2);
-        }
+        tcg_out_opc_reg(s, OPC_SUB, a0, a1, a2);
         break;
 
     case INDEX_op_and_i32:
@@ -1945,7 +1929,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_sub_i32:
     case INDEX_op_sub_i64:
-        return C_O1_I2(r, rZ, rN);
+        return C_O1_I2(r, r, r);
 
     case INDEX_op_andc_i32:
     case INDEX_op_andc_i64:
-- 
2.34.1




* [PATCH 12/15] tcg/riscv: Do not accept zero operands for logicals, multiply or divide
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (10 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 11/15] tcg/riscv: Do not accept immediate operands for sub Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 13/15] tcg/optimize: Fold and to extu during optimize Richard Henderson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

Trust that the optimizer has folded all of these away.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/riscv/tcg-target-con-set.h | 1 -
 tcg/riscv/tcg-target.c.inc     | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 13a383aeb1..527d2fd4d9 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -16,7 +16,6 @@ C_O1_I1(r, r)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
-C_O1_I2(r, rZ, rZ)
 C_N1_I2(r, r, rM)
 C_O1_I4(r, r, rI, rM, rM)
 C_O2_I4(r, r, rZ, rZ, rM, rM)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 6b28f2f85d..0dc1b2d8f7 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -1929,8 +1929,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_sub_i32:
     case INDEX_op_sub_i64:
-        return C_O1_I2(r, r, r);
-
     case INDEX_op_andc_i32:
     case INDEX_op_andc_i64:
     case INDEX_op_orc_i32:
@@ -1951,7 +1949,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_divu_i64:
     case INDEX_op_rem_i64:
     case INDEX_op_remu_i64:
-        return C_O1_I2(r, rZ, rZ);
+        return C_O1_I2(r, r, r);
 
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
-- 
2.34.1




* [PATCH 13/15] tcg/optimize: Fold and to extu during optimize
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (11 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 12/15] tcg/riscv: Do not accept zero operands for logicals, multiply or divide Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 14/15] tcg: Use arg_is_const_val in fold_sub_to_neg Richard Henderson
  2024-03-12 14:38 ` [PATCH 15/15] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 43 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c6b0ab35c8..39bcd32f72 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1300,11 +1300,46 @@ static bool fold_and(OptContext *ctx, TCGOp *op)
     ctx->s_mask = arg_info(op->args[1])->s_mask
                 & arg_info(op->args[2])->s_mask;
 
-    /*
-     * Known-zeros does not imply known-ones.  Therefore unless
-     * arg2 is constant, we can't infer affected bits from it.
-     */
     if (arg_is_const(op->args[2])) {
+        TCGOpcode ext8 = 0, ext16 = 0, ext32 = 0;
+
+        /* Canonicalize as zero-extend, if supported. */
+        switch (ctx->type) {
+        case TCG_TYPE_I32:
+            ext8 = TCG_TARGET_HAS_ext8u_i32 ? INDEX_op_ext8u_i32 : 0;
+            ext16 = TCG_TARGET_HAS_ext16u_i32 ? INDEX_op_ext16u_i32 : 0;
+            break;
+        case TCG_TYPE_I64:
+            ext8 = TCG_TARGET_HAS_ext8u_i64 ? INDEX_op_ext8u_i64 : 0;
+            ext16 = TCG_TARGET_HAS_ext16u_i64 ? INDEX_op_ext16u_i64 : 0;
+            ext32 = TCG_TARGET_HAS_ext32u_i64 ? INDEX_op_ext32u_i64 : 0;
+            break;
+        default:
+            break;
+        }
+
+        switch (arg_info(op->args[2])->val) {
+        case 0xff:
+            if (ext8) {
+                op->opc = ext8;
+            }
+            break;
+        case 0xffff:
+            if (ext16) {
+                op->opc = ext16;
+            }
+            break;
+        case UINT32_MAX:
+            if (ext32) {
+                op->opc = ext32;
+            }
+            break;
+        }
+
+        /*
+         * Known-zeros does not imply known-ones.  Therefore unless
+         * arg2 is constant, we can't infer affected bits from it.
+         */
         ctx->a_mask = z1 & ~z2;
     }
 
-- 
2.34.1




* [PATCH 14/15] tcg: Use arg_is_const_val in fold_sub_to_neg
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (12 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 13/15] tcg/optimize: Fold and to extu during optimize Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  2024-03-12 14:38 ` [PATCH 15/15] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 39bcd32f72..f3867ce9e6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2451,7 +2451,7 @@ static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
     TCGOpcode neg_op;
     bool have_neg;
 
-    if (!arg_is_const(op->args[1]) || arg_info(op->args[1])->val != 0) {
+    if (!arg_is_const_val(op->args[1], 0)) {
         return false;
     }
 
-- 
2.34.1




* [PATCH 15/15] tcg/optimize: Lower unsupported deposit during optimize
  2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
                   ` (13 preceding siblings ...)
  2024-03-12 14:38 ` [PATCH 14/15] tcg: Use arg_is_const_val in fold_sub_to_neg Richard Henderson
@ 2024-03-12 14:38 ` Richard Henderson
  14 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-12 14:38 UTC (permalink / raw)
  To: qemu-devel

The expansions that we chose in tcg-op.c may be less than optimal.
Delay lowering until optimize, so that we have propagated constants
and have computed known-zero masks.
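
As a toy illustration of why constant knowledge helps (plain C, not
the TCG implementation): once the inserted value is known to be
all-zero or all-one, the deposit collapses to a single AND or OR.

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t base = 0x1122334455667788ull;
    unsigned ofs = 8, len = 8;
    uint64_t mask = ((1ull << len) - 1) << ofs;

    /* deposit(base, ofs, len, 0)   ==  base & ~mask */
    assert(((base & ~mask) | ((0ull << ofs) & mask)) == (base & ~mask));
    /* deposit(base, ofs, len, ~0)  ==  base | mask  */
    assert(((base & ~mask) | ((~0ull << ofs) & mask)) == (base | mask));
    return 0;
}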

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 231 +++++++++++++++++++++++++++++++++++++++++-----
 tcg/tcg-op.c   | 244 ++++++++++++-------------------------------------
 2 files changed, 266 insertions(+), 209 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f3867ce9e6..ce1dbab097 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1632,51 +1632,234 @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 
 static bool fold_deposit(OptContext *ctx, TCGOp *op)
 {
-    TCGOpcode and_opc;
+    TCGOpcode and_opc, or_opc, ex2_opc, shl_opc, rotl_opc;
+    TCGOp *op2;
+    TCGArg ret = op->args[0];
+    TCGArg arg1 = op->args[1];
+    TCGArg arg2 = op->args[2];
+    int ofs = op->args[3];
+    int len = op->args[4];
+    int width;
+    uint64_t type_mask;
+    bool valid;
 
-    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-        uint64_t t1 = arg_info(op->args[1])->val;
-        uint64_t t2 = arg_info(op->args[2])->val;
+    if (arg_is_const(arg1) && arg_is_const(arg2)) {
+        uint64_t t1 = arg_info(arg1)->val;
+        uint64_t t2 = arg_info(arg2)->val;
 
-        t1 = deposit64(t1, op->args[3], op->args[4], t2);
-        return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
+        t1 = deposit64(t1, ofs, len, t2);
+        return tcg_opt_gen_movi(ctx, op, ret, t1);
     }
 
     switch (ctx->type) {
     case TCG_TYPE_I32:
         and_opc = INDEX_op_and_i32;
+        or_opc = INDEX_op_or_i32;
+        shl_opc = INDEX_op_shl_i32;
+        ex2_opc = TCG_TARGET_HAS_extract2_i32 ? INDEX_op_extract2_i32 : 0;
+        rotl_opc = TCG_TARGET_HAS_rot_i32 ? INDEX_op_rotl_i32 : 0;
+        valid = (TCG_TARGET_HAS_deposit_i32 &&
+                 TCG_TARGET_deposit_i32_valid(ofs, len));
+        width = 32;
+        type_mask = UINT32_MAX;
         break;
     case TCG_TYPE_I64:
         and_opc = INDEX_op_and_i64;
+        or_opc = INDEX_op_or_i64;
+        shl_opc = INDEX_op_shl_i64;
+        ex2_opc = TCG_TARGET_HAS_extract2_i64 ? INDEX_op_extract2_i64 : 0;
+        rotl_opc = TCG_TARGET_HAS_rot_i64 ? INDEX_op_rotl_i64 : 0;
+        valid = (TCG_TARGET_HAS_deposit_i64 &&
+                 TCG_TARGET_deposit_i64_valid(ofs, len));
+        width = 64;
+        type_mask = UINT64_MAX;
         break;
     default:
         g_assert_not_reached();
     }
 
-    /* Inserting a value into zero at offset 0. */
-    if (arg_is_const_val(op->args[1], 0) && op->args[3] == 0) {
-        uint64_t mask = MAKE_64BIT_MASK(0, op->args[4]);
+    if (arg_is_const(arg2)) {
+        uint64_t val = arg_info(arg2)->val;
+        uint64_t mask = MAKE_64BIT_MASK(0, len);
 
-        op->opc = and_opc;
-        op->args[1] = op->args[2];
-        op->args[2] = arg_new_constant(ctx, mask);
-        ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
-        return false;
+        /* Inserting all-zero into a value. */
+        if ((val & mask) == 0) {
+            op->opc = and_opc;
+            op->args[2] = arg_new_constant(ctx, ~(mask << ofs));
+            return fold_and(ctx, op);
+        }
+
+        /* Inserting all-one into a value. */
+        if ((val & mask) == mask) {
+            op->opc = or_opc;
+            op->args[2] = arg_new_constant(ctx, mask << ofs);
+            goto done;
+        }
+
+        /* Lower invalid deposit of constant as AND + OR. */
+        if (!valid) {
+            op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
+            op2->args[0] = ret;
+            op2->args[1] = arg1;
+            op2->args[2] = arg_new_constant(ctx, ~(mask << ofs));
+            fold_and(ctx, op2); /* fold to ext*u */
+
+            op->opc = or_opc;
+            op->args[1] = ret;
+            op->args[2] = arg_new_constant(ctx, (val & mask) << ofs);
+            goto done;
+        }
     }
 
-    /* Inserting zero into a value. */
-    if (arg_is_const_val(op->args[2], 0)) {
-        uint64_t mask = deposit64(-1, op->args[3], op->args[4], 0);
+    /* Inserting a value into zero. */
+    if (arg_is_const_val(arg1, 0)) {
+        uint64_t mask = MAKE_64BIT_MASK(0, len);
+        uint64_t need_mask = arg_info(arg2)->z_mask & ~mask & type_mask;
 
-        op->opc = and_opc;
-        op->args[2] = arg_new_constant(ctx, mask);
-        ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
-        return false;
+        /* Always lower deposit into zero at 0 as AND. */
+        if (ofs == 0) {
+            if (!need_mask) {
+                return tcg_opt_gen_mov(ctx, op, ret, arg2);
+            }
+            op->opc = and_opc;
+            op->args[1] = arg2;
+            op->args[2] = arg_new_constant(ctx, mask);
+            return fold_and(ctx, op);
+        }
+
+        /* If no mask required, fold as SHL. */
+        if (!((need_mask << ofs) & type_mask)) {
+            op->opc = shl_opc;
+            op->args[1] = arg2;
+            op->args[2] = arg_new_constant(ctx, ofs);
+            goto done;
+        }
+
+        /* Lower invalid deposit into zero as AND + SHL. */
+        if (!valid) {
+            /*
+             * ret = arg2 & mask
+             * ret = ret << ofs
+             */
+            TCGOpcode ext_second_opc = 0;
+
+            switch (ofs + len) {
+            case 8:
+                ext_second_opc =
+                    (ctx->type == TCG_TYPE_I32
+                     ? (TCG_TARGET_HAS_ext8u_i32 ? INDEX_op_ext8u_i32 : 0)
+                     : (TCG_TARGET_HAS_ext8u_i64 ? INDEX_op_ext8u_i64 : 0));
+                break;
+            case 16:
+                ext_second_opc =
+                    (ctx->type == TCG_TYPE_I32
+                     ? (TCG_TARGET_HAS_ext16u_i32 ? INDEX_op_ext16u_i32 : 0)
+                     : (TCG_TARGET_HAS_ext16u_i64 ? INDEX_op_ext16u_i64 : 0));
+                break;
+            case 32:
+                ext_second_opc =
+                    TCG_TARGET_HAS_ext32u_i64 ? INDEX_op_ext32u_i64 : 0;
+                break;
+            }
+
+            if (ext_second_opc) {
+                op2 = tcg_op_insert_before(ctx->tcg, op, shl_opc, 3);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = arg_new_constant(ctx, ofs);
+
+                op->opc = ext_second_opc;
+                op->args[1] = ret;
+            } else {
+                op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = arg_new_constant(ctx, mask);
+                fold_and(ctx, op2);
+
+                op->opc = shl_opc;
+                op->args[1] = ret;
+                op->args[2] = arg_new_constant(ctx, ofs);
+            }
+            goto done;
+        }
     }
 
-    ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,
-                            op->args[3], op->args[4],
-                            arg_info(op->args[2])->z_mask);
+    /* After special cases, lower invalid deposit. */
+    if (!valid) {
+        uint64_t mask = MAKE_64BIT_MASK(0, len);
+        TCGArg tmp;
+
+        /*
+         * ret = arg2:arg1 >> len
+         * ret = rotl(ret, len)
+         */
+        if (ex2_opc && rotl_opc && ofs == 0) {
+            op2 = tcg_op_insert_before(ctx->tcg, op, ex2_opc, 4);
+            op2->args[0] = ret;
+            op2->args[1] = arg1;
+            op2->args[2] = arg2;
+            op2->args[3] = len;
+
+            op->opc = rotl_opc;
+            op->args[1] = ret;
+            op->args[2] = arg_new_constant(ctx, len);
+            goto done;
+        }
+
+        /*
+         * tmp = arg1 << len
+         * ret = arg2:tmp >> len
+         */
+        if (ex2_opc && ofs + len == width) {
+            tmp = ret == arg2 ? arg_new_temp(ctx) : ret;
+
+            op2 = tcg_op_insert_before(ctx->tcg, op, shl_opc, 4);
+            op2->args[0] = tmp;
+            op2->args[1] = arg1;
+            op2->args[2] = arg_new_constant(ctx, len);
+
+            op->opc = ex2_opc;
+            op->args[0] = ret;
+            op->args[1] = tmp;
+            op->args[2] = arg2;
+            op->args[3] = len;
+            goto done;
+        }
+
+        /*
+         * tmp = arg2 & mask
+         * ret = arg1 & ~(mask << ofs)
+         * tmp = tmp << ofs
+         * ret = ret | tmp
+         */
+        tmp = arg_new_temp(ctx);
+
+        op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
+        op2->args[0] = tmp;
+        op2->args[1] = arg2;
+        op2->args[2] = arg_new_constant(ctx, mask);
+        fold_and(ctx, op2);
+
+        op2 = tcg_op_insert_before(ctx->tcg, op, shl_opc, 3);
+        op2->args[0] = tmp;
+        op2->args[1] = tmp;
+        op2->args[2] = arg_new_constant(ctx, ofs);
+
+        op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
+        op2->args[0] = ret;
+        op2->args[1] = arg1;
+        op2->args[2] = arg_new_constant(ctx, ~(mask << ofs));
+        fold_and(ctx, op2);
+
+        op->opc = or_opc;
+        op->args[1] = ret;
+        op->args[2] = tmp;
+    }
+
+ done:
+    ctx->z_mask = deposit64(arg_info(arg1)->z_mask, ofs, len,
+                            arg_info(arg2)->z_mask);
     return false;
 }
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index aa6bc6f57d..76a1f5e296 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -874,9 +874,6 @@ void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len)
 {
-    uint32_t mask;
-    TCGv_i32 t1;
-
     tcg_debug_assert(ofs < 32);
     tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 32);
@@ -886,37 +883,7 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
         tcg_gen_mov_i32(ret, arg2);
         return;
     }
-    if (TCG_TARGET_HAS_deposit_i32 && TCG_TARGET_deposit_i32_valid(ofs, len)) {
-        tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, arg1, arg2, ofs, len);
-        return;
-    }
-
-    t1 = tcg_temp_ebb_new_i32();
-
-    if (TCG_TARGET_HAS_extract2_i32) {
-        if (ofs + len == 32) {
-            tcg_gen_shli_i32(t1, arg1, len);
-            tcg_gen_extract2_i32(ret, t1, arg2, len);
-            goto done;
-        }
-        if (ofs == 0) {
-            tcg_gen_extract2_i32(ret, arg1, arg2, len);
-            tcg_gen_rotli_i32(ret, ret, len);
-            goto done;
-        }
-    }
-
-    mask = (1u << len) - 1;
-    if (ofs + len < 32) {
-        tcg_gen_andi_i32(t1, arg2, mask);
-        tcg_gen_shli_i32(t1, t1, ofs);
-    } else {
-        tcg_gen_shli_i32(t1, arg2, ofs);
-    }
-    tcg_gen_andi_i32(ret, arg1, ~(mask << ofs));
-    tcg_gen_or_i32(ret, ret, t1);
- done:
-    tcg_temp_free_i32(t1);
+    tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, arg1, arg2, ofs, len);
 }
 
 void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
@@ -931,48 +898,9 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
         tcg_gen_shli_i32(ret, arg, ofs);
     } else if (ofs == 0) {
         tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
-    } else if (TCG_TARGET_HAS_deposit_i32
-               && TCG_TARGET_deposit_i32_valid(ofs, len)) {
+    } else {
         TCGv_i32 zero = tcg_constant_i32(0);
         tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
-    } else {
-        /* To help two-operand hosts we prefer to zero-extend first,
-           which allows ARG to stay live.  */
-        switch (len) {
-        case 16:
-            if (TCG_TARGET_HAS_ext16u_i32) {
-                tcg_gen_ext16u_i32(ret, arg);
-                tcg_gen_shli_i32(ret, ret, ofs);
-                return;
-            }
-            break;
-        case 8:
-            if (TCG_TARGET_HAS_ext8u_i32) {
-                tcg_gen_ext8u_i32(ret, arg);
-                tcg_gen_shli_i32(ret, ret, ofs);
-                return;
-            }
-            break;
-        }
-        /* Otherwise prefer zero-extension over AND for code size.  */
-        switch (ofs + len) {
-        case 16:
-            if (TCG_TARGET_HAS_ext16u_i32) {
-                tcg_gen_shli_i32(ret, arg, ofs);
-                tcg_gen_ext16u_i32(ret, ret);
-                return;
-            }
-            break;
-        case 8:
-            if (TCG_TARGET_HAS_ext8u_i32) {
-                tcg_gen_shli_i32(ret, arg, ofs);
-                tcg_gen_ext8u_i32(ret, ret);
-                return;
-            }
-            break;
-        }
-        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
-        tcg_gen_shli_i32(ret, ret, ofs);
     }
 }
 
@@ -2611,9 +2539,6 @@ void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len)
 {
-    uint64_t mask;
-    TCGv_i64 t1;
-
     tcg_debug_assert(ofs < 64);
     tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 64);
@@ -2623,52 +2548,41 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
         tcg_gen_mov_i64(ret, arg2);
         return;
     }
-    if (TCG_TARGET_HAS_deposit_i64 && TCG_TARGET_deposit_i64_valid(ofs, len)) {
+
+    if (TCG_TARGET_REG_BITS == 64) {
         tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, arg1, arg2, ofs, len);
-        return;
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        if (ofs >= 32) {
-            tcg_gen_deposit_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1),
-                                TCGV_LOW(arg2), ofs - 32, len);
-            tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg1));
-            return;
-        }
-        if (ofs + len <= 32) {
-            tcg_gen_deposit_i32(TCGV_LOW(ret), TCGV_LOW(arg1),
-                                TCGV_LOW(arg2), ofs, len);
-            tcg_gen_mov_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1));
-            return;
-        }
-    }
-
-    t1 = tcg_temp_ebb_new_i64();
-
-    if (TCG_TARGET_HAS_extract2_i64) {
-        if (ofs + len == 64) {
-            tcg_gen_shli_i64(t1, arg1, len);
-            tcg_gen_extract2_i64(ret, t1, arg2, len);
-            goto done;
-        }
-        if (ofs == 0) {
-            tcg_gen_extract2_i64(ret, arg1, arg2, len);
-            tcg_gen_rotli_i64(ret, ret, len);
-            goto done;
-        }
-    }
-
-    mask = (1ull << len) - 1;
-    if (ofs + len < 64) {
-        tcg_gen_andi_i64(t1, arg2, mask);
-        tcg_gen_shli_i64(t1, t1, ofs);
+    } else if (ofs >= 32) {
+        tcg_gen_deposit_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1),
+                            TCGV_LOW(arg2), ofs - 32, len);
+        tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg1));
+    } else if (ofs + len <= 32) {
+        tcg_gen_deposit_i32(TCGV_LOW(ret), TCGV_LOW(arg1),
+                            TCGV_LOW(arg2), ofs, len);
+        tcg_gen_mov_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1));
+    } else if (ofs == 0) {
+        tcg_gen_deposit_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1),
+                            TCGV_HIGH(arg2), 0, len - 32);
+        tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg2));
     } else {
-        tcg_gen_shli_i64(t1, arg2, ofs);
+        /* The 64-bit deposit is split across the 32-bit halves. */
+        unsigned lo_len = 32 - ofs;
+        unsigned hi_len = len - lo_len;
+        TCGv_i32 tl = tcg_temp_ebb_new_i32();
+        TCGv_i32 th = tcg_temp_ebb_new_i32();
+
+        tcg_gen_deposit_i32(tl, TCGV_LOW(arg1), TCGV_LOW(arg2), ofs, lo_len);
+        if (len <= 32) {
+            tcg_gen_shri_i32(th, TCGV_LOW(arg2), lo_len);
+        } else {
+            tcg_gen_extract2_i32(th, TCGV_LOW(arg2), TCGV_HIGH(arg2), lo_len);
+        }
+        tcg_gen_deposit_i32(th, TCGV_HIGH(arg1), th, 0, hi_len);
+
+        tcg_gen_mov_i32(TCGV_LOW(ret), tl);
+        tcg_gen_mov_i32(TCGV_HIGH(ret), th);
+        tcg_temp_free_i32(tl);
+        tcg_temp_free_i32(th);
     }
-    tcg_gen_andi_i64(ret, arg1, ~(mask << ofs));
-    tcg_gen_or_i64(ret, ret, t1);
- done:
-    tcg_temp_free_i64(t1);
 }
 
 void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
@@ -2683,75 +2597,35 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
         tcg_gen_shli_i64(ret, arg, ofs);
     } else if (ofs == 0) {
         tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
-    } else if (TCG_TARGET_HAS_deposit_i64
-               && TCG_TARGET_deposit_i64_valid(ofs, len)) {
+    } else if (TCG_TARGET_REG_BITS == 64) {
         TCGv_i64 zero = tcg_constant_i64(0);
         tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
+    } else if (ofs >= 32) {
+        tcg_gen_deposit_z_i32(TCGV_HIGH(ret), TCGV_LOW(arg), ofs - 32, len);
+        tcg_gen_movi_i32(TCGV_LOW(ret), 0);
+    } else if (ofs + len <= 32) {
+        tcg_gen_deposit_z_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+    } else if (ofs == 0) {
+        tcg_gen_deposit_z_i32(TCGV_HIGH(ret), TCGV_HIGH(arg), 0, len - 32);
+        tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
     } else {
-        if (TCG_TARGET_REG_BITS == 32) {
-            if (ofs >= 32) {
-                tcg_gen_deposit_z_i32(TCGV_HIGH(ret), TCGV_LOW(arg),
-                                      ofs - 32, len);
-                tcg_gen_movi_i32(TCGV_LOW(ret), 0);
-                return;
-            }
-            if (ofs + len <= 32) {
-                tcg_gen_deposit_z_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
-                tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
-                return;
-            }
+        /* The 64-bit deposit is split across the 32-bit halves. */
+        unsigned lo_len = 32 - ofs;
+        unsigned hi_len = len - lo_len;
+        TCGv_i32 tl = tcg_temp_ebb_new_i32();
+        TCGv_i32 th = TCGV_HIGH(ret);
+
+        tcg_gen_deposit_z_i32(tl, TCGV_LOW(arg), ofs, lo_len);
+        if (len <= 32) {
+            tcg_gen_shri_i32(th, TCGV_LOW(arg), lo_len);
+        } else {
+            tcg_gen_extract2_i32(th, TCGV_LOW(arg), TCGV_HIGH(arg), lo_len);
         }
-        /* To help two-operand hosts we prefer to zero-extend first,
-           which allows ARG to stay live.  */
-        switch (len) {
-        case 32:
-            if (TCG_TARGET_HAS_ext32u_i64) {
-                tcg_gen_ext32u_i64(ret, arg);
-                tcg_gen_shli_i64(ret, ret, ofs);
-                return;
-            }
-            break;
-        case 16:
-            if (TCG_TARGET_HAS_ext16u_i64) {
-                tcg_gen_ext16u_i64(ret, arg);
-                tcg_gen_shli_i64(ret, ret, ofs);
-                return;
-            }
-            break;
-        case 8:
-            if (TCG_TARGET_HAS_ext8u_i64) {
-                tcg_gen_ext8u_i64(ret, arg);
-                tcg_gen_shli_i64(ret, ret, ofs);
-                return;
-            }
-            break;
-        }
-        /* Otherwise prefer zero-extension over AND for code size.  */
-        switch (ofs + len) {
-        case 32:
-            if (TCG_TARGET_HAS_ext32u_i64) {
-                tcg_gen_shli_i64(ret, arg, ofs);
-                tcg_gen_ext32u_i64(ret, ret);
-                return;
-            }
-            break;
-        case 16:
-            if (TCG_TARGET_HAS_ext16u_i64) {
-                tcg_gen_shli_i64(ret, arg, ofs);
-                tcg_gen_ext16u_i64(ret, ret);
-                return;
-            }
-            break;
-        case 8:
-            if (TCG_TARGET_HAS_ext8u_i64) {
-                tcg_gen_shli_i64(ret, arg, ofs);
-                tcg_gen_ext8u_i64(ret, ret);
-                return;
-            }
-            break;
-        }
-        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
-        tcg_gen_shli_i64(ret, ret, ofs);
+        tcg_gen_deposit_z_i32(th, th, 0, hi_len);
+
+        tcg_gen_mov_i32(TCGV_LOW(ret), tl);
+        tcg_temp_free_i32(tl);
     }
 }
 
-- 
2.34.1
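
For the case where the deposit straddles the 32-bit halves, a standalone
host-side sketch (illustrative only, not part of the patch; deposit64_ref
and the sample operands are made up) can mirror the lo_len/hi_len split
used above and check it against a 64-bit reference deposit:

    /*
     * Sketch (not QEMU code): emulate the 32-bit split of a 64-bit
     * deposit that straddles the halves and compare with a reference.
     */
    #include <assert.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* Reference: deposit the low LEN bits of B into A at offset OFS. */
    static uint64_t deposit64_ref(uint64_t a, unsigned ofs, unsigned len,
                                  uint64_t b)
    {
        uint64_t mask = (len == 64 ? ~0ull : (1ull << len) - 1) << ofs;
        return (a & ~mask) | ((b << ofs) & mask);
    }

    int main(void)
    {
        uint64_t arg1 = 0x1122334455667788ull;
        uint64_t arg2 = 0xabcdull;
        unsigned ofs = 24, len = 16;      /* straddles the 32-bit boundary */
        unsigned lo_len = 32 - ofs;
        unsigned hi_len = len - lo_len;

        uint32_t lo1 = (uint32_t)arg1, hi1 = (uint32_t)(arg1 >> 32);
        uint32_t lo2 = (uint32_t)arg2, hi2 = (uint32_t)(arg2 >> 32);

        /* Low half: an ordinary 32-bit deposit of the bottom lo_len bits. */
        uint32_t tl = (uint32_t)deposit64_ref(lo1, ofs, lo_len, lo2);

        /* High half: shift the remaining hi_len bits down to bit 0 first
           (an extract2 across the halves when len > 32), then deposit. */
        uint32_t src = len <= 32
            ? lo2 >> lo_len
            : (uint32_t)((((uint64_t)hi2 << 32) | lo2) >> lo_len);
        uint32_t th = (uint32_t)deposit64_ref(hi1, 0, hi_len, src);

        uint64_t split = ((uint64_t)th << 32) | tl;
        assert(split == deposit64_ref(arg1, ofs, len, arg2));
        printf("split deposit = %016" PRIx64 "\n", split);
        return 0;
    }

With ofs = 24 and len = 16 this prints 112233abcd667788, matching the
reference deposit applied to the full 64-bit operands.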



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 01/15] tcg/optimize: Fold andc with immediate to and
  2024-03-12 14:38 ` [PATCH 01/15] tcg/optimize: Fold andc with immediate to and Richard Henderson
@ 2024-03-13  1:29   ` Richard Henderson
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-03-13  1:29 UTC (permalink / raw)
  To: qemu-devel

On 3/12/24 04:38, Richard Henderson wrote:
> +        /* Fold andc r,x,i to and r,x,~i. */
> +        op->opc = (ctx->type == TCG_TYPE_I32
> +                   ? INDEX_op_and_i32 : INDEX_op_and_i64);

This and the next two patches also need to handle vector types.
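
As a hedged sketch, the opcode selection might grow a per-type helper
along these lines (hypothetical; it assumes the existing TCG_TYPE_* and
INDEX_op_*_vec definitions from tcg.h / tcg-opc.h, and the actual patch
may structure it differently):

    /* Sketch only: pick the matching "and" opcode for the context type,
       covering integer and vector types alike. */
    static TCGOpcode and_opc_for_type(TCGType type)
    {
        switch (type) {
        case TCG_TYPE_I32:
            return INDEX_op_and_i32;
        case TCG_TYPE_I64:
            return INDEX_op_and_i64;
        case TCG_TYPE_V64:
        case TCG_TYPE_V128:
        case TCG_TYPE_V256:
            return INDEX_op_and_vec;
        default:
            g_assert_not_reached();
        }
    }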


r~


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-03-13  1:30 UTC | newest]

Thread overview: 17+ messages
2024-03-12 14:38 [PATCH for-9.1 00/15] tcg: Canonicalize operations during optimize Richard Henderson
2024-03-12 14:38 ` [PATCH 01/15] tcg/optimize: Fold andc with immediate to and Richard Henderson
2024-03-13  1:29   ` Richard Henderson
2024-03-12 14:38 ` [PATCH 02/15] tcg/optimize: Fold orc with immediate to or Richard Henderson
2024-03-12 14:38 ` [PATCH 03/15] tcg/optimize: Fold eqv with immediate to xor Richard Henderson
2024-03-12 14:38 ` [PATCH 04/15] tcg/i386: Do not accept immediate operand for andc Richard Henderson
2024-03-12 14:38 ` [PATCH 05/15] tcg/aarch64: Do not accept immediate operand for andc, orc, eqv Richard Henderson
2024-03-12 14:38 ` [PATCH 06/15] tcg/arm: Do not accept immediate operand for andc Richard Henderson
2024-03-12 14:38 ` [PATCH 07/15] tcg/ppc: Do not accept immediate operand for andc, orc, eqv Richard Henderson
2024-03-12 14:38 ` [PATCH 08/15] tcg/loongarch64: Do not accept immediate operand for andc, orc Richard Henderson
2024-03-12 14:38 ` [PATCH 09/15] tcg/s390x: " Richard Henderson
2024-03-12 14:38 ` [PATCH 10/15] tcg/riscv: Do not accept immediate operand for andc, orc, eqv Richard Henderson
2024-03-12 14:38 ` [PATCH 11/15] tcg/riscv: Do not accept immediate operands for sub Richard Henderson
2024-03-12 14:38 ` [PATCH 12/15] tcg/riscv: Do not accept zero operands for logicals, multiply or divide Richard Henderson
2024-03-12 14:38 ` [PATCH 13/15] tcg/optimize: Fold and to extu during optimize Richard Henderson
2024-03-12 14:38 ` [PATCH 14/15] tcg: Use arg_is_const_val in fold_sub_to_neg Richard Henderson
2024-03-12 14:38 ` [PATCH 15/15] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
