qemu-devel.nongnu.org archive mirror
* [PATCH v4 00/27] tcg/s390x: misc patches
@ 2022-12-09  2:05 Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 01/27] tcg/s390x: Use register pair allocation for div and mulu2 Richard Henderson
                   ` (27 more replies)
  0 siblings, 28 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Based-on: 20221202053958.223890-1-richard.henderson@linaro.org
("[PATCH for-8.0 v3 00/34] tcg misc patches")

Changes from v3:
  * Require z196 as minimum cpu -- 6 new patches removing checks.
  * Tighten constraints on AND, OR, XOR, CMP, trying to get the register
    allocator to hoist things that can't be done in a single insn.
  * Avoid the constant pool for movi.

I believe that I have addressed all of the discussion in v3,
except perhaps for goto_tb concurrent modifications to jumps.
I'm still not quite sure what to do about that.


r~


Richard Henderson (27):
  tcg/s390x: Use register pair allocation for div and mulu2
  tcg/s390x: Remove TCG_REG_TB
  tcg/s390x: Always set TCG_TARGET_HAS_direct_jump
  tcg/s390x: Remove USE_LONG_BRANCHES
  tcg/s390x: Check for long-displacement facility at startup
  tcg/s390x: Check for extended-immediate facility at startup
  tcg/s390x: Check for general-instruction-extension facility at startup
  tcg/s390x: Check for load-on-condition facility at startup
  tcg/s390x: Remove FAST_BCR_SER facility check
  tcg/s390x: Remove DISTINCT_OPERANDS facility check
  tcg/s390x: Use LARL+AGHI for odd addresses
  tcg/s390x: Distinguish RRF-a and RRF-c formats
  tcg/s390x: Distinguish RIE formats
  tcg/s390x: Support MIE2 multiply single instructions
  tcg/s390x: Support MIE2 MGRK instruction
  tcg/s390x: Issue XILF directly for xor_i32
  tcg/s390x: Tighten constraints for or_i64 and xor_i64
  tcg/s390x: Tighten constraints for and_i64
  tcg/s390x: Support MIE3 logical operations
  tcg/s390x: Create tgen_cmp2 to simplify movcond
  tcg/s390x: Generalize movcond implementation
  tcg/s390x: Support SELGR instruction in movcond
  tcg/s390x: Use tgen_movcond_int in tgen_clz
  tcg/s390x: Implement ctpop operation
  tcg/s390x: Tighten constraints for 64-bit compare
  tcg/s390x: Cleanup tcg_out_movi
  tcg/s390x: Avoid the constant pool in tcg_out_movi

 tcg/s390x/tcg-target-con-set.h |   18 +-
 tcg/s390x/tcg-target-con-str.h |   11 +-
 tcg/s390x/tcg-target.h         |   54 +-
 tcg/s390x/tcg-target.c.inc     | 1251 ++++++++++++++++----------------
 4 files changed, 668 insertions(+), 666 deletions(-)

-- 
2.34.1




* [PATCH v4 01/27] tcg/s390x: Use register pair allocation for div and mulu2
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 02/27] tcg/s390x: Remove TCG_REG_TB Richard Henderson
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Previously we hard-coded R2 and R3.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |  4 ++--
 tcg/s390x/tcg-target-con-str.h |  8 +------
 tcg/s390x/tcg-target.c.inc     | 43 +++++++++++++++++++++++++---------
 3 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 426dd92e51..00ba727b70 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -29,8 +29,8 @@ C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, r, 0)
 C_O1_I4(r, r, ri, rI, 0)
-C_O2_I2(b, a, 0, r)
-C_O2_I3(b, a, 0, 1, r)
+C_O2_I2(o, m, 0, r)
+C_O2_I3(o, m, 0, 1, r)
 C_O2_I4(r, r, 0, 1, rA, r)
 C_O2_I4(r, r, 0, 1, ri, r)
 C_O2_I4(r, r, 0, 1, r, r)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 8bb0358ae5..76446aecae 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
@@ -11,13 +11,7 @@
 REGS('r', ALL_GENERAL_REGS)
 REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 REGS('v', ALL_VECTOR_REGS)
-/*
- * A (single) even/odd pair for division.
- * TODO: Add something to the register allocator to allow
- * this kind of regno+1 pairing to be done more generally.
- */
-REGS('a', 1u << TCG_REG_R2)
-REGS('b', 1u << TCG_REG_R3)
+REGS('o', 0xaaaa) /* odd numbered general regs */
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index b9ba7b605e..cb00bb6999 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2264,10 +2264,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_div2_i32:
-        tcg_out_insn(s, RR, DR, TCG_REG_R2, args[4]);
+        tcg_debug_assert(args[0] == args[2]);
+        tcg_debug_assert(args[1] == args[3]);
+        tcg_debug_assert((args[1] & 1) == 0);
+        tcg_debug_assert(args[0] == args[1] + 1);
+        tcg_out_insn(s, RR, DR, args[1], args[4]);
         break;
     case INDEX_op_divu2_i32:
-        tcg_out_insn(s, RRE, DLR, TCG_REG_R2, args[4]);
+        tcg_debug_assert(args[0] == args[2]);
+        tcg_debug_assert(args[1] == args[3]);
+        tcg_debug_assert((args[1] & 1) == 0);
+        tcg_debug_assert(args[0] == args[1] + 1);
+        tcg_out_insn(s, RRE, DLR, args[1], args[4]);
         break;
 
     case INDEX_op_shl_i32:
@@ -2521,17 +2529,30 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_div2_i64:
-        /* ??? We get an unnecessary sign-extension of the dividend
-           into R3 with this definition, but as we do in fact always
-           produce both quotient and remainder using INDEX_op_div_i64
-           instead requires jumping through even more hoops.  */
-        tcg_out_insn(s, RRE, DSGR, TCG_REG_R2, args[4]);
+        /*
+         * ??? We get an unnecessary sign-extension of the dividend
+         * into op0 with this definition, but as we do in fact always
+         * produce both quotient and remainder using INDEX_op_div_i64
+         * instead requires jumping through even more hoops.
+         */
+        tcg_debug_assert(args[0] == args[2]);
+        tcg_debug_assert(args[1] == args[3]);
+        tcg_debug_assert((args[1] & 1) == 0);
+        tcg_debug_assert(args[0] == args[1] + 1);
+        tcg_out_insn(s, RRE, DSGR, args[1], args[4]);
         break;
     case INDEX_op_divu2_i64:
-        tcg_out_insn(s, RRE, DLGR, TCG_REG_R2, args[4]);
+        tcg_debug_assert(args[0] == args[2]);
+        tcg_debug_assert(args[1] == args[3]);
+        tcg_debug_assert((args[1] & 1) == 0);
+        tcg_debug_assert(args[0] == args[1] + 1);
+        tcg_out_insn(s, RRE, DLGR, args[1], args[4]);
         break;
     case INDEX_op_mulu2_i64:
-        tcg_out_insn(s, RRE, MLGR, TCG_REG_R2, args[3]);
+        tcg_debug_assert(args[0] == args[2]);
+        tcg_debug_assert((args[1] & 1) == 0);
+        tcg_debug_assert(args[0] == args[1] + 1);
+        tcg_out_insn(s, RRE, MLGR, args[1], args[3]);
         break;
 
     case INDEX_op_shl_i64:
@@ -3226,10 +3247,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_div2_i64:
     case INDEX_op_divu2_i32:
     case INDEX_op_divu2_i64:
-        return C_O2_I3(b, a, 0, 1, r);
+        return C_O2_I3(o, m, 0, 1, r);
 
     case INDEX_op_mulu2_i64:
-        return C_O2_I2(b, a, 0, r);
+        return C_O2_I2(o, m, 0, r);
 
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-- 
2.34.1
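
For readers following the constraint change above, here is a minimal sketch of what the new 'o' register set and the added debug assertions express. The helper names (reg_in_set, is_even_odd_pair) are hypothetical and not part of the patch; only the 0xaaaa mask and the even/odd relation come from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits 1, 3, ..., 15 set: the odd-numbered general registers. */
#define ODD_GENERAL_REGS 0xaaaau

/* Hypothetical helper: true if register number REG (0..15) is in MASK. */
static bool reg_in_set(uint32_t mask, unsigned reg)
{
    return reg < 16 && ((mask >> reg) & 1);
}

/*
 * Hypothetical helper mirroring the debug assertions in the patch:
 * EVEN must be even-numbered and ODD must be the register right after it.
 */
static bool is_even_odd_pair(unsigned even, unsigned odd)
{
    return (even & 1) == 0 && odd == even + 1;
}
```

With these, the previously hard-coded pair (R2, R3) is one valid pair among several: is_even_odd_pair(2, 3) holds, and so does is_even_odd_pair(4, 5).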




* [PATCH v4 02/27] tcg/s390x: Remove TCG_REG_TB
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 01/27] tcg/s390x: Use register pair allocation for div and mulu2 Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 03/27] tcg/s390x: Always set TCG_TARGET_HAS_direct_jump Richard Henderson
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

This reverts 829e1376d940 ("tcg/s390: Introduce TCG_REG_TB"), and
several follow-up patches.  The primary motivation is to reduce the
less-tested code paths, pre-z10.  Secondarily, this allows the
unconditional use of TCG_TARGET_HAS_direct_jump, which might be more
important for performance than any slight increase in code size.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v4: Do not simplify tgen_ori, tgen_xori.
---
 tcg/s390x/tcg-target.c.inc | 97 +++-----------------------------------
 1 file changed, 6 insertions(+), 91 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index cb00bb6999..ba4bb6a629 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -65,12 +65,6 @@
 /* A scratch register that may be be used throughout the backend.  */
 #define TCG_TMP0        TCG_REG_R1
 
-/* A scratch register that holds a pointer to the beginning of the TB.
-   We don't need this when we have pc-relative loads with the general
-   instructions extension facility.  */
-#define TCG_REG_TB      TCG_REG_R12
-#define USE_REG_TB      (!HAVE_FACILITY(GEN_INST_EXT))
-
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG TCG_REG_R13
 #endif
@@ -813,8 +807,8 @@ static bool maybe_out_small_movi(TCGContext *s, TCGType type,
 }
 
 /* load a register with an immediate value */
-static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
-                             tcg_target_long sval, bool in_prologue)
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long sval)
 {
     tcg_target_ulong uval;
 
@@ -853,14 +847,6 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
             tcg_out_insn(s, RIL, LARL, ret, off);
             return;
         }
-    } else if (USE_REG_TB && !in_prologue) {
-        ptrdiff_t off = tcg_tbrel_diff(s, (void *)sval);
-        if (off == sextract64(off, 0, 20)) {
-            /* This is certain to be an address within TB, and therefore
-               OFF will be negative; don't try RX_LA.  */
-            tcg_out_insn(s, RXY, LAY, ret, TCG_REG_TB, TCG_REG_NONE, off);
-            return;
-        }
     }
 
     /* A 32-bit unsigned value can be loaded in 2 insns.  And given
@@ -876,10 +862,6 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
     if (HAVE_FACILITY(GEN_INST_EXT)) {
         tcg_out_insn(s, RIL, LGRL, ret, 0);
         new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
-    } else if (USE_REG_TB && !in_prologue) {
-        tcg_out_insn(s, RXY, LG, ret, TCG_REG_TB, TCG_REG_NONE, 0);
-        new_pool_label(s, sval, R_390_20, s->code_ptr - 2,
-                       tcg_tbrel_diff(s, NULL));
     } else {
         TCGReg base = ret ? ret : TCG_TMP0;
         tcg_out_insn(s, RIL, LARL, base, 0);
@@ -888,12 +870,6 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
     }
 }
 
-static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg ret, tcg_target_long sval)
-{
-    tcg_out_movi_int(s, type, ret, sval, false);
-}
-
 /* Emit a load/store type instruction.  Inputs are:
    DATA:     The register to be loaded or stored.
    BASE+OFS: The effective address.
@@ -1037,13 +1013,6 @@ static void tcg_out_ld_abs(TCGContext *s, TCGType type,
             return;
         }
     }
-    if (USE_REG_TB) {
-        ptrdiff_t disp = tcg_tbrel_diff(s, abs);
-        if (disp == sextract64(disp, 0, 20)) {
-            tcg_out_ld(s, type, dest, TCG_REG_TB, disp);
-            return;
-        }
-    }
 
     tcg_out_movi(s, TCG_TYPE_PTR, dest, addr & ~0xffff);
     tcg_out_ld(s, type, dest, dest, addr & 0xffff);
@@ -1243,17 +1212,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         return;
     }
 
-    /* Use the constant pool if USE_REG_TB, but not for small constants.  */
-    if (USE_REG_TB) {
-        if (!maybe_out_small_movi(s, type, TCG_TMP0, val)) {
-            tcg_out_insn(s, RXY, NG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
-            new_pool_label(s, val & valid, R_390_20, s->code_ptr - 2,
-                           tcg_tbrel_diff(s, NULL));
-            return;
-        }
-    } else {
-        tcg_out_movi(s, type, TCG_TMP0, val);
-    }
+    tcg_out_movi(s, type, TCG_TMP0, val);
     if (type == TCG_TYPE_I32) {
         tcg_out_insn(s, RR, NR, dest, TCG_TMP0);
     } else {
@@ -1297,17 +1256,12 @@ static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         }
     }
 
-    /* Use the constant pool if USE_REG_TB, but not for small constants.  */
     if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
         if (type == TCG_TYPE_I32) {
             tcg_out_insn(s, RR, OR, dest, TCG_TMP0);
         } else {
             tcg_out_insn(s, RRE, OGR, dest, TCG_TMP0);
         }
-    } else if (USE_REG_TB) {
-        tcg_out_insn(s, RXY, OG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
-        new_pool_label(s, val, R_390_20, s->code_ptr - 2,
-                       tcg_tbrel_diff(s, NULL));
     } else {
         /* Perform the OR via sequential modifications to the high and
            low parts.  Do this via recursion to handle 16-bit vs 32-bit
@@ -1332,17 +1286,12 @@ static void tgen_xori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         }
     }
 
-    /* Use the constant pool if USE_REG_TB, but not for small constants.  */
     if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
         if (type == TCG_TYPE_I32) {
             tcg_out_insn(s, RR, XR, dest, TCG_TMP0);
         } else {
             tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
         }
-    } else if (USE_REG_TB) {
-        tcg_out_insn(s, RXY, XG, dest, TCG_REG_TB, TCG_REG_NONE, 0);
-        new_pool_label(s, val, R_390_20, s->code_ptr - 2,
-                       tcg_tbrel_diff(s, NULL));
     } else {
         /* Perform the xor by parts.  */
         tcg_debug_assert(HAVE_FACILITY(EXT_IMM));
@@ -1395,19 +1344,6 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
         if (maybe_out_small_movi(s, type, TCG_TMP0, c2)) {
             c2 = TCG_TMP0;
             /* fall through to reg-reg */
-        } else if (USE_REG_TB) {
-            if (type == TCG_TYPE_I32) {
-                op = (is_unsigned ? RXY_CLY : RXY_CY);
-                tcg_out_insn_RXY(s, op, r1, TCG_REG_TB, TCG_REG_NONE, 0);
-                new_pool_label(s, (uint32_t)c2, R_390_20, s->code_ptr - 2,
-                               4 - tcg_tbrel_diff(s, NULL));
-            } else {
-                op = (is_unsigned ? RXY_CLG : RXY_CG);
-                tcg_out_insn_RXY(s, op, r1, TCG_REG_TB, TCG_REG_NONE, 0);
-                new_pool_label(s, c2, R_390_20, s->code_ptr - 2,
-                               tcg_tbrel_diff(s, NULL));
-            }
-            goto exit;
         } else {
             if (type == TCG_TYPE_I32) {
                 op = (is_unsigned ? RIL_CLRL : RIL_CRL);
@@ -2109,35 +2045,21 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             if (!QEMU_PTR_IS_ALIGNED(s->code_ptr + 1, 4)) {
                 tcg_out16(s, NOP);
             }
-            tcg_debug_assert(!USE_REG_TB);
             tcg_out16(s, RIL_BRCL | (S390_CC_ALWAYS << 4));
             s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
             s->code_ptr += 2;
         } else {
             /* load address stored at s->tb_jmp_target_addr + a0 */
-            tcg_out_ld_abs(s, TCG_TYPE_PTR, TCG_REG_TB,
+            tcg_out_ld_abs(s, TCG_TYPE_PTR, TCG_TMP0,
                            tcg_splitwx_to_rx(s->tb_jmp_target_addr + a0));
             /* and go there */
-            tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_REG_TB);
+            tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_TMP0);
         }
         set_jmp_reset_offset(s, a0);
-
-        /* For the unlinked path of goto_tb, we need to reset
-           TCG_REG_TB to the beginning of this TB.  */
-        if (USE_REG_TB) {
-            int ofs = -tcg_current_code_size(s);
-            /* All TB are restricted to 64KiB by unwind info. */
-            tcg_debug_assert(ofs == sextract64(ofs, 0, 20));
-            tcg_out_insn(s, RXY, LAY, TCG_REG_TB,
-                         TCG_REG_TB, TCG_REG_NONE, ofs);
-        }
         break;
 
     case INDEX_op_goto_ptr:
         a0 = args[0];
-        if (USE_REG_TB) {
-            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, a0);
-        }
         tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, a0);
         break;
 
@@ -3405,9 +3327,6 @@ static void tcg_target_init(TCGContext *s)
     /* XXX many insns can't be used with R0, so we better avoid it for now */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-    if (USE_REG_TB) {
-        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB);
-    }
 }
 
 #define FRAME_SIZE  ((int)(TCG_TARGET_CALL_STACK_OFFSET          \
@@ -3428,16 +3347,12 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
     if (guest_base >= 0x80000) {
-        tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base, true);
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
         tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
     }
 #endif
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
-    if (USE_REG_TB) {
-        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB,
-                    tcg_target_call_iarg_regs[1]);
-    }
 
     /* br %r3 (go to TB) */
     tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, tcg_target_call_iarg_regs[1]);
-- 
2.34.1




* [PATCH v4 03/27] tcg/s390x: Always set TCG_TARGET_HAS_direct_jump
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 01/27] tcg/s390x: Use register pair allocation for div and mulu2 Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 02/27] tcg/s390x: Remove TCG_REG_TB Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 21:51   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 04/27] tcg/s390x: Remove USE_LONG_BRANCHES Richard Henderson
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Since USE_REG_TB is removed, there is no need to load the
target TB address into a register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |  2 +-
 tcg/s390x/tcg-target.c.inc | 48 +++++++-------------------------------
 2 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 22d70d431b..645f522058 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -103,7 +103,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_mulsh_i32      0
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
-#define TCG_TARGET_HAS_direct_jump    HAVE_FACILITY(GEN_INST_EXT)
+#define TCG_TARGET_HAS_direct_jump    1
 #define TCG_TARGET_HAS_qemu_st8_i32   0
 
 #define TCG_TARGET_HAS_div2_i64       1
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index ba4bb6a629..2cdd0d7a92 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -996,28 +996,6 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
     return false;
 }
 
-/* load data from an absolute host address */
-static void tcg_out_ld_abs(TCGContext *s, TCGType type,
-                           TCGReg dest, const void *abs)
-{
-    intptr_t addr = (intptr_t)abs;
-
-    if (HAVE_FACILITY(GEN_INST_EXT) && !(addr & 1)) {
-        ptrdiff_t disp = tcg_pcrel_diff(s, abs) >> 1;
-        if (disp == (int32_t)disp) {
-            if (type == TCG_TYPE_I32) {
-                tcg_out_insn(s, RIL, LRL, dest, disp);
-            } else {
-                tcg_out_insn(s, RIL, LGRL, dest, disp);
-            }
-            return;
-        }
-    }
-
-    tcg_out_movi(s, TCG_TYPE_PTR, dest, addr & ~0xffff);
-    tcg_out_ld(s, type, dest, dest, addr & 0xffff);
-}
-
 static inline void tcg_out_risbg(TCGContext *s, TCGReg dest, TCGReg src,
                                  int msb, int lsb, int ofs, int z)
 {
@@ -2037,24 +2015,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_goto_tb:
         a0 = args[0];
-        if (s->tb_jmp_insn_offset) {
-            /*
-             * branch displacement must be aligned for atomic patching;
-             * see if we need to add extra nop before branch
-             */
-            if (!QEMU_PTR_IS_ALIGNED(s->code_ptr + 1, 4)) {
-                tcg_out16(s, NOP);
-            }
-            tcg_out16(s, RIL_BRCL | (S390_CC_ALWAYS << 4));
-            s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
-            s->code_ptr += 2;
-        } else {
-            /* load address stored at s->tb_jmp_target_addr + a0 */
-            tcg_out_ld_abs(s, TCG_TYPE_PTR, TCG_TMP0,
-                           tcg_splitwx_to_rx(s->tb_jmp_target_addr + a0));
-            /* and go there */
-            tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, TCG_TMP0);
+        /*
+         * branch displacement must be aligned for atomic patching;
+         * see if we need to add extra nop before branch
+         */
+        if (!QEMU_PTR_IS_ALIGNED(s->code_ptr + 1, 4)) {
+            tcg_out16(s, NOP);
         }
+        tcg_out16(s, RIL_BRCL | (S390_CC_ALWAYS << 4));
+        s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+        s->code_ptr += 2;
         set_jmp_reset_offset(s, a0);
         break;
 
-- 
2.34.1
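
The alignment rule in the goto_tb hunk above can be sketched in isolation. This assumes code is emitted in 2-byte units, so that "code_ptr + 1" in the patch means two bytes ahead; the function names and the byte-offset model are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one code unit is a halfword (2 bytes). */
#define INSN_UNIT 2

/*
 * Hypothetical helper: given the byte offset at which the branch would
 * start, report whether a 2-byte NOP is needed first so that the 32-bit
 * displacement (which follows the 2-byte opcode) is 4-byte aligned.
 */
static bool needs_nop(size_t off)
{
    return (off + INSN_UNIT) % 4 != 0;
}

/* Byte offset of the displacement field after optional padding. */
static size_t disp_offset(size_t off)
{
    if (needs_nop(off)) {
        off += INSN_UNIT;   /* room for the NOP */
    }
    return off + INSN_UNIT; /* skip the opcode halfword */
}
```

For any even starting offset the displacement ends up on a 4-byte boundary, which is what allows it to be rewritten with a single aligned 32-bit store.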




* [PATCH v4 04/27] tcg/s390x: Remove USE_LONG_BRANCHES
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (2 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 03/27] tcg/s390x: Always set TCG_TARGET_HAS_direct_jump Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 21:52   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 05/27] tcg/s390x: Check for long-displacement facility at startup Richard Henderson
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The size of a compiled TB is limited by the uint16_t used by
gen_insn_end_off[] -- there is no need for a 32-bit branch.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.c.inc | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 2cdd0d7a92..dea889ffa1 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -33,11 +33,6 @@
 #include "../tcg-pool.c.inc"
 #include "elf.h"
 
-/* ??? The translation blocks produced by TCG are generally small enough to
-   be entirely reachable with a 16-bit displacement.  Leaving the option for
-   a 32-bit displacement here Just In Case.  */
-#define USE_LONG_BRANCHES 0
-
 #define TCG_CT_CONST_S16   0x100
 #define TCG_CT_CONST_S32   0x200
 #define TCG_CT_CONST_S33   0x400
@@ -1525,10 +1520,6 @@ static void tgen_branch(TCGContext *s, int cc, TCGLabel *l)
 {
     if (l->has_value) {
         tgen_gotoi(s, cc, l->u.value_ptr);
-    } else if (USE_LONG_BRANCHES) {
-        tcg_out16(s, RIL_BRCL | (cc << 4));
-        tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, l, 2);
-        s->code_ptr += 2;
     } else {
         tcg_out16(s, RI_BRC | (cc << 4));
         tcg_out_reloc(s, s->code_ptr, R_390_PC16DBL, l, 2);
-- 
2.34.1
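
A rough sketch of the range argument in the commit message above: a 16-bit relative branch encodes a signed displacement counted in halfwords, so it reaches from -65536 to +65534 bytes, which covers any offset expressible in a uint16_t. The function name fits_pc16dbl is hypothetical; it only models the range check, not the relocation code in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a byte displacement can be encoded as a
 * signed 16-bit count of halfwords (the R_390_PC16DBL style of encoding).
 */
static bool fits_pc16dbl(int64_t byte_disp)
{
    if (byte_disp & 1) {
        return false;           /* targets are halfword aligned */
    }
    int64_t halfwords = byte_disp / 2;
    return halfwords >= INT16_MIN && halfwords <= INT16_MAX;
}
```

Under the stated assumption that no offset within a TB exceeds UINT16_MAX, the largest even forward displacement is 65534 bytes, which still fits.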




* [PATCH v4 05/27] tcg/s390x: Check for long-displacement facility at startup
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (3 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 04/27] tcg/s390x: Remove USE_LONG_BRANCHES Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 21:54   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 06/27] tcg/s390x: Check for extended-immediate " Richard Henderson
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

We are already assuming the existence of long-displacement, but were
not being explicit about it.  This has been present since z990.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |  6 ++++--
 tcg/s390x/tcg-target.c.inc | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 645f522058..7f230ed243 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -52,11 +52,13 @@ typedef enum TCGReg {
 
 #define TCG_TARGET_NB_REGS 64
 
-/* A list of relevant facilities used by this translator.  Some of these
-   are required for proper operation, and these are checked at startup.  */
+/* Facilities required for proper operation; checked at startup. */
 
 #define FACILITY_ZARCH_ACTIVE         2
 #define FACILITY_LONG_DISP            18
+
+/* Facilities that are checked at runtime. */
+
 #define FACILITY_EXT_IMM              21
 #define FACILITY_GEN_INST_EXT         34
 #define FACILITY_LOAD_ON_COND         45
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index dea889ffa1..1fcefba7ba 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3211,6 +3211,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 static void query_s390_facilities(void)
 {
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
+    const char *which;
 
     /* Is STORE FACILITY LIST EXTENDED available?  Honestly, I believe this
        is present on all 64-bit systems, but let's check for it anyway.  */
@@ -3232,6 +3233,20 @@ static void query_s390_facilities(void)
     if (!(hwcap & HWCAP_S390_VXRS)) {
         s390_facilities[2] = 0;
     }
+
+    /*
+     * Check for all required facilities.
+     * ZARCH_ACTIVE is done via preprocessor check for 64-bit.
+     */
+    if (!HAVE_FACILITY(LONG_DISP)) {
+        which = "long-displacement";
+        goto fail;
+    }
+    return;
+
+ fail:
+    error_report("%s: missing required facility %s", __func__, which);
+    exit(EXIT_FAILURE);
 }
 
 static void tcg_target_init(TCGContext *s)
-- 
2.34.1
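
As an illustration of the startup check added above, here is a self-contained sketch of testing a facility bit and reporting the first missing required one. It assumes the facility list is stored as 64-bit words with facility 0 in the most significant bit of word 0; have_facility and first_missing are hypothetical names, and only the number 18 for long-displacement is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: test facility bit NR in LIST.
 * Assumption: bits are numbered from the most significant end of each
 * 64-bit word, as in the layout produced by STORE FACILITY LIST EXTENDED.
 */
static bool have_facility(const uint64_t *list, unsigned nr)
{
    return (list[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Return the name of the first missing required facility, or NULL. */
static const char *first_missing(const uint64_t *list)
{
    static const struct {
        unsigned nr;
        const char *name;
    } required[] = {
        { 18, "long-displacement" },
    };

    for (size_t i = 0; i < sizeof(required) / sizeof(required[0]); i++) {
        if (!have_facility(list, required[i].nr)) {
            return required[i].name;
        }
    }
    return NULL;
}
```

A table like this makes it straightforward to append further required facilities, at the cost of differing from the goto-based structure the patch itself uses.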




* [PATCH v4 06/27] tcg/s390x: Check for extended-immediate facility at startup
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (4 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 05/27] tcg/s390x: Check for long-displacement facility at startup Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:17   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 07/27] tcg/s390x: Check for general-instruction-extension " Richard Henderson
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The extended-immediate facility was introduced in z9-109,
which itself was end-of-life in 2017.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |   4 +-
 tcg/s390x/tcg-target.c.inc | 231 +++++++++++--------------------------
 2 files changed, 72 insertions(+), 163 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 7f230ed243..126ba1048a 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -56,10 +56,10 @@ typedef enum TCGReg {
 
 #define FACILITY_ZARCH_ACTIVE         2
 #define FACILITY_LONG_DISP            18
+#define FACILITY_EXT_IMM              21
 
 /* Facilities that are checked at runtime. */
 
-#define FACILITY_EXT_IMM              21
 #define FACILITY_GEN_INST_EXT         34
 #define FACILITY_LOAD_ON_COND         45
 #define FACILITY_FAST_BCR_SER         FACILITY_LOAD_ON_COND
@@ -126,7 +126,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_eqv_i64        0
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
-#define TCG_TARGET_HAS_clz_i64        HAVE_FACILITY(EXT_IMM)
+#define TCG_TARGET_HAS_clz_i64        1
 #define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_ctpop_i64      0
 #define TCG_TARGET_HAS_deposit_i64    HAVE_FACILITY(GEN_INST_EXT)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 1fcefba7ba..42e161cc7e 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -819,19 +819,17 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* Try all 48-bit insns that can load it in one go.  */
-    if (HAVE_FACILITY(EXT_IMM)) {
-        if (sval == (int32_t)sval) {
-            tcg_out_insn(s, RIL, LGFI, ret, sval);
-            return;
-        }
-        if (uval <= 0xffffffff) {
-            tcg_out_insn(s, RIL, LLILF, ret, uval);
-            return;
-        }
-        if ((uval & 0xffffffff) == 0) {
-            tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
-            return;
-        }
+    if (sval == (int32_t)sval) {
+        tcg_out_insn(s, RIL, LGFI, ret, sval);
+        return;
+    }
+    if (uval <= 0xffffffff) {
+        tcg_out_insn(s, RIL, LLILF, ret, uval);
+        return;
+    }
+    if ((uval & 0xffffffff) == 0) {
+        tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
+        return;
     }
 
     /* Try for PC-relative address load.  For odd addresses,
@@ -844,15 +842,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         }
     }
 
-    /* A 32-bit unsigned value can be loaded in 2 insns.  And given
-       that LLILL, LLIHL, LLILF above did not succeed, we know that
-       both insns are required.  */
-    if (uval <= 0xffffffff) {
-        tcg_out_insn(s, RI, LLILL, ret, uval);
-        tcg_out_insn(s, RI, IILH, ret, uval >> 16);
-        return;
-    }
-
     /* Otherwise, stuff it in the constant pool.  */
     if (HAVE_FACILITY(GEN_INST_EXT)) {
         tcg_out_insn(s, RIL, LGRL, ret, 0);
@@ -1002,82 +991,22 @@ static inline void tcg_out_risbg(TCGContext *s, TCGReg dest, TCGReg src,
 
 static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (HAVE_FACILITY(EXT_IMM)) {
-        tcg_out_insn(s, RRE, LGBR, dest, src);
-        return;
-    }
-
-    if (type == TCG_TYPE_I32) {
-        if (dest == src) {
-            tcg_out_sh32(s, RS_SLL, dest, TCG_REG_NONE, 24);
-        } else {
-            tcg_out_sh64(s, RSY_SLLG, dest, src, TCG_REG_NONE, 24);
-        }
-        tcg_out_sh32(s, RS_SRA, dest, TCG_REG_NONE, 24);
-    } else {
-        tcg_out_sh64(s, RSY_SLLG, dest, src, TCG_REG_NONE, 56);
-        tcg_out_sh64(s, RSY_SRAG, dest, dest, TCG_REG_NONE, 56);
-    }
+    tcg_out_insn(s, RRE, LGBR, dest, src);
 }
 
 static void tgen_ext8u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (HAVE_FACILITY(EXT_IMM)) {
-        tcg_out_insn(s, RRE, LLGCR, dest, src);
-        return;
-    }
-
-    if (dest == src) {
-        tcg_out_movi(s, type, TCG_TMP0, 0xff);
-        src = TCG_TMP0;
-    } else {
-        tcg_out_movi(s, type, dest, 0xff);
-    }
-    if (type == TCG_TYPE_I32) {
-        tcg_out_insn(s, RR, NR, dest, src);
-    } else {
-        tcg_out_insn(s, RRE, NGR, dest, src);
-    }
+    tcg_out_insn(s, RRE, LLGCR, dest, src);
 }
 
 static void tgen_ext16s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (HAVE_FACILITY(EXT_IMM)) {
-        tcg_out_insn(s, RRE, LGHR, dest, src);
-        return;
-    }
-
-    if (type == TCG_TYPE_I32) {
-        if (dest == src) {
-            tcg_out_sh32(s, RS_SLL, dest, TCG_REG_NONE, 16);
-        } else {
-            tcg_out_sh64(s, RSY_SLLG, dest, src, TCG_REG_NONE, 16);
-        }
-        tcg_out_sh32(s, RS_SRA, dest, TCG_REG_NONE, 16);
-    } else {
-        tcg_out_sh64(s, RSY_SLLG, dest, src, TCG_REG_NONE, 48);
-        tcg_out_sh64(s, RSY_SRAG, dest, dest, TCG_REG_NONE, 48);
-    }
+    tcg_out_insn(s, RRE, LGHR, dest, src);
 }
 
 static void tgen_ext16u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (HAVE_FACILITY(EXT_IMM)) {
-        tcg_out_insn(s, RRE, LLGHR, dest, src);
-        return;
-    }
-
-    if (dest == src) {
-        tcg_out_movi(s, type, TCG_TMP0, 0xffff);
-        src = TCG_TMP0;
-    } else {
-        tcg_out_movi(s, type, dest, 0xffff);
-    }
-    if (type == TCG_TYPE_I32) {
-        tcg_out_insn(s, RR, NR, dest, src);
-    } else {
-        tcg_out_insn(s, RRE, NGR, dest, src);
-    }
+    tcg_out_insn(s, RRE, LLGHR, dest, src);
 }
 
 static inline void tgen_ext32s(TCGContext *s, TCGReg dest, TCGReg src)
@@ -1150,15 +1079,13 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         tgen_ext32u(s, dest, dest);
         return;
     }
-    if (HAVE_FACILITY(EXT_IMM)) {
-        if ((val & valid) == 0xff) {
-            tgen_ext8u(s, TCG_TYPE_I64, dest, dest);
-            return;
-        }
-        if ((val & valid) == 0xffff) {
-            tgen_ext16u(s, TCG_TYPE_I64, dest, dest);
-            return;
-        }
+    if ((val & valid) == 0xff) {
+        tgen_ext8u(s, TCG_TYPE_I64, dest, dest);
+        return;
+    }
+    if ((val & valid) == 0xffff) {
+        tgen_ext16u(s, TCG_TYPE_I64, dest, dest);
+        return;
     }
 
     /* Try all 32-bit insns that can perform it in one go.  */
@@ -1171,13 +1098,11 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 
     /* Try all 48-bit insns that can perform it in one go.  */
-    if (HAVE_FACILITY(EXT_IMM)) {
-        for (i = 0; i < 2; i++) {
-            tcg_target_ulong mask = ~(0xffffffffull << i * 32);
-            if (((val | ~valid) & mask) == mask) {
-                tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i * 32);
-                return;
-            }
+    for (i = 0; i < 2; i++) {
+        tcg_target_ulong mask = ~(0xffffffffull << i * 32);
+        if (((val | ~valid) & mask) == mask) {
+            tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i * 32);
+            return;
         }
     }
     if (HAVE_FACILITY(GEN_INST_EXT) && risbg_mask(val)) {
@@ -1219,13 +1144,11 @@ static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 
     /* Try all 48-bit insns that can perform it in one go.  */
-    if (HAVE_FACILITY(EXT_IMM)) {
-        for (i = 0; i < 2; i++) {
-            tcg_target_ulong mask = (0xffffffffull << i * 32);
-            if ((val & mask) != 0 && (val & ~mask) == 0) {
-                tcg_out_insn_RIL(s, oif_insns[i], dest, val >> i * 32);
-                return;
-            }
+    for (i = 0; i < 2; i++) {
+        tcg_target_ulong mask = (0xffffffffull << i * 32);
+        if ((val & mask) != 0 && (val & ~mask) == 0) {
+            tcg_out_insn_RIL(s, oif_insns[i], dest, val >> i * 32);
+            return;
         }
     }
 
@@ -1239,7 +1162,6 @@ static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         /* Perform the OR via sequential modifications to the high and
            low parts.  Do this via recursion to handle 16-bit vs 32-bit
            masks in each half.  */
-        tcg_debug_assert(HAVE_FACILITY(EXT_IMM));
         tgen_ori(s, type, dest, val & 0x00000000ffffffffull);
         tgen_ori(s, type, dest, val & 0xffffffff00000000ull);
     }
@@ -1248,15 +1170,13 @@ static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
 static void tgen_xori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
 {
     /* Try all 48-bit insns that can perform it in one go.  */
-    if (HAVE_FACILITY(EXT_IMM)) {
-        if ((val & 0xffffffff00000000ull) == 0) {
-            tcg_out_insn(s, RIL, XILF, dest, val);
-            return;
-        }
-        if ((val & 0x00000000ffffffffull) == 0) {
-            tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
-            return;
-        }
+    if ((val & 0xffffffff00000000ull) == 0) {
+        tcg_out_insn(s, RIL, XILF, dest, val);
+        return;
+    }
+    if ((val & 0x00000000ffffffffull) == 0) {
+        tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
+        return;
     }
 
     if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
@@ -1267,7 +1187,6 @@ static void tgen_xori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         }
     } else {
         /* Perform the xor by parts.  */
-        tcg_debug_assert(HAVE_FACILITY(EXT_IMM));
         if (val & 0xffffffff) {
             tcg_out_insn(s, RIL, XILF, dest, val);
         }
@@ -1301,16 +1220,15 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
             goto exit;
         }
 
-        if (HAVE_FACILITY(EXT_IMM)) {
-            if (type == TCG_TYPE_I32) {
-                op = (is_unsigned ? RIL_CLFI : RIL_CFI);
-                tcg_out_insn_RIL(s, op, r1, c2);
-                goto exit;
-            } else if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
-                op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
-                tcg_out_insn_RIL(s, op, r1, c2);
-                goto exit;
-            }
+        if (type == TCG_TYPE_I32) {
+            op = (is_unsigned ? RIL_CLFI : RIL_CFI);
+            tcg_out_insn_RIL(s, op, r1, c2);
+            goto exit;
+        }
+        if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
+            op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
+            tcg_out_insn_RIL(s, op, r1, c2);
+            goto exit;
         }
 
         /* Use the constant pool, but not for small constants.  */
@@ -1318,16 +1236,9 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
             c2 = TCG_TMP0;
             /* fall through to reg-reg */
         } else {
-            if (type == TCG_TYPE_I32) {
-                op = (is_unsigned ? RIL_CLRL : RIL_CRL);
-                tcg_out_insn_RIL(s, op, r1, 0);
-                new_pool_label(s, (uint32_t)c2, R_390_PC32DBL,
-                               s->code_ptr - 2, 2 + 4);
-            } else {
-                op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
-                tcg_out_insn_RIL(s, op, r1, 0);
-                new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
-            }
+            op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
+            tcg_out_insn_RIL(s, op, r1, 0);
+            new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
             goto exit;
         }
     }
@@ -2072,10 +1983,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                     tcg_out_insn(s, RI, AHI, a0, a2);
                     break;
                 }
-                if (HAVE_FACILITY(EXT_IMM)) {
-                    tcg_out_insn(s, RIL, AFI, a0, a2);
-                    break;
-                }
+                tcg_out_insn(s, RIL, AFI, a0, a2);
+                break;
             }
             tcg_out_mem(s, RX_LA, RXY_LAY, a0, a1, TCG_REG_NONE, a2);
         } else if (a0 == a1) {
@@ -2326,17 +2235,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                     tcg_out_insn(s, RI, AGHI, a0, a2);
                     break;
                 }
-                if (HAVE_FACILITY(EXT_IMM)) {
-                    if (a2 == (int32_t)a2) {
-                        tcg_out_insn(s, RIL, AGFI, a0, a2);
-                        break;
-                    } else if (a2 == (uint32_t)a2) {
-                        tcg_out_insn(s, RIL, ALGFI, a0, a2);
-                        break;
-                    } else if (-a2 == (uint32_t)-a2) {
-                        tcg_out_insn(s, RIL, SLGFI, a0, -a2);
-                        break;
-                    }
+                if (a2 == (int32_t)a2) {
+                    tcg_out_insn(s, RIL, AGFI, a0, a2);
+                    break;
+                }
+                if (a2 == (uint32_t)a2) {
+                    tcg_out_insn(s, RIL, ALGFI, a0, a2);
+                    break;
+                }
+                if (-a2 == (uint32_t)-a2) {
+                    tcg_out_insn(s, RIL, SLGFI, a0, -a2);
+                    break;
                 }
             }
             tcg_out_mem(s, RX_LA, RXY_LAY, a0, a1, TCG_REG_NONE, a2);
@@ -3137,15 +3046,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-        return (HAVE_FACILITY(EXT_IMM)
-                ? C_O2_I4(r, r, 0, 1, ri, r)
-                : C_O2_I4(r, r, 0, 1, r, r));
+        return C_O2_I4(r, r, 0, 1, ri, r);
 
     case INDEX_op_add2_i64:
     case INDEX_op_sub2_i64:
-        return (HAVE_FACILITY(EXT_IMM)
-                ? C_O2_I4(r, r, 0, 1, rA, r)
-                : C_O2_I4(r, r, 0, 1, r, r));
+        return C_O2_I4(r, r, 0, 1, rA, r);
 
     case INDEX_op_st_vec:
         return C_O0_I2(v, r);
@@ -3242,6 +3147,10 @@ static void query_s390_facilities(void)
         which = "long-displacement";
         goto fail;
     }
+    if (!HAVE_FACILITY(EXT_IMM)) {
+        which = "extended-immediate";
+        goto fail;
+    }
     return;
 
  fail:
-- 
2.34.1




* [PATCH v4 07/27] tcg/s390x: Check for general-instruction-extension facility at startup
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (5 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 06/27] tcg/s390x: Check for extended-immediate " Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:21   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 08/27] tcg/s390x: Check for load-on-condition " Richard Henderson
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The general-instruction-extension facility was introduced in z10,
which itself was end-of-life in 2019.
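
One visible effect is in tgen_brcond, which now always considers the
compare-and-branch forms.  Their immediate field is 8 bits wide, so the
constant has to survive a round trip through an 8-bit type of the right
signedness.  A standalone sketch of that range test follows; the names
fits_imm8 and demo_fits_imm8 are hypothetical and exist only for this
illustration, they are not helpers in tcg-target.c.inc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does c2 fit the 8-bit immediate of a
 * compare-immediate-and-branch insn for the given width and signedness?
 */
static bool fits_imm8(int64_t c2, bool is_64bit, bool is_unsigned)
{
    if (is_64bit) {
        return is_unsigned ? (uint64_t)c2 == (uint8_t)c2
                           : c2 == (int8_t)c2;
    }
    /* For 32-bit operations only the low 32 bits of c2 matter. */
    return is_unsigned ? (uint32_t)c2 == (uint8_t)c2
                       : (int32_t)c2 == (int8_t)c2;
}

/* Usage example: a few values on either side of the boundary. */
static void demo_fits_imm8(void)
{
    assert(fits_imm8(127, false, false));
    assert(!fits_imm8(128, false, false));
    assert(fits_imm8(255, false, true));
    assert(fits_imm8(-1, true, false));
    assert(!fits_imm8(-1, true, true));
}
```

When the test fails, the code falls back to a separate compare followed
by a branch, as the comment in tgen_brcond describes.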

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |  10 ++--
 tcg/s390x/tcg-target.c.inc | 100 ++++++++++++++++---------------------
 2 files changed, 49 insertions(+), 61 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 126ba1048a..d47e8ba66a 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -57,10 +57,10 @@ typedef enum TCGReg {
 #define FACILITY_ZARCH_ACTIVE         2
 #define FACILITY_LONG_DISP            18
 #define FACILITY_EXT_IMM              21
+#define FACILITY_GEN_INST_EXT         34
 
 /* Facilities that are checked at runtime. */
 
-#define FACILITY_GEN_INST_EXT         34
 #define FACILITY_LOAD_ON_COND         45
 #define FACILITY_FAST_BCR_SER         FACILITY_LOAD_ON_COND
 #define FACILITY_DISTINCT_OPS         FACILITY_LOAD_ON_COND
@@ -92,8 +92,8 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_clz_i32        0
 #define TCG_TARGET_HAS_ctz_i32        0
 #define TCG_TARGET_HAS_ctpop_i32      0
-#define TCG_TARGET_HAS_deposit_i32    HAVE_FACILITY(GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i32    HAVE_FACILITY(GEN_INST_EXT)
+#define TCG_TARGET_HAS_deposit_i32    1
+#define TCG_TARGET_HAS_extract_i32    1
 #define TCG_TARGET_HAS_sextract_i32   0
 #define TCG_TARGET_HAS_extract2_i32   0
 #define TCG_TARGET_HAS_movcond_i32    1
@@ -129,8 +129,8 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_clz_i64        1
 #define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_ctpop_i64      0
-#define TCG_TARGET_HAS_deposit_i64    HAVE_FACILITY(GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i64    HAVE_FACILITY(GEN_INST_EXT)
+#define TCG_TARGET_HAS_deposit_i64    1
+#define TCG_TARGET_HAS_extract_i64    1
 #define TCG_TARGET_HAS_sextract_i64   0
 #define TCG_TARGET_HAS_extract2_i64   0
 #define TCG_TARGET_HAS_movcond_i64    1
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 42e161cc7e..f0b581293c 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -843,15 +843,8 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* Otherwise, stuff it in the constant pool.  */
-    if (HAVE_FACILITY(GEN_INST_EXT)) {
-        tcg_out_insn(s, RIL, LGRL, ret, 0);
-        new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
-    } else {
-        TCGReg base = ret ? ret : TCG_TMP0;
-        tcg_out_insn(s, RIL, LARL, base, 0);
-        new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
-        tcg_out_insn(s, RXY, LG, ret, base, TCG_REG_NONE, 0);
-    }
+    tcg_out_insn(s, RIL, LGRL, ret, 0);
+    new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
 }
 
 /* Emit a load/store type instruction.  Inputs are:
@@ -1105,7 +1098,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
             return;
         }
     }
-    if (HAVE_FACILITY(GEN_INST_EXT) && risbg_mask(val)) {
+    if (risbg_mask(val)) {
         tgen_andi_risbg(s, dest, dest, val);
         return;
     }
@@ -1460,48 +1453,47 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
                         TCGReg r1, TCGArg c2, int c2const, TCGLabel *l)
 {
     int cc;
+    bool is_unsigned = is_unsigned_cond(c);
+    bool in_range;
+    S390Opcode opc;
 
-    if (HAVE_FACILITY(GEN_INST_EXT)) {
-        bool is_unsigned = is_unsigned_cond(c);
-        bool in_range;
-        S390Opcode opc;
+    cc = tcg_cond_to_s390_cond[c];
 
-        cc = tcg_cond_to_s390_cond[c];
+    if (!c2const) {
+        opc = (type == TCG_TYPE_I32
+               ? (is_unsigned ? RIE_CLRJ : RIE_CRJ)
+               : (is_unsigned ? RIE_CLGRJ : RIE_CGRJ));
+        tgen_compare_branch(s, opc, cc, r1, c2, l);
+        return;
+    }
 
-        if (!c2const) {
-            opc = (type == TCG_TYPE_I32
-                   ? (is_unsigned ? RIE_CLRJ : RIE_CRJ)
-                   : (is_unsigned ? RIE_CLGRJ : RIE_CGRJ));
-            tgen_compare_branch(s, opc, cc, r1, c2, l);
-            return;
-        }
-
-        /* COMPARE IMMEDIATE AND BRANCH RELATIVE has an 8-bit immediate field.
-           If the immediate we've been given does not fit that range, we'll
-           fall back to separate compare and branch instructions using the
-           larger comparison range afforded by COMPARE IMMEDIATE.  */
-        if (type == TCG_TYPE_I32) {
-            if (is_unsigned) {
-                opc = RIE_CLIJ;
-                in_range = (uint32_t)c2 == (uint8_t)c2;
-            } else {
-                opc = RIE_CIJ;
-                in_range = (int32_t)c2 == (int8_t)c2;
-            }
+    /*
+     * COMPARE IMMEDIATE AND BRANCH RELATIVE has an 8-bit immediate field.
+     * If the immediate we've been given does not fit that range, we'll
+     * fall back to separate compare and branch instructions using the
+     * larger comparison range afforded by COMPARE IMMEDIATE.
+     */
+    if (type == TCG_TYPE_I32) {
+        if (is_unsigned) {
+            opc = RIE_CLIJ;
+            in_range = (uint32_t)c2 == (uint8_t)c2;
         } else {
-            if (is_unsigned) {
-                opc = RIE_CLGIJ;
-                in_range = (uint64_t)c2 == (uint8_t)c2;
-            } else {
-                opc = RIE_CGIJ;
-                in_range = (int64_t)c2 == (int8_t)c2;
-            }
+            opc = RIE_CIJ;
+            in_range = (int32_t)c2 == (int8_t)c2;
         }
-        if (in_range) {
-            tgen_compare_imm_branch(s, opc, cc, r1, c2, l);
-            return;
+    } else {
+        if (is_unsigned) {
+            opc = RIE_CLGIJ;
+            in_range = (uint64_t)c2 == (uint8_t)c2;
+        } else {
+            opc = RIE_CGIJ;
+            in_range = (int64_t)c2 == (int8_t)c2;
         }
     }
+    if (in_range) {
+        tgen_compare_imm_branch(s, opc, cc, r1, c2, l);
+        return;
+    }
 
     cc = tgen_cmp(s, type, c, r1, c2, c2const, false);
     tgen_branch(s, cc, l);
@@ -1659,7 +1651,7 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, MemOp opc,
        cross pages using the address of the last byte of the access.  */
     a_off = (a_bits >= s_bits ? 0 : s_mask - a_mask);
     tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
-    if (HAVE_FACILITY(GEN_INST_EXT) && a_off == 0) {
+    if (a_off == 0) {
         tgen_andi_risbg(s, TCG_REG_R3, addr_reg, tlb_mask);
     } else {
         tcg_out_insn(s, RX, LA, TCG_REG_R3, addr_reg, TCG_REG_NONE, a_off);
@@ -2972,17 +2964,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
                 : C_O1_I2(r, 0, ri));
 
     case INDEX_op_mul_i32:
-        /* If we have the general-instruction-extensions, then we have
-           MULTIPLY SINGLE IMMEDIATE with a signed 32-bit, otherwise we
-           have only MULTIPLY HALFWORD IMMEDIATE, with a signed 16-bit.  */
-        return (HAVE_FACILITY(GEN_INST_EXT)
-                ? C_O1_I2(r, 0, ri)
-                : C_O1_I2(r, 0, rI));
-
+        return C_O1_I2(r, 0, ri);
     case INDEX_op_mul_i64:
-        return (HAVE_FACILITY(GEN_INST_EXT)
-                ? C_O1_I2(r, 0, rJ)
-                : C_O1_I2(r, 0, rI));
+        return C_O1_I2(r, 0, rJ);
 
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
@@ -3151,6 +3135,10 @@ static void query_s390_facilities(void)
         which = "extended-immediate";
         goto fail;
     }
+    if (!HAVE_FACILITY(GEN_INST_EXT)) {
+        which = "general-instructions-extension";
+        goto fail;
+    }
     return;
 
  fail:
-- 
2.34.1




* [PATCH v4 08/27] tcg/s390x: Check for load-on-condition facility at startup
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (6 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 07/27] tcg/s390x: Check for general-instruction-extension " Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:26   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 09/27] tcg/s390x: Remove FAST_BCR_SER facility check Richard Henderson
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The load-on-condition facility was introduced in z196,
which itself was end-of-life in 2021.  In addition, z196 is the
minimum CPU supported by our set of supported operating systems:
RHEL 7 (z196), SLES 12 (z196) and Ubuntu 16.04 (zEC12).

Check for facility number 45, which will be the consolidated check
for several facilities.
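
For illustration, a minimal sketch of what a startup check on facility
bit 45 amounts to.  It assumes the facility list is held as an array of
64-bit words with facility 0 in the most significant bit of word 0.  The
names facility_present, first_missing and demo_required are hypothetical;
the facility numbers and strings are the ones used in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FACILITY_WORDS 3

/* Facility N lives in word N / 64, counting bits from the MSB. */
static bool facility_present(const uint64_t *list, unsigned nr)
{
    assert(nr < FACILITY_WORDS * 64);
    return (list[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Hypothetical table of the facilities required at startup. */
static const struct {
    unsigned nr;
    const char *name;
} demo_required[] = {
    { 18, "long-displacement" },
    { 21, "extended-immediate" },
    { 34, "general-instructions-extension" },
    { 45, "45" },
};

/* Return the name of the first missing facility, or NULL if none. */
static const char *first_missing(const uint64_t *list)
{
    size_t n = sizeof(demo_required) / sizeof(demo_required[0]);

    for (size_t i = 0; i < n; i++) {
        if (!facility_present(list, demo_required[i].nr)) {
            return demo_required[i].name;
        }
    }
    return NULL;
}
```

A table keeps the sketch short; the patch itself uses a chain of
if/goto fail tests, which does the same job.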

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |  6 ++--
 tcg/s390x/tcg-target.c.inc | 72 +++++++++++++-------------------------
 2 files changed, 27 insertions(+), 51 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index d47e8ba66a..31d5510d2d 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -58,12 +58,12 @@ typedef enum TCGReg {
 #define FACILITY_LONG_DISP            18
 #define FACILITY_EXT_IMM              21
 #define FACILITY_GEN_INST_EXT         34
+#define FACILITY_45                   45
 
 /* Facilities that are checked at runtime. */
 
-#define FACILITY_LOAD_ON_COND         45
-#define FACILITY_FAST_BCR_SER         FACILITY_LOAD_ON_COND
-#define FACILITY_DISTINCT_OPS         FACILITY_LOAD_ON_COND
+#define FACILITY_FAST_BCR_SER         45
+#define FACILITY_DISTINCT_OPS         45
 #define FACILITY_LOAD_ON_COND2        53
 #define FACILITY_VECTOR               129
 #define FACILITY_VECTOR_ENH1          135
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index f0b581293c..29a64ad0fe 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1252,7 +1252,6 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
                          TCGReg dest, TCGReg c1, TCGArg c2, int c2const)
 {
     int cc;
-    bool have_loc;
 
     /* With LOC2, we can always emit the minimum 3 insns.  */
     if (HAVE_FACILITY(LOAD_ON_COND2)) {
@@ -1263,9 +1262,6 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
         return;
     }
 
-    have_loc = HAVE_FACILITY(LOAD_ON_COND);
-
-    /* For HAVE_LOC, only the paths through GTU/GT/LEU/LE are smaller.  */
  restart:
     switch (cond) {
     case TCG_COND_NE:
@@ -1310,59 +1306,35 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
     case TCG_COND_LT:
     case TCG_COND_GE:
         /* Swap operands so that we can use LEU/GTU/GT/LE.  */
-        if (c2const) {
-            if (have_loc) {
-                break;
-            }
-            tcg_out_movi(s, type, TCG_TMP0, c2);
-            c2 = c1;
-            c2const = 0;
-            c1 = TCG_TMP0;
-        } else {
+        if (!c2const) {
             TCGReg t = c1;
             c1 = c2;
             c2 = t;
+            cond = tcg_swap_cond(cond);
+            goto restart;
         }
-        cond = tcg_swap_cond(cond);
-        goto restart;
+        break;
 
     default:
         g_assert_not_reached();
     }
 
     cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
-    if (have_loc) {
-        /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
-        tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
-        tcg_out_insn(s, RRF, LOCGR, dest, TCG_TMP0, cc);
-    } else {
-        /* Emit: d = 1; if (cc) goto over; d = 0; over:  */
-        tcg_out_movi(s, type, dest, 1);
-        tcg_out_insn(s, RI, BRC, cc, (4 + 4) >> 1);
-        tcg_out_movi(s, type, dest, 0);
-    }
+    /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
+    tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
+    tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
+    tcg_out_insn(s, RRF, LOCGR, dest, TCG_TMP0, cc);
 }
 
 static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
                          TCGReg c1, TCGArg c2, int c2const,
                          TCGArg v3, int v3const)
 {
-    int cc;
-    if (HAVE_FACILITY(LOAD_ON_COND)) {
-        cc = tgen_cmp(s, type, c, c1, c2, c2const, false);
-        if (v3const) {
-            tcg_out_insn(s, RIE, LOCGHI, dest, v3, cc);
-        } else {
-            tcg_out_insn(s, RRF, LOCGR, dest, v3, cc);
-        }
+    int cc = tgen_cmp(s, type, c, c1, c2, c2const, false);
+    if (v3const) {
+        tcg_out_insn(s, RIE, LOCGHI, dest, v3, cc);
     } else {
-        c = tcg_invert_cond(c);
-        cc = tgen_cmp(s, type, c, c1, c2, c2const, false);
-
-        /* Emit: if (cc) goto over; dest = r3; over:  */
-        tcg_out_insn(s, RI, BRC, cc, (4 + 4) >> 1);
-        tcg_out_insn(s, RRE, LGR, dest, v3);
+        tcg_out_insn(s, RRF, LOCGR, dest, v3, cc);
     }
 }
 
@@ -1382,14 +1354,8 @@ static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
         } else {
             tcg_out_mov(s, TCG_TYPE_I64, dest, a2);
         }
-        if (HAVE_FACILITY(LOAD_ON_COND)) {
-            /* Emit: if (one bit found) dest = r0.  */
-            tcg_out_insn(s, RRF, LOCGR, dest, TCG_REG_R0, 2);
-        } else {
-            /* Emit: if (no one bit found) goto over; dest = r0; over:  */
-            tcg_out_insn(s, RI, BRC, 8, (4 + 4) >> 1);
-            tcg_out_insn(s, RRE, LGR, dest, TCG_REG_R0);
-        }
+        /* Emit: if (one bit found) dest = r0.  */
+        tcg_out_insn(s, RRF, LOCGR, dest, TCG_REG_R0, 2);
     }
 }
 
@@ -3124,6 +3090,7 @@ static void query_s390_facilities(void)
     }
 
     /*
+     * Minimum supported cpu revision is z196.
      * Check for all required facilities.
      * ZARCH_ACTIVE is done via preprocessor check for 64-bit.
      */
@@ -3139,6 +3106,15 @@ static void query_s390_facilities(void)
         which = "general-instructions-extension";
         goto fail;
     }
+    /*
+     * Facility 45 is a big bin that contains: distinct-operands,
+     * fast-BCR-serialization, high-word, population-count,
+     * interlocked-access-1, and load/store-on-condition-1
+     */
+    if (!HAVE_FACILITY(45)) {
+        which = "45";
+        goto fail;
+    }
     return;
 
  fail:
-- 
2.34.1




* [PATCH v4 09/27] tcg/s390x: Remove FAST_BCR_SER facility check
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (7 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 08/27] tcg/s390x: Check for load-on-condition " Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:08   ` Philippe Mathieu-Daudé
  2022-12-09  2:05 ` [PATCH v4 10/27] tcg/s390x: Remove DISTINCT_OPERANDS " Richard Henderson
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The fast-bcr-serialization facility is bundled into facility 45,
along with load-on-condition.  We are checking this at startup.
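
For readers unfamiliar with the encoding: BCR is a two-byte RR-format
instruction, opcode 0x07 followed by a 4-bit mask and a 4-bit register
number.  The only thing this patch changes is the mask, which is now
always 14 rather than 15.  A sketch of the encoding, with encode_bcr
being a hypothetical helper and not the tcg_out_insn macro:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode BCR mask,r2 as a 16-bit RR-format insn. */
static uint16_t encode_bcr(unsigned mask, unsigned r2)
{
    assert(mask < 16 && r2 < 16);
    return (uint16_t)(0x0700u | (mask << 4) | r2);
}
```

With r2 == 0 no branch is taken, so encode_bcr(14, 0) gives 0x07e0 and
the previous fallback encode_bcr(15, 0) gives 0x07f0.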

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     | 1 -
 tcg/s390x/tcg-target.c.inc | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 31d5510d2d..fc9ae82700 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -62,7 +62,6 @@ typedef enum TCGReg {
 
 /* Facilities that are checked at runtime. */
 
-#define FACILITY_FAST_BCR_SER         45
 #define FACILITY_DISTINCT_OPS         45
 #define FACILITY_LOAD_ON_COND2        53
 #define FACILITY_VECTOR               129
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 29a64ad0fe..dd58f0cdb5 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2431,7 +2431,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* The host memory model is quite strong, we simply need to
            serialize the instruction stream.  */
         if (args[0] & TCG_MO_ST_LD) {
-            tcg_out_insn(s, RR, BCR, HAVE_FACILITY(FAST_BCR_SER) ? 14 : 15, 0);
+            /* fast-bcr-serialization facility (45) is present */
+            tcg_out_insn(s, RR, BCR, 14, 0);
         }
         break;
 
-- 
2.34.1




* [PATCH v4 10/27] tcg/s390x: Remove DISTINCT_OPERANDS facility check
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (8 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 09/27] tcg/s390x: Remove FAST_BCR_SER facility check Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:29   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 11/27] tcg/s390x: Use LARL+AGHI for odd addresses Richard Henderson
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The distinct-operands facility is bundled into facility 45,
along with load-on-condition.  We are checking this at startup.
Remove the a0 == a1 checks for 64-bit sub, and, or, xor, as there
is no space saving from avoiding the distinct-operands insn.
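
To spell out the reasoning: the 64-bit two-operand forms (SGR and
friends, RRE format) and the three-operand forms (SGRK and friends, RRF
format) are both four bytes long, so the three-operand form with
a0 == a1 does the same work in the same space.  A toy model follows;
sub_2op, sub_3op and the LEN_* constants are hypothetical names for this
illustration only.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 16 };

/* Instruction lengths in bytes for the two formats involved. */
enum { LEN_RRE = 4, LEN_RRF = 4 };

/* Two-operand form (SGR-like): the first operand is also the result. */
static void sub_2op(uint64_t *regs, int r1, int r2)
{
    regs[r1] -= regs[r2];
}

/* Three-operand form (SGRK-like): the result register is separate. */
static void sub_3op(uint64_t *regs, int r1, int r2, int r3)
{
    regs[r1] = regs[r2] - regs[r3];
}

/* Usage example: with r1 == r2 both forms agree, at equal length. */
static void demo_distinct_operands(void)
{
    uint64_t a[NREGS] = { 0 }, b[NREGS] = { 0 };

    a[2] = b[2] = 100;
    a[3] = b[3] = 58;
    sub_2op(a, 2, 3);
    sub_3op(b, 2, 2, 3);
    assert(a[2] == b[2]);
    assert(LEN_RRE == LEN_RRF);
}
```

The 32-bit two-operand forms such as SR are two-byte RR instructions, so
there the shorter encoding is still worth selecting, which is why only
the 64-bit checks go away.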

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |  1 -
 tcg/s390x/tcg-target.c.inc | 16 ++--------------
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index fc9ae82700..db10a39381 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -62,7 +62,6 @@ typedef enum TCGReg {
 
 /* Facilities that are checked at runtime. */
 
-#define FACILITY_DISTINCT_OPS         45
 #define FACILITY_LOAD_ON_COND2        53
 #define FACILITY_VECTOR               129
 #define FACILITY_VECTOR_ENH1          135
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index dd58f0cdb5..e4403ffabf 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2218,8 +2218,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (const_args[2]) {
             a2 = -a2;
             goto do_addi_64;
-        } else if (a0 == a1) {
-            tcg_out_insn(s, RRE, SGR, a0, a2);
         } else {
             tcg_out_insn(s, RRF, SGRK, a0, a1, a2);
         }
@@ -2230,8 +2228,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
             tgen_andi(s, TCG_TYPE_I64, args[0], args[2]);
-        } else if (a0 == a1) {
-            tcg_out_insn(s, RRE, NGR, args[0], args[2]);
         } else {
             tcg_out_insn(s, RRF, NGRK, a0, a1, a2);
         }
@@ -2241,8 +2237,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
             tgen_ori(s, TCG_TYPE_I64, a0, a2);
-        } else if (a0 == a1) {
-            tcg_out_insn(s, RRE, OGR, a0, a2);
         } else {
             tcg_out_insn(s, RRF, OGRK, a0, a1, a2);
         }
@@ -2252,8 +2246,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
             tgen_xori(s, TCG_TYPE_I64, a0, a2);
-        } else if (a0 == a1) {
-            tcg_out_insn(s, RRE, XGR, a0, a2);
         } else {
             tcg_out_insn(s, RRF, XGRK, a0, a1, a2);
         }
@@ -2926,9 +2918,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_or_i64:
     case INDEX_op_xor_i32:
     case INDEX_op_xor_i64:
-        return (HAVE_FACILITY(DISTINCT_OPS)
-                ? C_O1_I2(r, r, ri)
-                : C_O1_I2(r, 0, ri));
+        return C_O1_I2(r, r, ri);
 
     case INDEX_op_mul_i32:
         return C_O1_I2(r, 0, ri);
@@ -2938,9 +2928,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
     case INDEX_op_sar_i32:
-        return (HAVE_FACILITY(DISTINCT_OPS)
-                ? C_O1_I2(r, r, ri)
-                : C_O1_I2(r, 0, ri));
+        return C_O1_I2(r, r, ri);
 
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
-- 
2.34.1




* [PATCH v4 11/27] tcg/s390x: Use LARL+AGHI for odd addresses
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (9 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 10/27] tcg/s390x: Remove DISTINCT_OPERANDS " Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 12/27] tcg/s390x: Distinguish RRF-a and RRF-c formats Richard Henderson
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

For odd addresses, emit LARL for the even address just below and add
one with AGHI, instead of dropping the value to the constant pool.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
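A standalone sketch of the address arithmetic, for illustration only (not
code from this patch).  It assumes LARL takes a signed 32-bit offset counted
in halfwords, so it can only form even addresses; an odd target is reached by
forming the even address just below it and adding one.  The helper names
larl_offset and larl_aghi_result, and the -1 sentinel, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the halfword offset LARL would need to
 * reach the even address at or just below 'target' from 'pc'.
 * Returns false if the offset does not fit a signed 32-bit field.
 */
static bool larl_offset(int64_t pc, int64_t target, int32_t *off)
{
    int64_t even_target = target - (target & 1);  /* clear the low bit */
    int64_t halfwords = (even_target - pc) / 2;   /* exact: both are even */

    assert((pc & 1) == 0);                        /* code is 2-byte aligned */
    if (halfwords != (int32_t)halfwords) {
        return false;                             /* would use the constant pool */
    }
    *off = (int32_t)halfwords;
    return true;
}

/* Model of the emitted pair: LARL ret, off; then AGHI ret, 1 if odd. */
static int64_t larl_aghi_result(int64_t pc, int64_t target)
{
    int32_t off;
    int64_t ret;

    if (!larl_offset(pc, target, &off)) {
        return -1;                                /* sentinel for this sketch only */
    }
    ret = pc + 2 * (int64_t)off;                  /* effect of LARL */
    if (target & 1) {
        ret += 1;                                 /* effect of AGHI ret, 1 */
    }
    return ret;
}
```

The result equals the target whenever the offset is in range; an
out-of-range target is the case that still goes to the constant pool.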
 tcg/s390x/tcg-target.c.inc | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index e4403ffabf..6cf07152a5 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -806,6 +806,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
                          TCGReg ret, tcg_target_long sval)
 {
     tcg_target_ulong uval;
+    ptrdiff_t pc_off;
 
     /* Try all 32-bit insns that can load it in one go.  */
     if (maybe_out_small_movi(s, type, ret, sval)) {
@@ -832,14 +833,14 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         return;
     }
 
-    /* Try for PC-relative address load.  For odd addresses,
-       attempt to use an offset from the start of the TB.  */
-    if ((sval & 1) == 0) {
-        ptrdiff_t off = tcg_pcrel_diff(s, (void *)sval) >> 1;
-        if (off == (int32_t)off) {
-            tcg_out_insn(s, RIL, LARL, ret, off);
-            return;
+    /* Try for PC-relative address load.  For odd addresses, add one. */
+    pc_off = tcg_pcrel_diff(s, (void *)sval) >> 1;
+    if (pc_off == (int32_t)pc_off) {
+        tcg_out_insn(s, RIL, LARL, ret, pc_off);
+        if (sval & 1) {
+            tcg_out_insn(s, RI, AGHI, ret, 1);
         }
+        return;
     }
 
     /* Otherwise, stuff it in the constant pool.  */
-- 
2.34.1




* [PATCH v4 12/27] tcg/s390x: Distinguish RRF-a and RRF-c formats
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (10 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 11/27] tcg/s390x: Use LARL+AGHI for odd addresses Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 13/27] tcg/s390x: Distinguish RIE formats Richard Henderson
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

RRF-a has 3 register arguments; RRF-c has 2 register arguments plus
an m3 field.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
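To make the difference concrete, here is a standalone sketch of the two
encoders, mirroring the shifts used by tcg_out_insn_RRFa and
tcg_out_insn_RRFc in the diff below.  The function names encode_rrfa and
encode_rrfc are hypothetical; the opcode values are the ones from the enum.

```c
#include <stdint.h>

/* Opcode values as listed in the S390Opcode enum in this patch. */
enum {
    RRFa_OGRK  = 0xb9e6,
    RRFc_LOCGR = 0xb9e2,
};

/* RRF-a: the field at shift 12 holds a third register, r3. */
static uint32_t encode_rrfa(uint32_t op, unsigned r1, unsigned r2, unsigned r3)
{
    return (op << 16) | (r3 << 12) | (r1 << 4) | r2;
}

/* RRF-c: the same field holds a 4-bit mask, m3, instead of a register. */
static uint32_t encode_rrfc(uint32_t op, unsigned r1, unsigned r2, unsigned m3)
{
    return (op << 16) | (m3 << 12) | (r1 << 4) | r2;
}
```

The bit layout is identical; only the type and meaning of the third
argument differ, which is what the split into two emitters records.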
 tcg/s390x/tcg-target.c.inc | 57 +++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 6cf07152a5..d38a602dd9 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -172,18 +172,19 @@ typedef enum S390Opcode {
     RRE_SLBGR   = 0xb989,
     RRE_XGR     = 0xb982,
 
-    RRF_LOCR    = 0xb9f2,
-    RRF_LOCGR   = 0xb9e2,
-    RRF_NRK     = 0xb9f4,
-    RRF_NGRK    = 0xb9e4,
-    RRF_ORK     = 0xb9f6,
-    RRF_OGRK    = 0xb9e6,
-    RRF_SRK     = 0xb9f9,
-    RRF_SGRK    = 0xb9e9,
-    RRF_SLRK    = 0xb9fb,
-    RRF_SLGRK   = 0xb9eb,
-    RRF_XRK     = 0xb9f7,
-    RRF_XGRK    = 0xb9e7,
+    RRFa_NRK    = 0xb9f4,
+    RRFa_NGRK   = 0xb9e4,
+    RRFa_ORK    = 0xb9f6,
+    RRFa_OGRK   = 0xb9e6,
+    RRFa_SRK    = 0xb9f9,
+    RRFa_SGRK   = 0xb9e9,
+    RRFa_SLRK   = 0xb9fb,
+    RRFa_SLGRK  = 0xb9eb,
+    RRFa_XRK    = 0xb9f7,
+    RRFa_XGRK   = 0xb9e7,
+
+    RRFc_LOCR   = 0xb9f2,
+    RRFc_LOCGR  = 0xb9e2,
 
     RR_AR       = 0x1a,
     RR_ALR      = 0x1e,
@@ -538,8 +539,14 @@ static void tcg_out_insn_RRE(TCGContext *s, S390Opcode op,
     tcg_out32(s, (op << 16) | (r1 << 4) | r2);
 }
 
-static void tcg_out_insn_RRF(TCGContext *s, S390Opcode op,
-                             TCGReg r1, TCGReg r2, int m3)
+static void tcg_out_insn_RRFa(TCGContext *s, S390Opcode op,
+                              TCGReg r1, TCGReg r2, TCGReg r3)
+{
+    tcg_out32(s, (op << 16) | (r3 << 12) | (r1 << 4) | r2);
+}
+
+static void tcg_out_insn_RRFc(TCGContext *s, S390Opcode op,
+                              TCGReg r1, TCGReg r2, int m3)
 {
     tcg_out32(s, (op << 16) | (m3 << 12) | (r1 << 4) | r2);
 }
@@ -1324,7 +1331,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
     /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
     tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
     tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
-    tcg_out_insn(s, RRF, LOCGR, dest, TCG_TMP0, cc);
+    tcg_out_insn(s, RRFc, LOCGR, dest, TCG_TMP0, cc);
 }
 
 static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
@@ -1335,7 +1342,7 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
     if (v3const) {
         tcg_out_insn(s, RIE, LOCGHI, dest, v3, cc);
     } else {
-        tcg_out_insn(s, RRF, LOCGR, dest, v3, cc);
+        tcg_out_insn(s, RRFc, LOCGR, dest, v3, cc);
     }
 }
 
@@ -1356,7 +1363,7 @@ static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
             tcg_out_mov(s, TCG_TYPE_I64, dest, a2);
         }
         /* Emit: if (one bit found) dest = r0.  */
-        tcg_out_insn(s, RRF, LOCGR, dest, TCG_REG_R0, 2);
+        tcg_out_insn(s, RRFc, LOCGR, dest, TCG_REG_R0, 2);
     }
 }
 
@@ -1960,7 +1967,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, SR, a0, a2);
         } else {
-            tcg_out_insn(s, RRF, SRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, SRK, a0, a1, a2);
         }
         break;
 
@@ -1972,7 +1979,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, NR, a0, a2);
         } else {
-            tcg_out_insn(s, RRF, NRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, NRK, a0, a1, a2);
         }
         break;
     case INDEX_op_or_i32:
@@ -1983,7 +1990,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, OR, a0, a2);
         } else {
-            tcg_out_insn(s, RRF, ORK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, ORK, a0, a1, a2);
         }
         break;
     case INDEX_op_xor_i32:
@@ -1994,7 +2001,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, XR, args[0], args[2]);
         } else {
-            tcg_out_insn(s, RRF, XRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, XRK, a0, a1, a2);
         }
         break;
 
@@ -2220,7 +2227,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             a2 = -a2;
             goto do_addi_64;
         } else {
-            tcg_out_insn(s, RRF, SGRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, SGRK, a0, a1, a2);
         }
         break;
 
@@ -2230,7 +2237,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
             tgen_andi(s, TCG_TYPE_I64, args[0], args[2]);
         } else {
-            tcg_out_insn(s, RRF, NGRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, NGRK, a0, a1, a2);
         }
         break;
     case INDEX_op_or_i64:
@@ -2239,7 +2246,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
             tgen_ori(s, TCG_TYPE_I64, a0, a2);
         } else {
-            tcg_out_insn(s, RRF, OGRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, OGRK, a0, a1, a2);
         }
         break;
     case INDEX_op_xor_i64:
@@ -2248,7 +2255,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
             tgen_xori(s, TCG_TYPE_I64, a0, a2);
         } else {
-            tcg_out_insn(s, RRF, XGRK, a0, a1, a2);
+            tcg_out_insn(s, RRFa, XGRK, a0, a1, a2);
         }
         break;
 
-- 
2.34.1




* [PATCH v4 13/27] tcg/s390x: Distinguish RIE formats
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (11 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 12/27] tcg/s390x: Distinguish RRF-a and RRF-c formats Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 14/27] tcg/s390x: Support MIE2 multiply single instructions Richard Henderson
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

There are multiple variations of the RIE format, with different fields;
RIE-b, RIE-c, RIE-f and RIE-g are the ones used here.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
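A standalone sketch of two of the variations, following the halfword layout
that tgen_compare_branch (RIE-b) and tgen_compare_imm_branch (RIE-c) emit in
the diff below.  The Insn48 type and the encode_rieb/encode_riec names are
hypothetical, and the branch displacement is left as zero, as it is before
the relocation is applied.

```c
#include <stdint.h>

/* Opcode values as listed in the S390Opcode enum in this patch. */
enum {
    RIEb_CGRJ = 0xec64,
    RIEc_CGIJ = 0xec7c,
};

/* Hypothetical container for the three halfwords of a 48-bit insn. */
typedef struct {
    uint16_t hw[3];
} Insn48;

/* RIE-b: two registers up front, condition mask in the last halfword. */
static Insn48 encode_rieb(unsigned op, unsigned r1, unsigned r2, unsigned cc)
{
    Insn48 in;
    in.hw[0] = (uint16_t)((op & 0xff00) | (r1 << 4) | r2);
    in.hw[1] = 0;                       /* displacement, filled by reloc */
    in.hw[2] = (uint16_t)((cc << 12) | (op & 0xff));
    return in;
}

/* RIE-c: one register and the mask up front, 8-bit immediate at the end. */
static Insn48 encode_riec(unsigned op, unsigned r1, int i2, unsigned cc)
{
    Insn48 in;
    in.hw[0] = (uint16_t)((op & 0xff00) | (r1 << 4) | cc);
    in.hw[1] = 0;                       /* displacement, filled by reloc */
    in.hw[2] = (uint16_t)(((unsigned)(i2 & 0xff) << 8) | (op & 0xff));
    return in;
}
```

The condition mask sits in a different halfword in each variation, so one
generic emitter cannot serve both.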
 tcg/s390x/tcg-target.c.inc | 47 +++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index d38a602dd9..a81a82c70b 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -128,16 +128,19 @@ typedef enum S390Opcode {
     RI_OILL     = 0xa50b,
     RI_TMLL     = 0xa701,
 
-    RIE_CGIJ    = 0xec7c,
-    RIE_CGRJ    = 0xec64,
-    RIE_CIJ     = 0xec7e,
-    RIE_CLGRJ   = 0xec65,
-    RIE_CLIJ    = 0xec7f,
-    RIE_CLGIJ   = 0xec7d,
-    RIE_CLRJ    = 0xec77,
-    RIE_CRJ     = 0xec76,
-    RIE_LOCGHI  = 0xec46,
-    RIE_RISBG   = 0xec55,
+    RIEb_CGRJ    = 0xec64,
+    RIEb_CLGRJ   = 0xec65,
+    RIEb_CLRJ    = 0xec77,
+    RIEb_CRJ     = 0xec76,
+
+    RIEc_CGIJ    = 0xec7c,
+    RIEc_CIJ     = 0xec7e,
+    RIEc_CLGIJ   = 0xec7d,
+    RIEc_CLIJ    = 0xec7f,
+
+    RIEf_RISBG   = 0xec55,
+
+    RIEg_LOCGHI  = 0xec46,
 
     RRE_AGR     = 0xb908,
     RRE_ALGR    = 0xb90a,
@@ -556,7 +559,7 @@ static void tcg_out_insn_RI(TCGContext *s, S390Opcode op, TCGReg r1, int i2)
     tcg_out32(s, (op << 16) | (r1 << 20) | (i2 & 0xffff));
 }
 
-static void tcg_out_insn_RIE(TCGContext *s, S390Opcode op, TCGReg r1,
+static void tcg_out_insn_RIEg(TCGContext *s, S390Opcode op, TCGReg r1,
                              int i2, int m3)
 {
     tcg_out16(s, (op & 0xff00) | (r1 << 4) | m3);
@@ -985,9 +988,9 @@ static inline void tcg_out_risbg(TCGContext *s, TCGReg dest, TCGReg src,
                                  int msb, int lsb, int ofs, int z)
 {
     /* Format RIE-f */
-    tcg_out16(s, (RIE_RISBG & 0xff00) | (dest << 4) | src);
+    tcg_out16(s, (RIEf_RISBG & 0xff00) | (dest << 4) | src);
     tcg_out16(s, (msb << 8) | (z << 7) | lsb);
-    tcg_out16(s, (ofs << 8) | (RIE_RISBG & 0xff));
+    tcg_out16(s, (ofs << 8) | (RIEf_RISBG & 0xff));
 }
 
 static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
@@ -1266,7 +1269,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
         /* Emit: d = 0, d = (cc ? 1 : d).  */
         cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
         tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-        tcg_out_insn(s, RIE, LOCGHI, dest, 1, cc);
+        tcg_out_insn(s, RIEg, LOCGHI, dest, 1, cc);
         return;
     }
 
@@ -1340,7 +1343,7 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
 {
     int cc = tgen_cmp(s, type, c, c1, c2, c2const, false);
     if (v3const) {
-        tcg_out_insn(s, RIE, LOCGHI, dest, v3, cc);
+        tcg_out_insn(s, RIEg, LOCGHI, dest, v3, cc);
     } else {
         tcg_out_insn(s, RRFc, LOCGR, dest, v3, cc);
     }
@@ -1409,6 +1412,7 @@ static void tgen_compare_branch(TCGContext *s, S390Opcode opc, int cc,
                                 TCGReg r1, TCGReg r2, TCGLabel *l)
 {
     tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
+    /* Format RIE-b */
     tcg_out16(s, (opc & 0xff00) | (r1 << 4) | r2);
     tcg_out16(s, 0);
     tcg_out16(s, cc << 12 | (opc & 0xff));
@@ -1418,6 +1422,7 @@ static void tgen_compare_imm_branch(TCGContext *s, S390Opcode opc, int cc,
                                     TCGReg r1, int i2, TCGLabel *l)
 {
     tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
+    /* Format RIE-c */
     tcg_out16(s, (opc & 0xff00) | (r1 << 4) | cc);
     tcg_out16(s, 0);
     tcg_out16(s, (i2 << 8) | (opc & 0xff));
@@ -1435,8 +1440,8 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
 
     if (!c2const) {
         opc = (type == TCG_TYPE_I32
-               ? (is_unsigned ? RIE_CLRJ : RIE_CRJ)
-               : (is_unsigned ? RIE_CLGRJ : RIE_CGRJ));
+               ? (is_unsigned ? RIEb_CLRJ : RIEb_CRJ)
+               : (is_unsigned ? RIEb_CLGRJ : RIEb_CGRJ));
         tgen_compare_branch(s, opc, cc, r1, c2, l);
         return;
     }
@@ -1449,18 +1454,18 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
      */
     if (type == TCG_TYPE_I32) {
         if (is_unsigned) {
-            opc = RIE_CLIJ;
+            opc = RIEc_CLIJ;
             in_range = (uint32_t)c2 == (uint8_t)c2;
         } else {
-            opc = RIE_CIJ;
+            opc = RIEc_CIJ;
             in_range = (int32_t)c2 == (int8_t)c2;
         }
     } else {
         if (is_unsigned) {
-            opc = RIE_CLGIJ;
+            opc = RIEc_CLGIJ;
             in_range = (uint64_t)c2 == (uint8_t)c2;
         } else {
-            opc = RIE_CGIJ;
+            opc = RIEc_CGIJ;
             in_range = (int64_t)c2 == (int8_t)c2;
         }
     }
-- 
2.34.1




* [PATCH v4 14/27] tcg/s390x: Support MIE2 multiply single instructions
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (12 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 13/27] tcg/s390x: Distinguish RIE formats Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 15/27] tcg/s390x: Support MIE2 MGRK instruction Richard Henderson
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The MIE2 facility adds 3-operand versions of multiply.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
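The instruction selection for mul_i32 after this patch can be sketched as
below.  This is a standalone model for illustration; MulForm and
pick_mul_i32 are hypothetical names, and the model only reports which form
would be chosen rather than emitting anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the four instruction forms used by mul_i32. */
typedef enum { MUL_MHI, MUL_MSFI, MUL_MSR, MUL_MSRKC } MulForm;

/*
 * a0 is the output register, a1 the first input register; a2 is either
 * a constant (a2_is_const) or a register number.
 */
static MulForm pick_mul_i32(bool a2_is_const, int a0, int a1, int32_t a2)
{
    if (a2_is_const) {
        /* The patch first moves a1 into a0, then multiplies in place. */
        return a2 == (int16_t)a2 ? MUL_MHI : MUL_MSFI;
    }
    if (a0 == a1) {
        return MUL_MSR;         /* 2-operand form, output overlaps input */
    }
    /*
     * 3-operand form.  Without MIE2 the constraint set ties a0 to a1,
     * so this path should only be reached when the facility is present.
     */
    return MUL_MSRKC;
}
```

The mul_i64 case follows the same shape with MGHI, MSGFI, MSGR and MSGRKC.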
 tcg/s390x/tcg-target-con-set.h |  1 +
 tcg/s390x/tcg-target.h         |  1 +
 tcg/s390x/tcg-target.c.inc     | 34 ++++++++++++++++++++++++----------
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 00ba727b70..33a82e3286 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -23,6 +23,7 @@ C_O1_I2(r, 0, ri)
 C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, ri)
+C_O1_I2(r, r, rJ)
 C_O1_I2(r, rZ, r)
 C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index db10a39381..1fb7b8fb1d 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -63,6 +63,7 @@ typedef enum TCGReg {
 /* Facilities that are checked at runtime. */
 
 #define FACILITY_LOAD_ON_COND2        53
+#define FACILITY_MISC_INSN_EXT2       58
 #define FACILITY_VECTOR               129
 #define FACILITY_VECTOR_ENH1          135
 
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a81a82c70b..9634126ed1 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -175,6 +175,8 @@ typedef enum S390Opcode {
     RRE_SLBGR   = 0xb989,
     RRE_XGR     = 0xb982,
 
+    RRFa_MSRKC  = 0xb9fd,
+    RRFa_MSGRKC = 0xb9ed,
     RRFa_NRK    = 0xb9f4,
     RRFa_NGRK   = 0xb9e4,
     RRFa_ORK    = 0xb9f6,
@@ -2015,14 +2017,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mul_i32:
+        a0 = args[0], a1 = args[1], a2 = (int32_t)args[2];
         if (const_args[2]) {
-            if ((int32_t)args[2] == (int16_t)args[2]) {
-                tcg_out_insn(s, RI, MHI, args[0], args[2]);
+            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
+            if (a2 == (int16_t)a2) {
+                tcg_out_insn(s, RI, MHI, a0, a2);
             } else {
-                tcg_out_insn(s, RIL, MSFI, args[0], args[2]);
+                tcg_out_insn(s, RIL, MSFI, a0, a2);
             }
+        } else if (a0 == a1) {
+            tcg_out_insn(s, RRE, MSR, a0, a2);
         } else {
-            tcg_out_insn(s, RRE, MSR, args[0], args[2]);
+            tcg_out_insn(s, RRFa, MSRKC, a0, a1, a2);
         }
         break;
 
@@ -2272,14 +2278,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mul_i64:
+        a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
-            if (args[2] == (int16_t)args[2]) {
-                tcg_out_insn(s, RI, MGHI, args[0], args[2]);
+            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
+            if (a2 == (int16_t)a2) {
+                tcg_out_insn(s, RI, MGHI, a0, a2);
             } else {
-                tcg_out_insn(s, RIL, MSGFI, args[0], args[2]);
+                tcg_out_insn(s, RIL, MSGFI, a0, a2);
             }
+        } else if (a0 == a1) {
+            tcg_out_insn(s, RRE, MSGR, a0, a2);
         } else {
-            tcg_out_insn(s, RRE, MSGR, args[0], args[2]);
+            tcg_out_insn(s, RRFa, MSGRKC, a0, a1, a2);
         }
         break;
 
@@ -2934,9 +2944,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I2(r, r, ri);
 
     case INDEX_op_mul_i32:
-        return C_O1_I2(r, 0, ri);
+        return (HAVE_FACILITY(MISC_INSN_EXT2)
+                ? C_O1_I2(r, r, ri)
+                : C_O1_I2(r, 0, ri));
     case INDEX_op_mul_i64:
-        return C_O1_I2(r, 0, rJ);
+        return (HAVE_FACILITY(MISC_INSN_EXT2)
+                ? C_O1_I2(r, r, rJ)
+                : C_O1_I2(r, 0, rJ));
 
     case INDEX_op_shl_i32:
     case INDEX_op_shr_i32:
-- 
2.34.1




* [PATCH v4 15/27] tcg/s390x: Support MIE2 MGRK instruction
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (13 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 14/27] tcg/s390x: Support MIE2 multiply single instructions Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 16/27] tcg/s390x: Issue XILF directly for xor_i32 Richard Henderson
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The MIE2 facility adds a 3-operand signed 64x64->128 multiply.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
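For reference, the operation that muls2_i64 computes can be modelled as
below.  This is a standalone sketch that relies on the __int128 extension
of GCC and Clang; Pair128 and muls2_i64 are hypothetical names.  The hi
half corresponds to the even register of the pair and the lo half to the
odd register, matching the asserts in the diff below.

```c
#include <stdint.h>

/* Hypothetical model of an even/odd register pair holding 128 bits. */
typedef struct {
    uint64_t hi;    /* even register: high 64 bits of the product */
    uint64_t lo;    /* odd register: low 64 bits of the product */
} Pair128;

/* Signed 64x64->128 multiply, using the compiler's 128-bit integer type. */
static Pair128 muls2_i64(int64_t a, int64_t b)
{
    __int128 p = (__int128)a * b;
    unsigned __int128 u = (unsigned __int128)p;   /* well-defined wraparound */
    Pair128 r = { (uint64_t)(u >> 64), (uint64_t)u };
    return r;
}
```

Unlike MLGR, which ties one input to the output pair, the 3-operand form
lets both inputs live in arbitrary registers, hence C_O2_I2(o, m, r, r).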
 tcg/s390x/tcg-target-con-set.h | 1 +
 tcg/s390x/tcg-target.h         | 2 +-
 tcg/s390x/tcg-target.c.inc     | 8 ++++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 33a82e3286..b1a89a88ba 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -31,6 +31,7 @@ C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, r, 0)
 C_O1_I4(r, r, ri, rI, 0)
 C_O2_I2(o, m, 0, r)
+C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
 C_O2_I4(r, r, 0, 1, rA, r)
 C_O2_I4(r, r, 0, 1, ri, r)
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 1fb7b8fb1d..03ce11a34a 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -136,7 +136,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_add2_i64       1
 #define TCG_TARGET_HAS_sub2_i64       1
 #define TCG_TARGET_HAS_mulu2_i64      1
-#define TCG_TARGET_HAS_muls2_i64      0
+#define TCG_TARGET_HAS_muls2_i64      HAVE_FACILITY(MISC_INSN_EXT2)
 #define TCG_TARGET_HAS_muluh_i64      0
 #define TCG_TARGET_HAS_mulsh_i64      0
 
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 9634126ed1..871fcb7683 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -175,6 +175,7 @@ typedef enum S390Opcode {
     RRE_SLBGR   = 0xb989,
     RRE_XGR     = 0xb982,
 
+    RRFa_MGRK   = 0xb9ec,
     RRFa_MSRKC  = 0xb9fd,
     RRFa_MSGRKC = 0xb9ed,
     RRFa_NRK    = 0xb9f4,
@@ -2319,6 +2320,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_debug_assert(args[0] == args[1] + 1);
         tcg_out_insn(s, RRE, MLGR, args[1], args[3]);
         break;
+    case INDEX_op_muls2_i64:
+        tcg_debug_assert((args[1] & 1) == 0);
+        tcg_debug_assert(args[0] == args[1] + 1);
+        tcg_out_insn(s, RRFa, MGRK, args[1], args[2], args[3]);
+        break;
 
     case INDEX_op_shl_i64:
         op = RSY_SLLG;
@@ -3009,6 +3015,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_mulu2_i64:
         return C_O2_I2(o, m, 0, r);
+    case INDEX_op_muls2_i64:
+        return C_O2_I2(o, m, r, r);
 
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-- 
2.34.1




* [PATCH v4 16/27] tcg/s390x: Issue XILF directly for xor_i32
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (14 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 15/27] tcg/s390x: Support MIE2 MGRK instruction Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:30   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 17/27] tcg/s390x: Tighten constraints for or_i64 and xor_i64 Richard Henderson
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

There is only one instruction that is applicable
to a 32-bit immediate xor, so emit XILF directly
instead of going through tgen_xori.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
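A one-line model of what XILF does to a 64-bit register, as a standalone
sketch (the function name xilf is hypothetical): the 32-bit immediate is
xor'ed into the low word and the high word is left alone.

```c
#include <stdint.h>

/* Model of XILF: xor a 32-bit immediate into the low half of a register. */
static uint64_t xilf(uint64_t reg, uint32_t imm)
{
    return reg ^ (uint64_t)imm;     /* zero-extended, high half unchanged */
}
```

Since any 32-bit constant fits the immediate field, no search over
alternative encodings is needed for xor_i32.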
 tcg/s390x/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 871fcb7683..fc304327fc 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2005,7 +2005,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tgen_xori(s, TCG_TYPE_I32, a0, a2);
+            tcg_out_insn(s, RIL, XILF, a0, a2);
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, XR, args[0], args[2]);
         } else {
-- 
2.34.1




* [PATCH v4 17/27] tcg/s390x: Tighten constraints for or_i64 and xor_i64
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (15 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 16/27] tcg/s390x: Issue XILF directly for xor_i32 Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:41   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 18/27] tcg/s390x: Tighten constraints for and_i64 Richard Henderson
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Drop support for sequential OR and XOR, as the serial dependency is
slower than loading the constant first.  Let the register allocator
handle such immediates by matching only what one insn can achieve.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
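To show which constants the new 'K' constraint accepts, here is a
standalone sketch built around the two predicates added in the diff below
(is_const_p16 and is_const_p32 are copied from the patch; matches_K is a
hypothetical wrapper standing in for the TCG_CT_CONST_P32 test).

```c
#include <stdbool.h>
#include <stdint.h>

/* Index of the single 16-bit part holding all set bits, or -1. */
static int is_const_p16(uint64_t val)
{
    for (int i = 0; i < 4; ++i) {
        uint64_t mask = 0xffffull << (i * 16);
        if ((val & ~mask) == 0) {
            return i;
        }
    }
    return -1;
}

/* Index of the single 32-bit part holding all set bits, or -1. */
static int is_const_p32(uint64_t val)
{
    if ((val & 0xffffffff00000000ull) == 0) {
        return 0;
    }
    if ((val & 0x00000000ffffffffull) == 0) {
        return 1;
    }
    return -1;
}

/* Hypothetical wrapper: would this constant satisfy constraint 'K'? */
static bool matches_K(uint64_t val)
{
    return is_const_p32(val) >= 0;
}
```

A constant with bits set in both halves, such as 0x0000000100000001, fails
the match, so the register allocator loads it into a register and the
register form of the instruction is used.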
 tcg/s390x/tcg-target-con-set.h |   1 +
 tcg/s390x/tcg-target-con-str.h |   1 +
 tcg/s390x/tcg-target.c.inc     | 114 ++++++++++++++++-----------------
 3 files changed, 56 insertions(+), 60 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index b1a89a88ba..34ae4c7743 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -24,6 +24,7 @@ C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rJ)
+C_O1_I2(r, r, rK)
 C_O1_I2(r, rZ, r)
 C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 76446aecae..7b910d6d11 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
@@ -20,4 +20,5 @@ REGS('o', 0xaaaa) /* odd numbered general regs */
 CONST('A', TCG_CT_CONST_S33)
 CONST('I', TCG_CT_CONST_S16)
 CONST('J', TCG_CT_CONST_S32)
+CONST('K', TCG_CT_CONST_P32)
 CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fc304327fc..2a7410ba58 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -37,6 +37,7 @@
 #define TCG_CT_CONST_S32   0x200
 #define TCG_CT_CONST_S33   0x400
 #define TCG_CT_CONST_ZERO  0x800
+#define TCG_CT_CONST_P32   0x1000
 
 #define ALL_GENERAL_REGS     MAKE_64BIT_MASK(0, 16)
 #define ALL_VECTOR_REGS      MAKE_64BIT_MASK(32, 32)
@@ -507,6 +508,28 @@ static bool patch_reloc(tcg_insn_unit *src_rw, int type,
     return false;
 }
 
+static int is_const_p16(uint64_t val)
+{
+    for (int i = 0; i < 4; ++i) {
+        uint64_t mask = 0xffffull << (i * 16);
+        if ((val & ~mask) == 0) {
+            return i;
+        }
+    }
+    return -1;
+}
+
+static int is_const_p32(uint64_t val)
+{
+    if ((val & 0xffffffff00000000ull) == 0) {
+        return 0;
+    }
+    if ((val & 0x00000000ffffffffull) == 0) {
+        return 1;
+    }
+    return -1;
+}
+
 /* Test if a constant matches the constraint. */
 static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 {
@@ -529,6 +552,14 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
         return val == 0;
     }
 
+    /*
+     * Note that is_const_p16 is a subset of is_const_p32,
+     * so we don't need both constraints.
+     */
+    if ((ct & TCG_CT_CONST_P32) && is_const_p32(val) >= 0) {
+        return true;
+    }
+
     return 0;
 }
 
@@ -1125,7 +1156,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 }
 
-static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
+static void tgen_ori(TCGContext *s, TCGReg dest, uint64_t val)
 {
     static const S390Opcode oi_insns[4] = {
         RI_OILL, RI_OILH, RI_OIHL, RI_OIHH
@@ -1136,70 +1167,32 @@ static void tgen_ori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
 
     int i;
 
-    /* Look for no-op.  */
-    if (unlikely(val == 0)) {
+    i = is_const_p16(val);
+    if (i >= 0) {
+        tcg_out_insn_RI(s, oi_insns[i], dest, val >> (i * 16));
         return;
     }
 
-    /* Try all 32-bit insns that can perform it in one go.  */
-    for (i = 0; i < 4; i++) {
-        tcg_target_ulong mask = (0xffffull << i * 16);
-        if ((val & mask) != 0 && (val & ~mask) == 0) {
-            tcg_out_insn_RI(s, oi_insns[i], dest, val >> i * 16);
-            return;
-        }
+    i = is_const_p32(val);
+    if (i >= 0) {
+        tcg_out_insn_RIL(s, oif_insns[i], dest, val >> (i * 32));
+        return;
     }
 
-    /* Try all 48-bit insns that can perform it in one go.  */
-    for (i = 0; i < 2; i++) {
-        tcg_target_ulong mask = (0xffffffffull << i * 32);
-        if ((val & mask) != 0 && (val & ~mask) == 0) {
-            tcg_out_insn_RIL(s, oif_insns[i], dest, val >> i * 32);
-            return;
-        }
-    }
-
-    if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
-        if (type == TCG_TYPE_I32) {
-            tcg_out_insn(s, RR, OR, dest, TCG_TMP0);
-        } else {
-            tcg_out_insn(s, RRE, OGR, dest, TCG_TMP0);
-        }
-    } else {
-        /* Perform the OR via sequential modifications to the high and
-           low parts.  Do this via recursion to handle 16-bit vs 32-bit
-           masks in each half.  */
-        tgen_ori(s, type, dest, val & 0x00000000ffffffffull);
-        tgen_ori(s, type, dest, val & 0xffffffff00000000ull);
-    }
+    g_assert_not_reached();
 }
 
-static void tgen_xori(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
+static void tgen_xori(TCGContext *s, TCGReg dest, uint64_t val)
 {
-    /* Try all 48-bit insns that can perform it in one go.  */
-    if ((val & 0xffffffff00000000ull) == 0) {
+    switch (is_const_p32(val)) {
+    case 0:
         tcg_out_insn(s, RIL, XILF, dest, val);
-        return;
-    }
-    if ((val & 0x00000000ffffffffull) == 0) {
+        break;
+    case 1:
         tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
-        return;
-    }
-
-    if (maybe_out_small_movi(s, type, TCG_TMP0, val)) {
-        if (type == TCG_TYPE_I32) {
-            tcg_out_insn(s, RR, XR, dest, TCG_TMP0);
-        } else {
-            tcg_out_insn(s, RRE, XGR, dest, TCG_TMP0);
-        }
-    } else {
-        /* Perform the xor by parts.  */
-        if (val & 0xffffffff) {
-            tcg_out_insn(s, RIL, XILF, dest, val);
-        }
-        if (val > 0xffffffff) {
-            tcg_out_insn(s, RIL, XIHF, dest, val >> 32);
-        }
+        break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1994,7 +1987,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-            tgen_ori(s, TCG_TYPE_I32, a0, a2);
+            tgen_ori(s, a0, a2);
         } else if (a0 == a1) {
             tcg_out_insn(s, RR, OR, a0, a2);
         } else {
@@ -2256,7 +2249,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen_ori(s, TCG_TYPE_I64, a0, a2);
+            tgen_ori(s, a0, a2);
         } else {
             tcg_out_insn(s, RRFa, OGRK, a0, a1, a2);
         }
@@ -2265,7 +2258,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
             tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
-            tgen_xori(s, TCG_TYPE_I64, a0, a2);
+            tgen_xori(s, a0, a2);
         } else {
             tcg_out_insn(s, RRFa, XGRK, a0, a1, a2);
         }
@@ -2944,10 +2937,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_and_i32:
     case INDEX_op_and_i64:
     case INDEX_op_or_i32:
-    case INDEX_op_or_i64:
     case INDEX_op_xor_i32:
-    case INDEX_op_xor_i64:
         return C_O1_I2(r, r, ri);
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
+        return C_O1_I2(r, r, rK);
 
     case INDEX_op_mul_i32:
         return (HAVE_FACILITY(MISC_INSN_EXT2)
-- 
2.34.1




* [PATCH v4 18/27] tcg/s390x: Tighten constraints for and_i64
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (16 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 17/27] tcg/s390x: Tighten constraints for or_i64 and xor_i64 Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-12 22:57   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 19/27] tcg/s390x: Support MIE3 logical operations Richard Henderson
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Let the register allocator handle AND immediates by matching
only what one insn can achieve.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
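The bit patterns listed in the risbg_mask comment in the diff below are
exactly the masks that form one contiguous run of ones when the 64 bits
are viewed as a ring.  The standalone sketch below tests that property by
counting transitions between neighbouring bits.  It is an alternative
formulation for illustration, not the code from this patch, and the name
is_single_circular_run is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if 'c' is a single run of ones, possibly wrapping from
 * bit 63 around to bit 0.  All zeros and all ones are rejected.
 */
static bool is_single_circular_run(uint64_t c)
{
    uint64_t rot = (c << 1) | (c >> 63);    /* rotate left by one */
    uint64_t edges = c ^ rot;               /* 1 where neighbours differ */
    int count = 0;

    while (edges) {
        edges &= edges - 1;                 /* clear the lowest set bit */
        count++;
    }
    /* One run of ones on a ring has exactly two 0/1 boundaries. */
    return count == 2;
}
```

For example, 0xff000000000000ff is accepted because its run wraps around,
while 0x0f0f has two separate runs and is rejected.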
 tcg/s390x/tcg-target-con-set.h |   1 +
 tcg/s390x/tcg-target-con-str.h |   2 +
 tcg/s390x/tcg-target.c.inc     | 114 +++++++++++++++++----------------
 3 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 34ae4c7743..0c4d0da8f5 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -25,6 +25,7 @@ C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rJ)
 C_O1_I2(r, r, rK)
+C_O1_I2(r, r, rNKR)
 C_O1_I2(r, rZ, r)
 C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 7b910d6d11..6fa64a1ed6 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
@@ -21,4 +21,6 @@ CONST('A', TCG_CT_CONST_S33)
 CONST('I', TCG_CT_CONST_S16)
 CONST('J', TCG_CT_CONST_S32)
 CONST('K', TCG_CT_CONST_P32)
+CONST('N', TCG_CT_CONST_INV)
+CONST('R', TCG_CT_CONST_INVRISBG)
 CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 2a7410ba58..21007f94ad 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -33,11 +33,13 @@
 #include "../tcg-pool.c.inc"
 #include "elf.h"
 
-#define TCG_CT_CONST_S16   0x100
-#define TCG_CT_CONST_S32   0x200
-#define TCG_CT_CONST_S33   0x400
-#define TCG_CT_CONST_ZERO  0x800
-#define TCG_CT_CONST_P32   0x1000
+#define TCG_CT_CONST_S16        (1 << 8)
+#define TCG_CT_CONST_S32        (1 << 9)
+#define TCG_CT_CONST_S33        (1 << 10)
+#define TCG_CT_CONST_ZERO       (1 << 11)
+#define TCG_CT_CONST_P32        (1 << 12)
+#define TCG_CT_CONST_INV        (1 << 13)
+#define TCG_CT_CONST_INVRISBG   (1 << 14)
 
 #define ALL_GENERAL_REGS     MAKE_64BIT_MASK(0, 16)
 #define ALL_VECTOR_REGS      MAKE_64BIT_MASK(32, 32)
@@ -530,6 +532,38 @@ static int is_const_p32(uint64_t val)
     return -1;
 }
 
+/*
+ * Accept bit patterns like these:
+ *  0....01....1
+ *  1....10....0
+ *  1..10..01..1
+ *  0..01..10..0
+ * Copied from gcc sources.
+ */
+static bool risbg_mask(uint64_t c)
+{
+    uint64_t lsb;
+    /* We don't change the number of transitions by inverting,
+       so make sure we start with the LSB zero.  */
+    if (c & 1) {
+        c = ~c;
+    }
+    /* Reject all zeros or all ones.  */
+    if (c == 0) {
+        return false;
+    }
+    /* Find the first transition.  */
+    lsb = c & -c;
+    /* Invert to look for a second transition.  */
+    c = ~c;
+    /* Erase the first transition.  */
+    c &= -lsb;
+    /* Find the second transition, if any.  */
+    lsb = c & -c;
+    /* Match if all the bits are 1's, or if c is zero.  */
+    return c == -lsb;
+}
+
 /* Test if a constant matches the constraint. */
 static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 {
@@ -552,6 +586,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
         return val == 0;
     }
 
+    if (ct & TCG_CT_CONST_INV) {
+        val = ~val;
+    }
     /*
      * Note that is_const_p16 is a subset of is_const_p32,
      * so we don't need both constraints.
@@ -559,6 +596,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
     if ((ct & TCG_CT_CONST_P32) && is_const_p32(val) >= 0) {
         return true;
     }
+    if ((ct & TCG_CT_CONST_INVRISBG) && risbg_mask(~val)) {
+        return true;
+    }
 
     return 0;
 }
@@ -1057,36 +1097,6 @@ static inline void tgen_ext32u(TCGContext *s, TCGReg dest, TCGReg src)
     tcg_out_insn(s, RRE, LLGFR, dest, src);
 }
 
-/* Accept bit patterns like these:
-    0....01....1
-    1....10....0
-    1..10..01..1
-    0..01..10..0
-   Copied from gcc sources.  */
-static inline bool risbg_mask(uint64_t c)
-{
-    uint64_t lsb;
-    /* We don't change the number of transitions by inverting,
-       so make sure we start with the LSB zero.  */
-    if (c & 1) {
-        c = ~c;
-    }
-    /* Reject all zeros or all ones.  */
-    if (c == 0) {
-        return false;
-    }
-    /* Find the first transition.  */
-    lsb = c & -c;
-    /* Invert to look for a second transition.  */
-    c = ~c;
-    /* Erase the first transition.  */
-    c &= -lsb;
-    /* Find the second transition, if any.  */
-    lsb = c & -c;
-    /* Match if all the bits are 1's, or if c is zero.  */
-    return c == -lsb;
-}
-
 static void tgen_andi_risbg(TCGContext *s, TCGReg out, TCGReg in, uint64_t val)
 {
     int msb, lsb;
@@ -1126,34 +1136,25 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         return;
     }
 
-    /* Try all 32-bit insns that can perform it in one go.  */
-    for (i = 0; i < 4; i++) {
-        tcg_target_ulong mask = ~(0xffffull << i * 16);
-        if (((val | ~valid) & mask) == mask) {
-            tcg_out_insn_RI(s, ni_insns[i], dest, val >> i * 16);
-            return;
-        }
+    i = is_const_p16(~val & valid);
+    if (i >= 0) {
+        tcg_out_insn_RI(s, ni_insns[i], dest, val >> (i * 16));
+        return;
     }
 
-    /* Try all 48-bit insns that can perform it in one go.  */
-    for (i = 0; i < 2; i++) {
-        tcg_target_ulong mask = ~(0xffffffffull << i * 32);
-        if (((val | ~valid) & mask) == mask) {
-            tcg_out_insn_RIL(s, nif_insns[i], dest, val >> i * 32);
-            return;
-        }
+    i = is_const_p32(~val & valid);
+    tcg_debug_assert(i == 0 || type != TCG_TYPE_I32);
+    if (i >= 0) {
+        tcg_out_insn_RIL(s, nif_insns[i], dest, val >> (i * 32));
+        return;
     }
+
     if (risbg_mask(val)) {
         tgen_andi_risbg(s, dest, dest, val);
         return;
     }
 
-    tcg_out_movi(s, type, TCG_TMP0, val);
-    if (type == TCG_TYPE_I32) {
-        tcg_out_insn(s, RR, NR, dest, TCG_TMP0);
-    } else {
-        tcg_out_insn(s, RRE, NGR, dest, TCG_TMP0);
-    }
+    g_assert_not_reached();
 }
 
 static void tgen_ori(TCGContext *s, TCGReg dest, uint64_t val)
@@ -2935,10 +2936,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub_i32:
     case INDEX_op_sub_i64:
     case INDEX_op_and_i32:
-    case INDEX_op_and_i64:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
         return C_O1_I2(r, r, ri);
+    case INDEX_op_and_i64:
+        return C_O1_I2(r, r, rNKR);
     case INDEX_op_or_i64:
     case INDEX_op_xor_i64:
         return C_O1_I2(r, r, rK);
-- 
2.34.1
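
The matching rule behind rNKR can be sketched as a stand-alone program.
This is an illustration under stated assumptions, not the patch itself:
sketch_risbg_mask follows the algorithm quoted in the diff, while
sketch_is_const_p32 is an assumed reconstruction of is_const_p32 and
and_imm_ok is a hypothetical name for the combined test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accept one contiguous run of ones, possibly wrapping around bit 63/0. */
static bool sketch_risbg_mask(uint64_t c)
{
    uint64_t lsb;

    if (c & 1) {
        c = ~c;             /* inverting keeps the number of transitions */
    }
    if (c == 0) {
        return false;       /* all zeros or all ones */
    }
    lsb = c & -c;           /* first transition */
    c = ~c;
    c &= -lsb;              /* erase everything below the first transition */
    lsb = c & -c;           /* second transition, if any */
    return c == -lsb;       /* nothing left, or ones up to bit 63 */
}

/* Assumed: index of the only non-zero 32-bit half, or -1. */
static int sketch_is_const_p32(uint64_t val)
{
    if ((val & 0xffffffff00000000ull) == 0) {
        return 0;
    }
    if ((val & 0x00000000ffffffffull) == 0) {
        return 1;
    }
    return -1;
}

/* Hypothetical model of "rNKR" as applied to an and_i64 immediate. */
static bool and_imm_ok(uint64_t val)
{
    /* 'N' + 'K': the bits being cleared must sit in one 32-bit half. */
    if (sketch_is_const_p32(~val) >= 0) {
        return true;
    }
    /* 'R': after the double inversion this tests the mask itself. */
    return sketch_risbg_mask(val);
}
```

For example, 0xffffffff0000ffff clears bits in the low half only and is
accepted through the N+K path, 0x00ffff00 is accepted as a RISBG mask,
and 0x0f0f0f0f0f0f0f0f matches neither, so it would be left to the
register allocator.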




* [PATCH v4 19/27] tcg/s390x: Support MIE3 logical operations
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (17 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 18/27] tcg/s390x: Tighten constraints for and_i64 Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 20/27] tcg/s390x: Create tgen_cmp2 to simplify movcond Richard Henderson
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

MIE3 provides andc, orc, nand, nor and eqv.
We can implement not as nor of the operand with itself.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |   3 +
 tcg/s390x/tcg-target.h         |  25 ++++----
 tcg/s390x/tcg-target.c.inc     | 102 +++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 12 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 0c4d0da8f5..b194ad7f03 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -22,9 +22,12 @@ C_O1_I1(v, vr)
 C_O1_I2(r, 0, ri)
 C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
+C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rJ)
 C_O1_I2(r, r, rK)
+C_O1_I2(r, r, rKR)
+C_O1_I2(r, r, rNK)
 C_O1_I2(r, r, rNKR)
 C_O1_I2(r, rZ, r)
 C_O1_I2(v, v, r)
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 03ce11a34a..dabdae1e84 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -64,6 +64,7 @@ typedef enum TCGReg {
 
 #define FACILITY_LOAD_ON_COND2        53
 #define FACILITY_MISC_INSN_EXT2       58
+#define FACILITY_MISC_INSN_EXT3       61
 #define FACILITY_VECTOR               129
 #define FACILITY_VECTOR_ENH1          135
 
@@ -81,13 +82,13 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_ext16u_i32     1
 #define TCG_TARGET_HAS_bswap16_i32    1
 #define TCG_TARGET_HAS_bswap32_i32    1
-#define TCG_TARGET_HAS_not_i32        0
+#define TCG_TARGET_HAS_not_i32        HAVE_FACILITY(MISC_INSN_EXT3)
 #define TCG_TARGET_HAS_neg_i32        1
-#define TCG_TARGET_HAS_andc_i32       0
-#define TCG_TARGET_HAS_orc_i32        0
-#define TCG_TARGET_HAS_eqv_i32        0
-#define TCG_TARGET_HAS_nand_i32       0
-#define TCG_TARGET_HAS_nor_i32        0
+#define TCG_TARGET_HAS_andc_i32       HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_orc_i32        HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_eqv_i32        HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_nand_i32       HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_nor_i32        HAVE_FACILITY(MISC_INSN_EXT3)
 #define TCG_TARGET_HAS_clz_i32        0
 #define TCG_TARGET_HAS_ctz_i32        0
 #define TCG_TARGET_HAS_ctpop_i32      0
@@ -118,13 +119,13 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_bswap16_i64    1
 #define TCG_TARGET_HAS_bswap32_i64    1
 #define TCG_TARGET_HAS_bswap64_i64    1
-#define TCG_TARGET_HAS_not_i64        0
+#define TCG_TARGET_HAS_not_i64        HAVE_FACILITY(MISC_INSN_EXT3)
 #define TCG_TARGET_HAS_neg_i64        1
-#define TCG_TARGET_HAS_andc_i64       0
-#define TCG_TARGET_HAS_orc_i64        0
-#define TCG_TARGET_HAS_eqv_i64        0
-#define TCG_TARGET_HAS_nand_i64       0
-#define TCG_TARGET_HAS_nor_i64        0
+#define TCG_TARGET_HAS_andc_i64       HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_orc_i64        HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_eqv_i64        HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_nand_i64       HAVE_FACILITY(MISC_INSN_EXT3)
+#define TCG_TARGET_HAS_nor_i64        HAVE_FACILITY(MISC_INSN_EXT3)
 #define TCG_TARGET_HAS_clz_i64        1
 #define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_ctpop_i64      0
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 21007f94ad..bab2d679c2 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -181,8 +181,18 @@ typedef enum S390Opcode {
     RRFa_MGRK   = 0xb9ec,
     RRFa_MSRKC  = 0xb9fd,
     RRFa_MSGRKC = 0xb9ed,
+    RRFa_NCRK   = 0xb9f5,
+    RRFa_NCGRK  = 0xb9e5,
+    RRFa_NNRK   = 0xb974,
+    RRFa_NNGRK  = 0xb964,
+    RRFa_NORK   = 0xb976,
+    RRFa_NOGRK  = 0xb966,
     RRFa_NRK    = 0xb9f4,
     RRFa_NGRK   = 0xb9e4,
+    RRFa_NXRK   = 0xb977,
+    RRFa_NXGRK  = 0xb967,
+    RRFa_OCRK   = 0xb975,
+    RRFa_OCGRK  = 0xb965,
     RRFa_ORK    = 0xb9f6,
     RRFa_OGRK   = 0xb9e6,
     RRFa_SRK    = 0xb9f9,
@@ -2007,9 +2017,46 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_andc_i32:
+        a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
+        if (const_args[2]) {
+            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
+            tgen_andi(s, TCG_TYPE_I32, a0, (uint32_t)~a2);
+        } else {
+            tcg_out_insn(s, RRFa, NCRK, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_orc_i32:
+        a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
+        if (const_args[2]) {
+            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
+            tgen_ori(s, a0, (uint32_t)~a2);
+        } else {
+            tcg_out_insn(s, RRFa, OCRK, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_eqv_i32:
+        a0 = args[0], a1 = args[1], a2 = (uint32_t)args[2];
+        if (const_args[2]) {
+            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
+            tcg_out_insn(s, RIL, XILF, a0, ~a2);
+        } else {
+            tcg_out_insn(s, RRFa, NXRK, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_nand_i32:
+        tcg_out_insn(s, RRFa, NNRK, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_nor_i32:
+        tcg_out_insn(s, RRFa, NORK, args[0], args[1], args[2]);
+        break;
+
     case INDEX_op_neg_i32:
         tcg_out_insn(s, RR, LCR, args[0], args[1]);
         break;
+    case INDEX_op_not_i32:
+        tcg_out_insn(s, RRFa, NORK, args[0], args[1], args[1]);
+        break;
 
     case INDEX_op_mul_i32:
         a0 = args[0], a1 = args[1], a2 = (int32_t)args[2];
@@ -2265,9 +2312,46 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_andc_i64:
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        if (const_args[2]) {
+            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
+            tgen_andi(s, TCG_TYPE_I64, a0, ~a2);
+        } else {
+            tcg_out_insn(s, RRFa, NCGRK, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_orc_i64:
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        if (const_args[2]) {
+            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
+            tgen_ori(s, a0, ~a2);
+        } else {
+            tcg_out_insn(s, RRFa, OCGRK, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_eqv_i64:
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        if (const_args[2]) {
+            tcg_out_mov(s, TCG_TYPE_I64, a0, a1);
+            tgen_xori(s, a0, ~a2);
+        } else {
+            tcg_out_insn(s, RRFa, NXGRK, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_nand_i64:
+        tcg_out_insn(s, RRFa, NNGRK, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_nor_i64:
+        tcg_out_insn(s, RRFa, NOGRK, args[0], args[1], args[2]);
+        break;
+
     case INDEX_op_neg_i64:
         tcg_out_insn(s, RRE, LCGR, args[0], args[1]);
         break;
+    case INDEX_op_not_i64:
+        tcg_out_insn(s, RRFa, NOGRK, args[0], args[1], args[1]);
+        break;
     case INDEX_op_bswap64_i64:
         tcg_out_insn(s, RRE, LRVGR, args[0], args[1]);
         break;
@@ -2945,6 +3029,22 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_xor_i64:
         return C_O1_I2(r, r, rK);
 
+    case INDEX_op_andc_i32:
+    case INDEX_op_orc_i32:
+    case INDEX_op_eqv_i32:
+        return C_O1_I2(r, r, ri);
+    case INDEX_op_andc_i64:
+        return C_O1_I2(r, r, rKR);
+    case INDEX_op_orc_i64:
+    case INDEX_op_eqv_i64:
+        return C_O1_I2(r, r, rNK);
+
+    case INDEX_op_nand_i32:
+    case INDEX_op_nand_i64:
+    case INDEX_op_nor_i32:
+    case INDEX_op_nor_i64:
+        return C_O1_I2(r, r, r);
+
     case INDEX_op_mul_i32:
         return (HAVE_FACILITY(MISC_INSN_EXT2)
                 ? C_O1_I2(r, r, ri)
@@ -2970,6 +3070,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_bswap64_i64:
     case INDEX_op_neg_i32:
     case INDEX_op_neg_i64:
+    case INDEX_op_not_i32:
+    case INDEX_op_not_i64:
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
     case INDEX_op_ext8u_i32:
-- 
2.34.1
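
For reference, the semantics of the operations enabled above can be
written out in plain C.  This is only a sketch of the TCG-level meaning
of each op; the op_* names are hypothetical and nothing here models the
s390x encodings.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit reference semantics; the 32-bit ops act on the low word alike. */
static uint64_t op_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t op_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t op_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }
static uint64_t op_nand(uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t op_nor(uint64_t a, uint64_t b)  { return ~(a | b); }

/* not is nor of the operand with itself, as done for not_i32/not_i64. */
static uint64_t op_not(uint64_t a)
{
    return op_nor(a, a);
}
```

The constant paths in the patch rely on the same identities: with a
constant c, andc becomes an AND with ~c, orc an OR with ~c, and eqv an
XOR with ~c.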




* [PATCH v4 20/27] tcg/s390x: Create tgen_cmp2 to simplify movcond
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (18 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 19/27] tcg/s390x: Support MIE3 logical operations Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 21/27] tcg/s390x: Generalize movcond implementation Richard Henderson
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Return both regular and inverted condition codes from tgen_cmp2.
This lets us choose after the fact which comparison we want.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.c.inc | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index bab2d679c2..a9e3b4a9b9 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1207,10 +1207,11 @@ static void tgen_xori(TCGContext *s, TCGReg dest, uint64_t val)
     }
 }
 
-static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
-                    TCGArg c2, bool c2const, bool need_carry)
+static int tgen_cmp2(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
+                     TCGArg c2, bool c2const, bool need_carry, int *inv_cc)
 {
     bool is_unsigned = is_unsigned_cond(c);
+    TCGCond inv_c = tcg_invert_cond(c);
     S390Opcode op;
 
     if (c2const) {
@@ -1221,6 +1222,7 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
                 } else {
                     tcg_out_insn(s, RRE, LTGR, r1, r1);
                 }
+                *inv_cc = tcg_cond_to_ltr_cond[inv_c];
                 return tcg_cond_to_ltr_cond[c];
             }
         }
@@ -1263,9 +1265,17 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
     }
 
  exit:
+    *inv_cc = tcg_cond_to_s390_cond[inv_c];
     return tcg_cond_to_s390_cond[c];
 }
 
+static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
+                    TCGArg c2, bool c2const, bool need_carry)
+{
+    int inv_cc;
+    return tgen_cmp2(s, type, c, r1, c2, c2const, need_carry, &inv_cc);
+}
+
 static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
                          TCGReg dest, TCGReg c1, TCGArg c2, int c2const)
 {
@@ -1348,7 +1358,10 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
                          TCGReg c1, TCGArg c2, int c2const,
                          TCGArg v3, int v3const)
 {
-    int cc = tgen_cmp(s, type, c, c1, c2, c2const, false);
+    int cc, inv_cc;
+
+    cc = tgen_cmp2(s, type, c, c1, c2, c2const, false, &inv_cc);
+
     if (v3const) {
         tcg_out_insn(s, RIEg, LOCGHI, dest, v3, cc);
     } else {
-- 
2.34.1
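
The idea of handing back both masks can be sketched in isolation.
Everything below is hypothetical scaffolding: the sketch_cond enum is a
miniature stand-in for TCGCond laid out so that each condition sits next
to its inverse, and the mask values assume the usual s390x branch-mask
bits (8 = equal, 4 = low, 2 = high).

```c
#include <assert.h>

/* Hypothetical miniature of TCGCond: inverse pairs differ in bit 0. */
enum sketch_cond { SK_EQ, SK_NE, SK_LT, SK_GE, SK_LE, SK_GT, SK_NCOND };

/* Assumed branch masks: 8 = equal, 4 = low, 2 = high. */
static const int sketch_cond_to_mask[SK_NCOND] = {
    [SK_EQ] = 8,
    [SK_NE] = 4 | 2,
    [SK_LT] = 4,
    [SK_GE] = 8 | 2,
    [SK_LE] = 8 | 4,
    [SK_GT] = 2,
};

static enum sketch_cond sketch_invert_cond(enum sketch_cond c)
{
    return (enum sketch_cond)(c ^ 1);
}

/* Return the mask for c and store the mask for !c in *inv_cc. */
static int sketch_cmp2(enum sketch_cond c, int *inv_cc)
{
    *inv_cc = sketch_cond_to_mask[sketch_invert_cond(c)];
    return sketch_cond_to_mask[c];
}
```

Because the patch picks the mask from a different table depending on
which compare insn was emitted (tcg_cond_to_ltr_cond versus
tcg_cond_to_s390_cond), the caller cannot easily derive the inverted
mask on its own, which is presumably why tgen_cmp2 returns both.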




* [PATCH v4 21/27] tcg/s390x: Generalize movcond implementation
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (19 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 20/27] tcg/s390x: Create tgen_cmp2 to simplify movcond Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 22/27] tcg/s390x: Support SELGR instruction in movcond Richard Henderson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Generalize movcond to support pre-computed conditions, and to take the
same set of arguments at all times.  This will be assumed by a following
patch, which needs to reuse tgen_movcond_int.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |  3 +-
 tcg/s390x/tcg-target.c.inc     | 52 ++++++++++++++++++++++++++--------
 2 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index b194ad7f03..8cf8ed4dff 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -33,8 +33,7 @@ C_O1_I2(r, rZ, r)
 C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
-C_O1_I4(r, r, ri, r, 0)
-C_O1_I4(r, r, ri, rI, 0)
+C_O1_I4(r, r, ri, rI, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a9e3b4a9b9..30c12052f0 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1354,19 +1354,49 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
     tcg_out_insn(s, RRFc, LOCGR, dest, TCG_TMP0, cc);
 }
 
+static void tgen_movcond_int(TCGContext *s, TCGType type, TCGReg dest,
+                             TCGArg v3, int v3const, TCGReg v4,
+                             int cc, int inv_cc)
+{
+    TCGReg src;
+
+    if (v3const) {
+        if (dest == v4) {
+            if (HAVE_FACILITY(LOAD_ON_COND2)) {
+                /* Emit: if (cc) dest = v3. */
+                tcg_out_insn(s, RIEg, LOCGHI, dest, v3, cc);
+                return;
+            }
+            tcg_out_insn(s, RI, LGHI, TCG_TMP0, v3);
+            src = TCG_TMP0;
+        } else {
+            /* LGR+LOCGHI is larger than LGHI+LOCGR. */
+            tcg_out_insn(s, RI, LGHI, dest, v3);
+            cc = inv_cc;
+            src = v4;
+        }
+    } else {
+        if (dest == v4) {
+            src = v3;
+        } else {
+            tcg_out_mov(s, type, dest, v3);
+            cc = inv_cc;
+            src = v4;
+        }
+    }
+
+    /* Emit: if (cc) dest = src. */
+    tcg_out_insn(s, RRFc, LOCGR, dest, src, cc);
+}
+
 static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
                          TCGReg c1, TCGArg c2, int c2const,
-                         TCGArg v3, int v3const)
+                         TCGArg v3, int v3const, TCGReg v4)
 {
     int cc, inv_cc;
 
     cc = tgen_cmp2(s, type, c, c1, c2, c2const, false, &inv_cc);
-
-    if (v3const) {
-        tcg_out_insn(s, RIEg, LOCGHI, dest, v3, cc);
-    } else {
-        tcg_out_insn(s, RRFc, LOCGR, dest, v3, cc);
-    }
+    tgen_movcond_int(s, type, dest, v3, v3const, v4, cc, inv_cc);
 }
 
 static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
@@ -2225,7 +2255,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_movcond_i32:
         tgen_movcond(s, TCG_TYPE_I32, args[5], args[0], args[1],
-                     args[2], const_args[2], args[3], const_args[3]);
+                     args[2], const_args[2], args[3], const_args[3], args[4]);
         break;
 
     case INDEX_op_qemu_ld_i32:
@@ -2509,7 +2539,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_movcond_i64:
         tgen_movcond(s, TCG_TYPE_I64, args[5], args[0], args[1],
-                     args[2], const_args[2], args[3], const_args[3]);
+                     args[2], const_args[2], args[3], const_args[3], args[4]);
         break;
 
     OP_32_64(deposit):
@@ -3114,9 +3144,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_movcond_i32:
     case INDEX_op_movcond_i64:
-        return (HAVE_FACILITY(LOAD_ON_COND2)
-                ? C_O1_I4(r, r, ri, rI, 0)
-                : C_O1_I4(r, r, ri, r, 0));
+        return C_O1_I4(r, r, ri, rI, r);
 
     case INDEX_op_div2_i32:
     case INDEX_op_div2_i64:
-- 
2.34.1
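
The register-operand half of tgen_movcond_int can be modelled in a few
lines.  This is a sketch under simplifying assumptions: registers are an
array of uint64_t, the condition is a plain bool instead of a cc mask,
and sketch_loc and sketch_movcond are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of load-on-condition: overwrite *dest only if cond holds. */
static void sketch_loc(uint64_t *dest, uint64_t src, bool cond)
{
    if (cond) {
        *dest = src;
    }
}

/* Model of: regs[dest] = cond ? regs[v3] : regs[v4]. */
static void sketch_movcond(uint64_t regs[], int dest, int v3, int v4,
                           bool cond)
{
    if (dest == v4) {
        /* dest already holds the "false" value; one insn suffices. */
        sketch_loc(&regs[dest], regs[v3], cond);
    } else {
        /* Copy the "true" value, then fix up on the inverted condition. */
        regs[dest] = regs[v3];
        sketch_loc(&regs[dest], regs[v4], !cond);
    }
}
```

Both branches yield the same result; the second one is where the
inverted condition code from tgen_cmp2 is needed.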




* [PATCH v4 22/27] tcg/s390x: Support SELGR instruction in movcond
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (20 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 21/27] tcg/s390x: Generalize movcond implementation Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 23/27] tcg/s390x: Use tgen_movcond_int in tgen_clz Richard Henderson
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

The new select instruction provides two separate register inputs,
whereas the old load-on-condition instruction overlaps one of the
register inputs with the destination.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.c.inc | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 30c12052f0..ab1fb45cc2 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -202,6 +202,8 @@ typedef enum S390Opcode {
     RRFa_XRK    = 0xb9f7,
     RRFa_XGRK   = 0xb9e7,
 
+    RRFam_SELGR = 0xb9e3,
+
     RRFc_LOCR   = 0xb9f2,
     RRFc_LOCGR  = 0xb9e2,
 
@@ -626,12 +628,20 @@ static void tcg_out_insn_RRE(TCGContext *s, S390Opcode op,
     tcg_out32(s, (op << 16) | (r1 << 4) | r2);
 }
 
+/* RRF-a without the m4 field */
 static void tcg_out_insn_RRFa(TCGContext *s, S390Opcode op,
                               TCGReg r1, TCGReg r2, TCGReg r3)
 {
     tcg_out32(s, (op << 16) | (r3 << 12) | (r1 << 4) | r2);
 }
 
+/* RRF-a with the m4 field */
+static void tcg_out_insn_RRFam(TCGContext *s, S390Opcode op,
+                               TCGReg r1, TCGReg r2, TCGReg r3, int m4)
+{
+    tcg_out32(s, (op << 16) | (r3 << 12) | (m4 << 8) | (r1 << 4) | r2);
+}
+
 static void tcg_out_insn_RRFc(TCGContext *s, S390Opcode op,
                               TCGReg r1, TCGReg r2, int m3)
 {
@@ -1376,6 +1386,11 @@ static void tgen_movcond_int(TCGContext *s, TCGType type, TCGReg dest,
             src = v4;
         }
     } else {
+        if (HAVE_FACILITY(MISC_INSN_EXT3)) {
+            /* Emit: dest = cc ? v3 : v4. */
+            tcg_out_insn(s, RRFam, SELGR, dest, v3, v4, cc);
+            return;
+        }
         if (dest == v4) {
             src = v3;
         } else {
-- 
2.34.1
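
The field layout used by tcg_out_insn_RRFam can be checked with a small
stand-alone encoder.  sketch_encode_rrfam is a hypothetical name; the
shifts are copied from the patch, and the register numbers in the usage
note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* RRF-a with m4: opcode[31:16] r3[15:12] m4[11:8] r1[7:4] r2[3:0]. */
static uint32_t sketch_encode_rrfam(uint32_t op, unsigned r1, unsigned r2,
                                    unsigned r3, unsigned m4)
{
    return (op << 16) | (r3 << 12) | (m4 << 8) | (r1 << 4) | r2;
}
```

With op = 0xb9e3 (SELGR), r1 = 1, r2 = 2, r3 = 3 and m4 = 8 this gives
0xb9e33812.  With m4 = 0 the result matches what tcg_out_insn_RRFa
would emit for the same operands.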




* [PATCH v4 23/27] tcg/s390x: Use tgen_movcond_int in tgen_clz
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (21 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 22/27] tcg/s390x: Support SELGR instruction in movcond Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 24/27] tcg/s390x: Implement ctpop operation Richard Henderson
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Reuse code from movcond to conditionally copy a2 to dest,
based on the condition codes produced by FLOGR.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |  1 +
 tcg/s390x/tcg-target.c.inc     | 20 +++++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 8cf8ed4dff..baf3bc9037 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -24,6 +24,7 @@ C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
+C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rJ)
 C_O1_I2(r, r, rK)
 C_O1_I2(r, r, rKR)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index ab1fb45cc2..8254f9f650 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1424,15 +1424,15 @@ static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
 
     if (a2const && a2 == 64) {
         tcg_out_mov(s, TCG_TYPE_I64, dest, TCG_REG_R0);
-    } else {
-        if (a2const) {
-            tcg_out_movi(s, TCG_TYPE_I64, dest, a2);
-        } else {
-            tcg_out_mov(s, TCG_TYPE_I64, dest, a2);
-        }
-        /* Emit: if (one bit found) dest = r0.  */
-        tcg_out_insn(s, RRFc, LOCGR, dest, TCG_REG_R0, 2);
+        return;
     }
+
+    /*
+     * Conditions from FLOGR are:
+     *   2 -> one bit found
+     *   8 -> no one bit found
+     */
+    tgen_movcond_int(s, TCG_TYPE_I64, dest, a2, a2const, TCG_REG_R0, 8, 2);
 }
 
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
@@ -3070,11 +3070,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i32:
     case INDEX_op_rotr_i64:
-    case INDEX_op_clz_i64:
     case INDEX_op_setcond_i32:
     case INDEX_op_setcond_i64:
         return C_O1_I2(r, r, ri);
 
+    case INDEX_op_clz_i64:
+        return C_O1_I2(r, r, rI);
+
     case INDEX_op_sub_i32:
     case INDEX_op_sub_i64:
     case INDEX_op_and_i32:
-- 
2.34.1
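
What tgen_clz computes can be sketched as a reference function.  The
model below is an assumption-laden illustration: sketch_clz64 is a
hypothetical name, r0 stands for the count that FLOGR leaves in R0
(64 when the input is zero), and found stands for the "one bit found"
condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference model: clz(a1), or a2 when a1 has no one bit at all. */
static uint64_t sketch_clz64(uint64_t a1, uint64_t a2)
{
    bool found = a1 != 0;
    uint64_t r0 = 0;

    if (found) {
        /* Count zeros above the leftmost one bit. */
        while (!(a1 & (1ull << 63))) {
            a1 <<= 1;
            r0++;
        }
    } else {
        r0 = 64;
    }

    /* dest = (no one bit found) ? a2 : r0 */
    return found ? r0 : a2;
}
```

When a2 is the constant 64 the selection is redundant, since r0 already
holds 64 for a zero input; that is the early-return case kept at the
top of tgen_clz.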




* [PATCH v4 24/27] tcg/s390x: Implement ctpop operation
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (22 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 23/27] tcg/s390x: Use tgen_movcond_int in tgen_clz Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-09  2:05 ` [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare Richard Henderson
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

POPCNT has an older form that produces per-byte results, and a newer
form, available with MIE3, that produces the complete per-register
result.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.h     |  4 ++--
 tcg/s390x/tcg-target.c.inc | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index dabdae1e84..68dcbc6645 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -91,7 +91,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_nor_i32        HAVE_FACILITY(MISC_INSN_EXT3)
 #define TCG_TARGET_HAS_clz_i32        0
 #define TCG_TARGET_HAS_ctz_i32        0
-#define TCG_TARGET_HAS_ctpop_i32      0
+#define TCG_TARGET_HAS_ctpop_i32      1
 #define TCG_TARGET_HAS_deposit_i32    1
 #define TCG_TARGET_HAS_extract_i32    1
 #define TCG_TARGET_HAS_sextract_i32   0
@@ -128,7 +128,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_nor_i64        HAVE_FACILITY(MISC_INSN_EXT3)
 #define TCG_TARGET_HAS_clz_i64        1
 #define TCG_TARGET_HAS_ctz_i64        0
-#define TCG_TARGET_HAS_ctpop_i64      0
+#define TCG_TARGET_HAS_ctpop_i64      1
 #define TCG_TARGET_HAS_deposit_i64    1
 #define TCG_TARGET_HAS_extract_i64    1
 #define TCG_TARGET_HAS_sextract_i64   0
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 8254f9f650..c0434fa2f8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -206,6 +206,7 @@ typedef enum S390Opcode {
 
     RRFc_LOCR   = 0xb9f2,
     RRFc_LOCGR  = 0xb9e2,
+    RRFc_POPCNT = 0xb9e1,
 
     RR_AR       = 0x1a,
     RR_ALR      = 0x1e,
@@ -1435,6 +1436,32 @@ static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
     tgen_movcond_int(s, TCG_TYPE_I64, dest, a2, a2const, TCG_REG_R0, 8, 2);
 }
 
+static void tgen_ctpop(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
+{
+    /* With MIE3, and bit 0 of m3 set, we get the complete result. */
+    if (HAVE_FACILITY(MISC_INSN_EXT3)) {
+        if (type == TCG_TYPE_I32) {
+            tgen_ext32u(s, dest, src);
+            src = dest;
+        }
+        tcg_out_insn(s, RRFc, POPCNT, dest, src, 8);
+        return;
+    }
+
+    /* Without MIE3, each byte gets the count of bits for the byte. */
+    tcg_out_insn(s, RRFc, POPCNT, dest, src, 0);
+
+    /* Multiply to sum each byte at the top of the word. */
+    if (type == TCG_TYPE_I32) {
+        tcg_out_insn(s, RIL, MSFI, dest, 0x01010101);
+        tcg_out_sh32(s, RS_SRL, dest, TCG_REG_NONE, 24);
+    } else {
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 0x0101010101010101ull);
+        tcg_out_insn(s, RRE, MSGR, dest, TCG_TMP0);
+        tcg_out_sh64(s, RSY_SRLG, dest, dest, TCG_REG_NONE, 56);
+    }
+}
+
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
                          int ofs, int len, int z)
 {
@@ -2584,6 +2611,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_clz(s, args[0], args[1], args[2], const_args[2]);
         break;
 
+    case INDEX_op_ctpop_i32:
+        tgen_ctpop(s, TCG_TYPE_I32, args[0], args[1]);
+        break;
+    case INDEX_op_ctpop_i64:
+        tgen_ctpop(s, TCG_TYPE_I64, args[0], args[1]);
+        break;
+
     case INDEX_op_mb:
         /* The host memory model is quite strong, we simply need to
            serialize the instruction stream.  */
@@ -3146,6 +3180,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_extu_i32_i64:
     case INDEX_op_extract_i32:
     case INDEX_op_extract_i64:
+    case INDEX_op_ctpop_i32:
+    case INDEX_op_ctpop_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_qemu_ld_i32:
-- 
2.34.1
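
The non-MIE3 path of tgen_ctpop above uses a multiply to add up the
per-byte counts.  A minimal C model of that arithmetic follows, as a
sketch only: popcnt_bytes is a hypothetical stand-in for POPCNT with
m4 = 0, and the ctpop*_fallback helpers are illustrative names, not
functions from the backend.

```c
#include <stdint.h>

/* Hypothetical model of POPCNT with m4 == 0: every byte of the result
   holds the number of set bits in the matching byte of the input. */
static uint64_t popcnt_bytes(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t byte = (x >> (i * 8)) & 0xff;
        uint64_t n = 0;
        while (byte) {
            n += byte & 1;
            byte >>= 1;
        }
        r |= n << (i * 8);
    }
    return r;
}

/* 64-bit case: multiply (MSGR) so the top byte receives the sum of all
   eight byte counts, then shift it down (SRLG by 56). */
static unsigned ctpop64_fallback(uint64_t x)
{
    uint64_t t = popcnt_bytes(x);
    t *= 0x0101010101010101ull;
    return (unsigned)(t >> 56);
}

/* 32-bit case: same idea on the low word (MSFI, then SRL by 24). */
static unsigned ctpop32_fallback(uint32_t x)
{
    uint32_t t = (uint32_t)popcnt_bytes(x);
    t *= 0x01010101u;
    return t >> 24;
}
```

Each byte count is at most 8, so the running sum (at most 64 for the
64-bit case, 32 for the 32-bit case) always fits in the top byte and
no carry is lost before the shift.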




* [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (23 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 24/27] tcg/s390x: Implement ctpop operation Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-13 16:25   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 26/27] tcg/s390x: Cleanup tcg_out_movi Richard Henderson
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Give 64-bit comparison second operand a signed 33-bit immediate.
This is the smallest superset of uint32_t and int32_t, as used
by CLGFI and CGFI respectively.  The rest of the 33-bit space
can be loaded into TCG_TMP0.  Drop use of the constant pool.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target-con-set.h |  3 +++
 tcg/s390x/tcg-target.c.inc     | 27 ++++++++++++++-------------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index baf3bc9037..15f1c55103 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -13,6 +13,7 @@ C_O0_I1(r)
 C_O0_I2(L, L)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
+C_O0_I2(r, rA)
 C_O0_I2(v, r)
 C_O1_I1(r, L)
 C_O1_I1(r, r)
@@ -24,6 +25,7 @@ C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
+C_O1_I2(r, r, rA)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rJ)
 C_O1_I2(r, r, rK)
@@ -35,6 +37,7 @@ C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rI, r)
+C_O1_I4(r, r, rA, rI, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index c0434fa2f8..4d113139e5 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1249,22 +1249,20 @@ static int tgen_cmp2(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
             tcg_out_insn_RIL(s, op, r1, c2);
             goto exit;
         }
+
+        /*
+         * Constraints are for a signed 33-bit operand, which is a
+         * convenient superset of this signed/unsigned test.
+         */
         if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
             op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
             tcg_out_insn_RIL(s, op, r1, c2);
             goto exit;
         }
 
-        /* Use the constant pool, but not for small constants.  */
-        if (maybe_out_small_movi(s, type, TCG_TMP0, c2)) {
-            c2 = TCG_TMP0;
-            /* fall through to reg-reg */
-        } else {
-            op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
-            tcg_out_insn_RIL(s, op, r1, 0);
-            new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
-            goto exit;
-        }
+        /* Load everything else into a register. */
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, c2);
+        c2 = TCG_TMP0;
     }
 
     if (type == TCG_TYPE_I32) {
@@ -3105,8 +3103,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rotr_i32:
     case INDEX_op_rotr_i64:
     case INDEX_op_setcond_i32:
-    case INDEX_op_setcond_i64:
         return C_O1_I2(r, r, ri);
+    case INDEX_op_setcond_i64:
+        return C_O1_I2(r, r, rA);
 
     case INDEX_op_clz_i64:
         return C_O1_I2(r, r, rI);
@@ -3154,8 +3153,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I2(r, r, ri);
 
     case INDEX_op_brcond_i32:
-    case INDEX_op_brcond_i64:
         return C_O0_I2(r, ri);
+    case INDEX_op_brcond_i64:
+        return C_O0_I2(r, rA);
 
     case INDEX_op_bswap16_i32:
     case INDEX_op_bswap16_i64:
@@ -3196,8 +3196,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I2(r, rZ, r);
 
     case INDEX_op_movcond_i32:
-    case INDEX_op_movcond_i64:
         return C_O1_I4(r, r, ri, rI, r);
+    case INDEX_op_movcond_i64:
+        return C_O1_I4(r, r, rA, rI, r);
 
     case INDEX_op_div2_i32:
     case INDEX_op_div2_i64:
-- 
2.34.1




* [PATCH v4 26/27] tcg/s390x: Cleanup tcg_out_movi
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (24 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-13 16:29   ` Ilya Leoshkevich
  2022-12-09  2:05 ` [PATCH v4 27/27] tcg/s390x: Avoid the constant pool in tcg_out_movi Richard Henderson
  2022-12-13 16:35 ` [PATCH v4 00/27] tcg/s390x: misc patches Ilya Leoshkevich
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Merge maybe_out_small_movi, as it no longer has additional users.
Use is_const_p{16,32}.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.c.inc | 52 ++++++++++++--------------------------
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 4d113139e5..b72c43e4aa 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -874,14 +874,19 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
     return true;
 }
 
-static const S390Opcode lli_insns[4] = {
+static const S390Opcode li_insns[4] = {
     RI_LLILL, RI_LLILH, RI_LLIHL, RI_LLIHH
 };
+static const S390Opcode lif_insns[2] = {
+    RIL_LLILF, RIL_LLIHF,
+};
 
-static bool maybe_out_small_movi(TCGContext *s, TCGType type,
-                                 TCGReg ret, tcg_target_long sval)
+/* load a register with an immediate value */
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long sval)
 {
     tcg_target_ulong uval = sval;
+    ptrdiff_t pc_off;
     int i;
 
     if (type == TCG_TYPE_I32) {
@@ -892,36 +897,13 @@ static bool maybe_out_small_movi(TCGContext *s, TCGType type,
     /* Try all 32-bit insns that can load it in one go.  */
     if (sval >= -0x8000 && sval < 0x8000) {
         tcg_out_insn(s, RI, LGHI, ret, sval);
-        return true;
-    }
-
-    for (i = 0; i < 4; i++) {
-        tcg_target_long mask = 0xffffull << i * 16;
-        if ((uval & mask) == uval) {
-            tcg_out_insn_RI(s, lli_insns[i], ret, uval >> i * 16);
-            return true;
-        }
-    }
-
-    return false;
-}
-
-/* load a register with an immediate value */
-static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg ret, tcg_target_long sval)
-{
-    tcg_target_ulong uval;
-    ptrdiff_t pc_off;
-
-    /* Try all 32-bit insns that can load it in one go.  */
-    if (maybe_out_small_movi(s, type, ret, sval)) {
         return;
     }
 
-    uval = sval;
-    if (type == TCG_TYPE_I32) {
-        uval = (uint32_t)sval;
-        sval = (int32_t)sval;
+    i = is_const_p16(uval);
+    if (i >= 0) {
+        tcg_out_insn_RI(s, li_insns[i], ret, uval >> (i * 16));
+        return;
     }
 
     /* Try all 48-bit insns that can load it in one go.  */
@@ -929,12 +911,10 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         tcg_out_insn(s, RIL, LGFI, ret, sval);
         return;
     }
-    if (uval <= 0xffffffff) {
-        tcg_out_insn(s, RIL, LLILF, ret, uval);
-        return;
-    }
-    if ((uval & 0xffffffff) == 0) {
-        tcg_out_insn(s, RIL, LLIHF, ret, uval >> 32);
+
+    i = is_const_p32(uval);
+    if (i >= 0) {
+        tcg_out_insn_RIL(s, lif_insns[i], ret, uval >> (i * 32));
         return;
     }
 
-- 
2.34.1
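
The definitions of is_const_p16 and is_const_p32 are not part of this
patch.  Judging only from the call sites above, they return the index
of the one 16-bit (or 32-bit) part of the value that may be non-zero,
or a negative number if more than one part is set.  A sketch under
that assumption, with hypothetical names so it is not mistaken for the
backend's own code:

```c
#include <stdint.h>

/* Assumed semantics: index 0..3 of the only 16-bit part that may be
   non-zero, or -1 if the value spans more than one part. */
static int single_part16(uint64_t val)
{
    for (int i = 0; i < 4; i++) {
        uint64_t mask = 0xffffull << (i * 16);
        if ((val & ~mask) == 0) {
            return i;
        }
    }
    return -1;
}

/* Assumed semantics: index 0..1 of the only 32-bit part that may be
   non-zero, or -1 if both halves are set. */
static int single_part32(uint64_t val)
{
    if ((val & 0xffffffff00000000ull) == 0) {
        return 0;
    }
    if ((val & 0x00000000ffffffffull) == 0) {
        return 1;
    }
    return -1;
}
```

With an index i in hand, the immediate field is uval >> (i * 16) or
uval >> (i * 32), and i selects the opcode from li_insns[] or
lif_insns[] as in the hunk above.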




* [PATCH v4 27/27] tcg/s390x: Avoid the constant pool in tcg_out_movi
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (25 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 26/27] tcg/s390x: Cleanup tcg_out_movi Richard Henderson
@ 2022-12-09  2:05 ` Richard Henderson
  2022-12-13 16:31   ` Ilya Leoshkevich
  2022-12-13 16:35 ` [PATCH v4 00/27] tcg/s390x: misc patches Ilya Leoshkevich
  27 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-09  2:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, iii

Load constants in no more than two insns, which turns
out to be faster than using the constant pool.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390x/tcg-target.c.inc | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index b72c43e4aa..2b38fd991d 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -877,6 +877,9 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
 static const S390Opcode li_insns[4] = {
     RI_LLILL, RI_LLILH, RI_LLIHL, RI_LLIHH
 };
+static const S390Opcode oi_insns[4] = {
+    RI_OILL, RI_OILH, RI_OIHL, RI_OIHH
+};
 static const S390Opcode lif_insns[2] = {
     RIL_LLILF, RIL_LLIHF,
 };
@@ -928,9 +931,20 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         return;
     }
 
-    /* Otherwise, stuff it in the constant pool.  */
-    tcg_out_insn(s, RIL, LGRL, ret, 0);
-    new_pool_label(s, sval, R_390_PC32DBL, s->code_ptr - 2, 2);
+    /* Otherwise, load it by parts. */
+    i = is_const_p16((uint32_t)uval);
+    if (i >= 0) {
+        tcg_out_insn_RI(s, li_insns[i], ret, uval >> (i * 16));
+    } else {
+        tcg_out_insn(s, RIL, LLILF, ret, uval);
+    }
+    uval >>= 32;
+    i = is_const_p16(uval);
+    if (i >= 0) {
+        tcg_out_insn_RI(s, oi_insns[i + 2], ret, uval >> (i * 16));
+    } else {
+        tcg_out_insn(s, RIL, OIHF, ret, uval);
+    }
 }
 
 /* Emit a load/store type instruction.  Inputs are:
@@ -1160,9 +1174,6 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
 
 static void tgen_ori(TCGContext *s, TCGReg dest, uint64_t val)
 {
-    static const S390Opcode oi_insns[4] = {
-        RI_OILL, RI_OILH, RI_OIHL, RI_OIHH
-    };
     static const S390Opcode oif_insns[2] = {
         RIL_OILF, RIL_OIHF
     };
-- 
2.34.1
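
The "load it by parts" fallback above can be modelled in plain C to
check that two instructions always reproduce the constant.  This is a
sketch, not backend code: load_by_parts and single_part16 are
hypothetical names, and the model tracks only the resulting register
value, not the instruction encodings.

```c
#include <stdint.h>

/* Assumed helper semantics: index of the only 16-bit part that may be
   non-zero, or -1 if more than one part is set. */
static int single_part16(uint64_t val)
{
    for (int i = 0; i < 4; i++) {
        uint64_t mask = 0xffffull << (i * 16);
        if ((val & ~mask) == 0) {
            return i;
        }
    }
    return -1;
}

/* Returns the value the register would hold after the two insns. */
static uint64_t load_by_parts(uint64_t uval)
{
    uint32_t lo = (uint32_t)uval;
    uint32_t hi = (uint32_t)(uval >> 32);
    uint64_t reg;
    int i;

    /* First insn: a load-logical-immediate, which zeroes the rest. */
    i = single_part16(lo);
    if (i >= 0) {
        /* LLILL or LLILH: one 16-bit part of the low word. */
        reg = (uint64_t)((lo >> (i * 16)) & 0xffff) << (i * 16);
    } else {
        /* LLILF: the whole low word. */
        reg = lo;
    }

    /* Second insn: OR the high word into place. */
    i = single_part16(hi);
    if (i >= 0) {
        /* OIHL or OIHH: one 16-bit part of the high word. */
        reg |= (uint64_t)((hi >> (i * 16)) & 0xffff) << (32 + i * 16);
    } else {
        /* OIHF: the whole high word. */
        reg |= (uint64_t)hi << 32;
    }
    return reg;
}
```

Because the first instruction clears every bit it does not load, the
OR in the second step cannot pick up stale register contents.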




* Re: [PATCH v4 03/27] tcg/s390x: Always set TCG_TARGET_HAS_direct_jump
  2022-12-09  2:05 ` [PATCH v4 03/27] tcg/s390x: Always set TCG_TARGET_HAS_direct_jump Richard Henderson
@ 2022-12-12 21:51   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 21:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:06PM -0600, Richard Henderson wrote:
> Since USE_REG_TB is removed, there is no need to load the
> target TB address into a register.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.h     |  2 +-
>  tcg/s390x/tcg-target.c.inc | 48 +++++++-------------------------------
>  2 files changed, 10 insertions(+), 40 deletions(-)

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 04/27] tcg/s390x: Remove USE_LONG_BRANCHES
  2022-12-09  2:05 ` [PATCH v4 04/27] tcg/s390x: Remove USE_LONG_BRANCHES Richard Henderson
@ 2022-12-12 21:52   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 21:52 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:07PM -0600, Richard Henderson wrote:
> The size of a compiled TB is limited by the uint16_t used by
> gen_insn_end_off[] -- there is no need for a 32-bit branch.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.c.inc | 9 ---------
>  1 file changed, 9 deletions(-)

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 05/27] tcg/s390x: Check for long-displacement facility at startup
  2022-12-09  2:05 ` [PATCH v4 05/27] tcg/s390x: Check for long-displacement facility at startup Richard Henderson
@ 2022-12-12 21:54   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 21:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:08PM -0600, Richard Henderson wrote:
> We are already assuming the existence of long-displacement, but were
> not being explicit about it.  This has been present since z990.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.h     |  6 ++++--
>  tcg/s390x/tcg-target.c.inc | 15 +++++++++++++++
>  2 files changed, 19 insertions(+), 2 deletions(-)

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 09/27] tcg/s390x: Remove FAST_BCR_SER facility check
  2022-12-09  2:05 ` [PATCH v4 09/27] tcg/s390x: Remove FAST_BCR_SER facility check Richard Henderson
@ 2022-12-12 22:08   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 45+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-12-12 22:08 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth, iii

On 9/12/22 03:05, Richard Henderson wrote:
> The fast-bcr-serialization facility is bundled into facility 45,
> along with load-on-condition.  We are checking this at startup.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/s390x/tcg-target.h     | 1 -
>   tcg/s390x/tcg-target.c.inc | 3 ++-
>   2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




* Re: [PATCH v4 06/27] tcg/s390x: Check for extended-immediate facility at startup
  2022-12-09  2:05 ` [PATCH v4 06/27] tcg/s390x: Check for extended-immediate " Richard Henderson
@ 2022-12-12 22:17   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:17 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:09PM -0600, Richard Henderson wrote:
> The extended-immediate facility was introduced in z9-109,
> which itself was end-of-life in 2017.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.h     |   4 +-
>  tcg/s390x/tcg-target.c.inc | 231 +++++++++++--------------------------
>  2 files changed, 72 insertions(+), 163 deletions(-)

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 07/27] tcg/s390x: Check for general-instruction-extension facility at startup
  2022-12-09  2:05 ` [PATCH v4 07/27] tcg/s390x: Check for general-instruction-extension " Richard Henderson
@ 2022-12-12 22:21   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:21 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:10PM -0600, Richard Henderson wrote:
> The general-instruction-extension facility was introduced in z10,
> which itself was end-of-life in 2019.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.h     |  10 ++--
>  tcg/s390x/tcg-target.c.inc | 100 ++++++++++++++++---------------------
>  2 files changed, 49 insertions(+), 61 deletions(-)

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 08/27] tcg/s390x: Check for load-on-condition facility at startup
  2022-12-09  2:05 ` [PATCH v4 08/27] tcg/s390x: Check for load-on-condition " Richard Henderson
@ 2022-12-12 22:26   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:11PM -0600, Richard Henderson wrote:
> The load-on-condition facility was introduced in z196,
> which itself was end-of-life in 2021.  In addition, z196 is the
> minimum CPU supported by our set of supported operating systems:
> RHEL 7 (z196), SLES 12 (z196) and Ubuntu 16.04 (zEC12).
> 
> Check for facility number 45, which will be the consolidated check
> for several facilities.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 10/27] tcg/s390x: Remove DISTINCT_OPERANDS facility check
  2022-12-09  2:05 ` [PATCH v4 10/27] tcg/s390x: Remove DISTINCT_OPERANDS " Richard Henderson
@ 2022-12-12 22:29   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:13PM -0600, Richard Henderson wrote:
> The distinct-operands facility is bundled into facility 45,
> along with load-on-condition.  We are checking this at startup.
> Remove the a0 == a1 checks for 64-bit sub, and, or, xor, as there
> is no space savings for avoiding the distinct-operands insn.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 16/27] tcg/s390x: Issue XILF directly for xor_i32
  2022-12-09  2:05 ` [PATCH v4 16/27] tcg/s390x: Issue XILF directly for xor_i32 Richard Henderson
@ 2022-12-12 22:30   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:19PM -0600, Richard Henderson wrote:
> There is only one instruction that is applicable
> to a 32-bit immediate xor.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 17/27] tcg/s390x: Tighten constraints for or_i64 and xor_i64
  2022-12-09  2:05 ` [PATCH v4 17/27] tcg/s390x: Tighten constraints for or_i64 and xor_i64 Richard Henderson
@ 2022-12-12 22:41   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:20PM -0600, Richard Henderson wrote:
> Drop support for sequential OR and XOR, as the serial dependency is
> slower than loading the constant first.  Let the register allocator
> handle such immediates by matching only what one insn can achieve.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 18/27] tcg/s390x: Tighten constraints for and_i64
  2022-12-09  2:05 ` [PATCH v4 18/27] tcg/s390x: Tighten constraints for and_i64 Richard Henderson
@ 2022-12-12 22:57   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-12 22:57 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:21PM -0600, Richard Henderson wrote:
> Let the register allocator handle such immediates by matching
> only what one insn can achieve.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare
  2022-12-09  2:05 ` [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare Richard Henderson
@ 2022-12-13 16:25   ` Ilya Leoshkevich
  2022-12-13 16:43     ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-13 16:25 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:28PM -0600, Richard Henderson wrote:
> Give 64-bit comparison second operand a signed 33-bit immediate.
> This is the smallest superset of uint32_t and int32_t, as used
> by CLGFI and CGFI respectively.  The rest of the 33-bit space
> can be loaded into TCG_TMP0.  Drop use of the constant pool.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target-con-set.h |  3 +++
>  tcg/s390x/tcg-target.c.inc     | 27 ++++++++++++++-------------
>  2 files changed, 17 insertions(+), 13 deletions(-)

<...>
 
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -1249,22 +1249,20 @@ static int tgen_cmp2(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
>              tcg_out_insn_RIL(s, op, r1, c2);
>              goto exit;
>          }
> +
> +        /*
> +         * Constraints are for a signed 33-bit operand, which is a
> +         * convenient superset of this signed/unsigned test.
> +         */
>          if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
>              op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
>              tcg_out_insn_RIL(s, op, r1, c2);
>              goto exit;
>          }
>  
> -        /* Use the constant pool, but not for small constants.  */
> -        if (maybe_out_small_movi(s, type, TCG_TMP0, c2)) {
> -            c2 = TCG_TMP0;
> -            /* fall through to reg-reg */
> -        } else {
> -            op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
> -            tcg_out_insn_RIL(s, op, r1, 0);
> -            new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
> -            goto exit;
> -        }
> +        /* Load everything else into a register. */
> +        tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, c2);
> +        c2 = TCG_TMP0;

What does tightening the constraint give us, if we have to handle the
"everything else" case anyway, even for values that match
TCG_CT_CONST_S33?

The example that I have in mind is:

- Comparison: r0_64 s<= -0xffffffffL;
- tcg_target_const_match(-0xffffffffL, TCG_TYPE_I64,
                         TCG_CT_CONST_S33) == true;
- (long)(int)-0xffffffffL != -0xffffffffL;
- So we should end up in the "everything else" branch.

<...>

Best regards,
Ilya



* Re: [PATCH v4 26/27] tcg/s390x: Cleanup tcg_out_movi
  2022-12-09  2:05 ` [PATCH v4 26/27] tcg/s390x: Cleanup tcg_out_movi Richard Henderson
@ 2022-12-13 16:29   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-13 16:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:29PM -0600, Richard Henderson wrote:
> Merge maybe_out_small_movi, as it no longer has additional users.
> Use is_const_p{16,32}.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.c.inc | 52 ++++++++++++--------------------------
>  1 file changed, 16 insertions(+), 36 deletions(-)

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 27/27] tcg/s390x: Avoid the constant pool in tcg_out_movi
  2022-12-09  2:05 ` [PATCH v4 27/27] tcg/s390x: Avoid the constant pool in tcg_out_movi Richard Henderson
@ 2022-12-13 16:31   ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-13 16:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:30PM -0600, Richard Henderson wrote:
> Load constants in no more than two insns, which turns
> out to be faster than using the constant pool.
> 
> Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/s390x/tcg-target.c.inc | 23 +++++++++++++++++------
>  1 file changed, 17 insertions(+), 6 deletions(-)

Thanks!

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>



* Re: [PATCH v4 00/27] tcg/s390x: misc patches
  2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
                   ` (26 preceding siblings ...)
  2022-12-09  2:05 ` [PATCH v4 27/27] tcg/s390x: Avoid the constant pool in tcg_out_movi Richard Henderson
@ 2022-12-13 16:35 ` Ilya Leoshkevich
  27 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-13 16:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Thu, Dec 08, 2022 at 08:05:03PM -0600, Richard Henderson wrote:
> Based-on: 20221202053958.223890-1-richard.henderson@linaro.org
> ("[PATCH for-8.0 v3 00/34] tcg misc patches")
> 
> Changes from v3:
>   * Require z196 as minimum cpu -- 6 new patches removing checks.
>   * Tighten constraints on AND, OR, XOR, CMP, trying to get the register
>     allocator to hoist things that can't be done in a single insn.
>   * Avoid the constant pool for movi.
> 
> I believe that I have addressed all of the discussion in v3,
> except perhaps for goto_tb concurrent modifications to jumps.
> I'm still not quite sure what to do about that.

I asked around, and apparently some other JITs (e.g. Java and .NET) have
been doing atomic branch offset patching on s390x for a long time now
(provided the offset is aligned, which QEMU already ensures), without
known issues. So I'm okay with keeping this code as is.
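
For illustration only, the alignment condition mentioned above can be
written down in portable C11.  This is a sketch under assumptions: the
names are hypothetical, the offset field is modelled as an
_Atomic int32_t object rather than bytes inside an instruction stream,
and nothing here is taken from QEMU's goto_tb code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: a 4-byte field can only be replaced with one
   store if it is naturally aligned. */
static bool offset_is_aligned(const void *addr)
{
    return ((uintptr_t)addr & 3) == 0;
}

/* Hypothetical patch helper: replace the branch offset with a single
   store, refusing if the field is not 4-byte aligned. */
static bool patch_branch_offset(_Atomic int32_t *field, int32_t new_off)
{
    if (!offset_is_aligned((const void *)field)) {
        return false;
    }
    atomic_store_explicit(field, new_off, memory_order_relaxed);
    return true;
}
```

A reader of the field then sees either the old or the new offset,
never a mix of the two.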

<...>

Best regards,
Ilya



* Re: [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare
  2022-12-13 16:25   ` Ilya Leoshkevich
@ 2022-12-13 16:43     ` Richard Henderson
  2022-12-13 17:01       ` Ilya Leoshkevich
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2022-12-13 16:43 UTC (permalink / raw)
  To: Ilya Leoshkevich, qemu-devel; +Cc: thuth

On 12/13/22 10:25, Ilya Leoshkevich wrote:
> On Thu, Dec 08, 2022 at 08:05:28PM -0600, Richard Henderson wrote:
>> Give 64-bit comparison second operand a signed 33-bit immediate.
>> This is the smallest superset of uint32_t and int32_t, as used
>> by CLGFI and CGFI respectively.  The rest of the 33-bit space
>> can be loaded into TCG_TMP0.  Drop use of the constant pool.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tcg/s390x/tcg-target-con-set.h |  3 +++
>>   tcg/s390x/tcg-target.c.inc     | 27 ++++++++++++++-------------
>>   2 files changed, 17 insertions(+), 13 deletions(-)
> 
> <...>
>   
>> --- a/tcg/s390x/tcg-target.c.inc
>> +++ b/tcg/s390x/tcg-target.c.inc
>> @@ -1249,22 +1249,20 @@ static int tgen_cmp2(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
>>               tcg_out_insn_RIL(s, op, r1, c2);
>>               goto exit;
>>           }
>> +
>> +        /*
>> +         * Constraints are for a signed 33-bit operand, which is a
>> +         * convenient superset of this signed/unsigned test.
>> +         */
>>           if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
>>               op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
>>               tcg_out_insn_RIL(s, op, r1, c2);
>>               goto exit;
>>           }
>>   
>> -        /* Use the constant pool, but not for small constants.  */
>> -        if (maybe_out_small_movi(s, type, TCG_TMP0, c2)) {
>> -            c2 = TCG_TMP0;
>> -            /* fall through to reg-reg */
>> -        } else {
>> -            op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
>> -            tcg_out_insn_RIL(s, op, r1, 0);
>> -            new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
>> -            goto exit;
>> -        }
>> +        /* Load everything else into a register. */
>> +        tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, c2);
>> +        c2 = TCG_TMP0;
> 
> What does tightening the constraint give us, if we have to handle the
> "everything else" case anyway, even for values that match
> TCG_CT_CONST_S33?

Values outside const_s33 get loaded by the register allocator, which means the value in 
the register might get re-used.

> The example that I have in mind is:
> 
> - Comparison: r0_64 s<= -0xffffffffL;
> - tcg_target_const_match(-0xffffffffL, TCG_TYPE_I64,
>                           TCG_CT_CONST_S33) == true;
> - (long)(int)-0xffffffffL != -0xffffffffL;
> - So we should end up in the "everything else" branch.

I suppose I could invent a new constraint that matches INT32_MIN .. UINT32_MAX, which
would exclude this particular case.  But it would still leave us loading INT32_MIN .. -1
for unsigned and INT32_MAX+1 .. UINT32_MAX for signed.

Since S33 existed already, I thought I would just re-use it.
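
The ranges discussed above can be spelled out as a small C sketch.
Assumptions: fits_s33 takes "signed 33-bit" to mean the two's-complement
range -2^32 .. 2^32-1 (the backend's actual TCG_CT_CONST_S33 test is not
shown in this thread and may differ at the endpoints), and both function
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed constraint: accepted as an immediate by the register
   allocator if it fits a signed 33-bit value. */
static bool fits_s33(int64_t v)
{
    return v >= -(INT64_C(1) << 32) && v < (INT64_C(1) << 32);
}

/* Same range test as tgen_cmp2: CLGFI takes a uint32_t immediate for
   unsigned compares, CGFI an int32_t immediate for signed ones. */
static bool fits_cmp_imm(int64_t v, bool is_unsigned)
{
    if (is_unsigned) {
        return v >= 0 && v <= UINT32_MAX;
    }
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

A constant for which fits_s33 holds but fits_cmp_imm does not, such as
-0xffffffffL, is the case that still goes through TCG_TMP0.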


r~



* Re: [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare
  2022-12-13 16:43     ` Richard Henderson
@ 2022-12-13 17:01       ` Ilya Leoshkevich
  0 siblings, 0 replies; 45+ messages in thread
From: Ilya Leoshkevich @ 2022-12-13 17:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: thuth

On Tue, Dec 13, 2022 at 10:43:07AM -0600, Richard Henderson wrote:
> On 12/13/22 10:25, Ilya Leoshkevich wrote:
> > On Thu, Dec 08, 2022 at 08:05:28PM -0600, Richard Henderson wrote:
> > > Give 64-bit comparison second operand a signed 33-bit immediate.
> > > This is the smallest superset of uint32_t and int32_t, as used
> > > by CLGFI and CGFI respectively.  The rest of the 33-bit space
> > > can be loaded into TCG_TMP0.  Drop use of the constant pool.
> > > 
> > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > > ---
> > >   tcg/s390x/tcg-target-con-set.h |  3 +++
> > >   tcg/s390x/tcg-target.c.inc     | 27 ++++++++++++++-------------
> > >   2 files changed, 17 insertions(+), 13 deletions(-)
> > 
> > <...>
> > > --- a/tcg/s390x/tcg-target.c.inc
> > > +++ b/tcg/s390x/tcg-target.c.inc
> > > @@ -1249,22 +1249,20 @@ static int tgen_cmp2(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
> > >               tcg_out_insn_RIL(s, op, r1, c2);
> > >               goto exit;
> > >           }
> > > +
> > > +        /*
> > > +         * Constraints are for a signed 33-bit operand, which is a
> > > +         * convenient superset of this signed/unsigned test.
> > > +         */
> > >           if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
> > >               op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
> > >               tcg_out_insn_RIL(s, op, r1, c2);
> > >               goto exit;
> > >           }
> > > -        /* Use the constant pool, but not for small constants.  */
> > > -        if (maybe_out_small_movi(s, type, TCG_TMP0, c2)) {
> > > -            c2 = TCG_TMP0;
> > > -            /* fall through to reg-reg */
> > > -        } else {
> > > -            op = (is_unsigned ? RIL_CLGRL : RIL_CGRL);
> > > -            tcg_out_insn_RIL(s, op, r1, 0);
> > > -            new_pool_label(s, c2, R_390_PC32DBL, s->code_ptr - 2, 2);
> > > -            goto exit;
> > > -        }
> > > +        /* Load everything else into a register. */
> > > +        tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, c2);
> > > +        c2 = TCG_TMP0;
> > 
> > What does tightening the constraint give us, if we have to handle the
> > "everything else" case anyway, even for values that match
> > TCG_CT_CONST_S33?
> 
> Values outside const_s33 get loaded by the register allocator, which means
> the value in the register might get re-used.

Thanks for the explanation!
I did not consider the reuse of already loaded large 64-bit values.

Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
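
For readers following the exchange above, here is a minimal C sketch of how
a 64-bit comparison constant ends up in one of three places once the
constraint is a signed 33-bit immediate. It is an illustration only, not
code from the patch: the names fits_s33, fits_cgfi, fits_clgfi, classify and
the enum are hypothetical, and the earlier special cases in tgen_cmp2 (the
part of the function elided from the quoted hunk) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; this is not QEMU's API. */

enum cmp_strategy {
    CMP_IMM32,        /* fits CGFI/CLGFI: compare against the immediate */
    CMP_VIA_TMP,      /* in the s33 range, but backend must movi to a scratch reg */
    CMP_VIA_REGALLOC  /* outside s33: register allocator loads it, value may be reused */
};

/* Signed 33-bit range: [-2^32, 2^32 - 1]. */
static bool fits_s33(int64_t v)
{
    return v >= -(INT64_C(1) << 32) && v < (INT64_C(1) << 32);
}

/* CGFI compares against a sign-extended 32-bit immediate. */
static bool fits_cgfi(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* CLGFI compares against a zero-extended 32-bit immediate. */
static bool fits_clgfi(int64_t v)
{
    return v >= 0 && v <= (int64_t)UINT32_MAX;
}

static enum cmp_strategy classify(int64_t c2, bool is_unsigned)
{
    if (!fits_s33(c2)) {
        /* Rejected by the constraint, so the backend never sees a constant. */
        return CMP_VIA_REGALLOC;
    }
    if (is_unsigned ? fits_clgfi(c2) : fits_cgfi(c2)) {
        return CMP_IMM32;
    }
    /* The remainder of the 33-bit space, e.g. -1 for an unsigned compare. */
    return CMP_VIA_TMP;
}
```

The union of the int32_t and uint32_t ranges is [-2^31, 2^32 - 1], and the
narrowest signed width that covers it is 33 bits, which is why the constraint
is described as the smallest superset. Only the values that fall inside the
33-bit range but fail the signedness-specific test take the scratch-register
path.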


Thread overview: 45+ messages
2022-12-09  2:05 [PATCH v4 00/27] tcg/s390x: misc patches Richard Henderson
2022-12-09  2:05 ` [PATCH v4 01/27] tcg/s390x: Use register pair allocation for div and mulu2 Richard Henderson
2022-12-09  2:05 ` [PATCH v4 02/27] tcg/s390x: Remove TCG_REG_TB Richard Henderson
2022-12-09  2:05 ` [PATCH v4 03/27] tcg/s390x: Always set TCG_TARGET_HAS_direct_jump Richard Henderson
2022-12-12 21:51   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 04/27] tcg/s390x: Remove USE_LONG_BRANCHES Richard Henderson
2022-12-12 21:52   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 05/27] tcg/s390x: Check for long-displacement facility at startup Richard Henderson
2022-12-12 21:54   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 06/27] tcg/s390x: Check for extended-immediate " Richard Henderson
2022-12-12 22:17   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 07/27] tcg/s390x: Check for general-instruction-extension " Richard Henderson
2022-12-12 22:21   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 08/27] tcg/s390x: Check for load-on-condition " Richard Henderson
2022-12-12 22:26   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 09/27] tcg/s390x: Remove FAST_BCR_SER facility check Richard Henderson
2022-12-12 22:08   ` Philippe Mathieu-Daudé
2022-12-09  2:05 ` [PATCH v4 10/27] tcg/s390x: Remove DISTINCT_OPERANDS " Richard Henderson
2022-12-12 22:29   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 11/27] tcg/s390x: Use LARL+AGHI for odd addresses Richard Henderson
2022-12-09  2:05 ` [PATCH v4 12/27] tcg/s390x: Distinguish RRF-a and RRF-c formats Richard Henderson
2022-12-09  2:05 ` [PATCH v4 13/27] tcg/s390x: Distinguish RIE formats Richard Henderson
2022-12-09  2:05 ` [PATCH v4 14/27] tcg/s390x: Support MIE2 multiply single instructions Richard Henderson
2022-12-09  2:05 ` [PATCH v4 15/27] tcg/s390x: Support MIE2 MGRK instruction Richard Henderson
2022-12-09  2:05 ` [PATCH v4 16/27] tcg/s390x: Issue XILF directly for xor_i32 Richard Henderson
2022-12-12 22:30   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 17/27] tcg/s390x: Tighten constraints for or_i64 and xor_i64 Richard Henderson
2022-12-12 22:41   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 18/27] tcg/s390x: Tighten constraints for and_i64 Richard Henderson
2022-12-12 22:57   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 19/27] tcg/s390x: Support MIE3 logical operations Richard Henderson
2022-12-09  2:05 ` [PATCH v4 20/27] tcg/s390x: Create tgen_cmp2 to simplify movcond Richard Henderson
2022-12-09  2:05 ` [PATCH v4 21/27] tcg/s390x: Generalize movcond implementation Richard Henderson
2022-12-09  2:05 ` [PATCH v4 22/27] tcg/s390x: Support SELGR instruction in movcond Richard Henderson
2022-12-09  2:05 ` [PATCH v4 23/27] tcg/s390x: Use tgen_movcond_int in tgen_clz Richard Henderson
2022-12-09  2:05 ` [PATCH v4 24/27] tcg/s390x: Implement ctpop operation Richard Henderson
2022-12-09  2:05 ` [PATCH v4 25/27] tcg/s390x: Tighten constraints for 64-bit compare Richard Henderson
2022-12-13 16:25   ` Ilya Leoshkevich
2022-12-13 16:43     ` Richard Henderson
2022-12-13 17:01       ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 26/27] tcg/s390x: Cleanup tcg_out_movi Richard Henderson
2022-12-13 16:29   ` Ilya Leoshkevich
2022-12-09  2:05 ` [PATCH v4 27/27] tcg/s390x: Avoid the constant pool in tcg_out_movi Richard Henderson
2022-12-13 16:31   ` Ilya Leoshkevich
2022-12-13 16:35 ` [PATCH v4 00/27] tcg/s390x: misc patches Ilya Leoshkevich
