qemu-devel.nongnu.org archive mirror
* [PATCH 0/5] tcg: Misc improvements
@ 2024-04-24 17:09 Richard Henderson
  2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel

One patch to allow two output operands from gvec expansion,
to be used by target/arm for updating QC.

One patch to record the result of the generic breakpoint
search so target translators do not need to repeat it.

Three small optimization patches.


r~


Richard Henderson (5):
  tcg: Add write_aofs to GVecGen3i
  tcg/i386: Simplify immediate 8-bit logical vector shifts
  tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff
  tcg/optimize: Optimize setcond with zmask
  accel/tcg: Introduce CF_BP_PAGE

 include/exec/translation-block.h |   1 +
 include/tcg/tcg-op-gvec-common.h |   2 +
 accel/tcg/cpu-exec.c             |   2 +-
 tcg/optimize.c                   | 110 +++++++++++++++++++++++++++++++
 tcg/tcg-op-gvec.c                |  30 ++++++---
 tcg/i386/tcg-target.c.inc        |  78 ++++++++--------------
 6 files changed, 165 insertions(+), 58 deletions(-)

-- 
2.34.1




* [PATCH] target/arm: Restrict translation disabled alignment check to VMSA
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
  2024-05-03 14:58   ` Philippe Mathieu-Daudé
  2024-04-24 17:09 ` [PATCH 1/5] tcg: Add write_aofs to GVecGen3i Richard Henderson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Clément Chigot, qemu-stable

For CPUs using PMSA, when the MPU is disabled, the default memory
type is Normal, Non-cacheable.

Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation disabled")
Reported-by: Clément Chigot <chigot@adacore.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---

Since v9 will likely be tagged tomorrow without this fixed,
Cc: qemu-stable@nongnu.org

---
 target/arm/tcg/hflags.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index 5da1b0fc1d..66de30b828 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr)
     }
 
     /*
-     * If translation is disabled, then the default memory type is
-     * Device(-nGnRnE) instead of Normal, which requires that alignment
+     * With PMSA, when the MPU is disabled, all memory types in the
+     * default map are Normal.
+     */
+    if (arm_feature(env, ARM_FEATURE_PMSA)) {
+        return false;
+    }
+
+    /*
+     * With VMSA, if translation is disabled, then the default memory type
+     * is Device(-nGnRnE) instead of Normal, which requires that alignment
      * be enforced.  Since this affects all ram, it is most efficient
      * to handle this during translation.
      */
-- 
2.34.1




* [PATCH 1/5] tcg: Add write_aofs to GVecGen3i
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
  2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
  2024-05-03 15:01   ` Philippe Mathieu-Daudé
  2024-04-24 17:09 ` [PATCH 2/5] tcg/i386: Simplify immediate 8-bit logical vector shifts Richard Henderson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-gvec-common.h |  2 ++
 tcg/tcg-op-gvec.c                | 30 ++++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/tcg/tcg-op-gvec-common.h b/include/tcg/tcg-op-gvec-common.h
index 4db8a58c14..65553f5f97 100644
--- a/include/tcg/tcg-op-gvec-common.h
+++ b/include/tcg/tcg-op-gvec-common.h
@@ -183,6 +183,8 @@ typedef struct {
     bool prefer_i64;
     /* Load dest as a 3rd source operand.  */
     bool load_dest;
+    /* Write aofs as a 2nd dest operand.  */
+    bool write_aofs;
 } GVecGen3i;
 
 typedef struct {
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index bb88943f79..0308732d9b 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -785,7 +785,8 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
 }
 
 static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
-                          uint32_t oprsz, int32_t c, bool load_dest,
+                          uint32_t oprsz, int32_t c,
+                          bool load_dest, bool write_aofs,
                           void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t))
 {
     TCGv_i32 t0 = tcg_temp_new_i32();
@@ -801,6 +802,9 @@ static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
         }
         fni(t2, t0, t1, c);
         tcg_gen_st_i32(t2, tcg_env, dofs + i);
+        if (write_aofs) {
+            tcg_gen_st_i32(t0, tcg_env, aofs + i);
+        }
     }
     tcg_temp_free_i32(t0);
     tcg_temp_free_i32(t1);
@@ -944,7 +948,8 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
 }
 
 static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
-                          uint32_t oprsz, int64_t c, bool load_dest,
+                          uint32_t oprsz, int64_t c,
+                          bool load_dest, bool write_aofs,
                           void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t))
 {
     TCGv_i64 t0 = tcg_temp_new_i64();
@@ -960,6 +965,9 @@ static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
         }
         fni(t2, t0, t1, c);
         tcg_gen_st_i64(t2, tcg_env, dofs + i);
+        if (write_aofs) {
+            tcg_gen_st_i64(t0, tcg_env, aofs + i);
+        }
     }
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(t1);
@@ -1102,7 +1110,8 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
  */
 static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t bofs, uint32_t oprsz, uint32_t tysz,
-                          TCGType type, int64_t c, bool load_dest,
+                          TCGType type, int64_t c,
+                          bool load_dest, bool write_aofs,
                           void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec,
                                       int64_t))
 {
@@ -1118,6 +1127,9 @@ static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
         }
         fni(vece, t2, t0, t1, c);
         tcg_gen_st_vec(t2, tcg_env, dofs + i);
+        if (write_aofs) {
+            tcg_gen_st_vec(t0, tcg_env, aofs + i);
+        }
     }
 }
 
@@ -1471,7 +1483,7 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
          */
         some = QEMU_ALIGN_DOWN(oprsz, 32);
         expand_3i_vec(g->vece, dofs, aofs, bofs, some, 32, TCG_TYPE_V256,
-                      c, g->load_dest, g->fniv);
+                      c, g->load_dest, g->write_aofs, g->fniv);
         if (some == oprsz) {
             break;
         }
@@ -1483,18 +1495,20 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
         /* fallthru */
     case TCG_TYPE_V128:
         expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 16, TCG_TYPE_V128,
-                      c, g->load_dest, g->fniv);
+                      c, g->load_dest, g->write_aofs, g->fniv);
         break;
     case TCG_TYPE_V64:
         expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 8, TCG_TYPE_V64,
-                      c, g->load_dest, g->fniv);
+                      c, g->load_dest, g->write_aofs, g->fniv);
         break;
 
     case 0:
         if (g->fni8 && check_size_impl(oprsz, 8)) {
-            expand_3i_i64(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni8);
+            expand_3i_i64(dofs, aofs, bofs, oprsz, c,
+                          g->load_dest, g->write_aofs, g->fni8);
         } else if (g->fni4 && check_size_impl(oprsz, 4)) {
-            expand_3i_i32(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni4);
+            expand_3i_i32(dofs, aofs, bofs, oprsz, c,
+                          g->load_dest, g->write_aofs, g->fni4);
         } else {
             assert(g->fno != NULL);
             tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, c, g->fno);
-- 
2.34.1




* [PATCH 2/5] tcg/i386: Simplify immediate 8-bit logical vector shifts
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
  2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
  2024-04-24 17:09 ` [PATCH 1/5] tcg: Add write_aofs to GVecGen3i Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
  2024-04-24 17:09 ` [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff Richard Henderson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel

The x86 ISA does not have this operation, so we need an expansion.
Use the same algorithm that we use for expanding this vector
operation with integers: perform the shift with a wider type
and then mask the bits that must be zero.

This reduces the instruction count from 5 to 2.
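
For illustration, the same trick in scalar C (not QEMU code; the helper
name is made up):

    #include <assert.h>
    #include <stdint.h>

    /* Shift two packed 8-bit lanes with a single 16-bit shift, then mask
     * off the bits that crossed the byte boundary -- the per-element idea
     * behind the new expansion.
     */
    static uint16_t shr_8x2(uint16_t pair, unsigned imm)
    {
        uint8_t mask = 0xff >> imm;              /* surviving bits per byte */
        return (pair >> imm) & (uint16_t)(mask * 0x0101u);
    }

    int main(void)
    {
        /* bytes 0xf0 and 0x0f packed together, each shifted right by 4 */
        assert(shr_8x2(0xf00f, 4) == 0x0f00);
        return 0;
    }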

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 61 +++++++++------------------------------
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index c6ba498623..6837c519b0 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3769,49 +3769,20 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     }
 }
 
-static void expand_vec_shi(TCGType type, unsigned vece, TCGOpcode opc,
+static void expand_vec_shi(TCGType type, unsigned vece, bool right,
                            TCGv_vec v0, TCGv_vec v1, TCGArg imm)
 {
-    TCGv_vec t1, t2;
+    uint8_t mask;
 
     tcg_debug_assert(vece == MO_8);
-
-    t1 = tcg_temp_new_vec(type);
-    t2 = tcg_temp_new_vec(type);
-
-    /*
-     * Unpack to W, shift, and repack.  Tricky bits:
-     * (1) Use punpck*bw x,x to produce DDCCBBAA,
-     *     i.e. duplicate in other half of the 16-bit lane.
-     * (2) For right-shift, add 8 so that the high half of the lane
-     *     becomes zero.  For left-shift, and left-rotate, we must
-     *     shift up and down again.
-     * (3) Step 2 leaves high half zero such that PACKUSWB
-     *     (pack with unsigned saturation) does not modify
-     *     the quantity.
-     */
-    vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-              tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
-    vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-              tcgv_vec_arg(t2), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
-
-    if (opc != INDEX_op_rotli_vec) {
-        imm += 8;
-    }
-    if (opc == INDEX_op_shri_vec) {
-        tcg_gen_shri_vec(MO_16, t1, t1, imm);
-        tcg_gen_shri_vec(MO_16, t2, t2, imm);
+    if (right) {
+        mask = 0xff >> imm;
+        tcg_gen_shri_vec(MO_16, v0, v1, imm);
     } else {
-        tcg_gen_shli_vec(MO_16, t1, t1, imm);
-        tcg_gen_shli_vec(MO_16, t2, t2, imm);
-        tcg_gen_shri_vec(MO_16, t1, t1, 8);
-        tcg_gen_shri_vec(MO_16, t2, t2, 8);
+        mask = 0xff << imm;
+        tcg_gen_shli_vec(MO_16, v0, v1, imm);
     }
-
-    vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8,
-              tcgv_vec_arg(v0), tcgv_vec_arg(t1), tcgv_vec_arg(t2));
-    tcg_temp_free_vec(t1);
-    tcg_temp_free_vec(t2);
+    tcg_gen_and_vec(MO_8, v0, v0, tcg_constant_vec(type, MO_8, mask));
 }
 
 static void expand_vec_sari(TCGType type, unsigned vece,
@@ -3821,7 +3792,7 @@ static void expand_vec_sari(TCGType type, unsigned vece,
 
     switch (vece) {
     case MO_8:
-        /* Unpack to W, shift, and repack, as in expand_vec_shi.  */
+        /* Unpack to 16-bit, shift, and repack.  */
         t1 = tcg_temp_new_vec(type);
         t2 = tcg_temp_new_vec(type);
         vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
@@ -3874,12 +3845,7 @@ static void expand_vec_rotli(TCGType type, unsigned vece,
 {
     TCGv_vec t;
 
-    if (vece == MO_8) {
-        expand_vec_shi(type, vece, INDEX_op_rotli_vec, v0, v1, imm);
-        return;
-    }
-
-    if (have_avx512vbmi2) {
+    if (vece != MO_8 && have_avx512vbmi2) {
         vec_gen_4(INDEX_op_x86_vpshldi_vec, type, vece,
                   tcgv_vec_arg(v0), tcgv_vec_arg(v1), tcgv_vec_arg(v1), imm);
         return;
@@ -4155,10 +4121,11 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 
     switch (opc) {
     case INDEX_op_shli_vec:
-    case INDEX_op_shri_vec:
-        expand_vec_shi(type, vece, opc, v0, v1, a2);
+        expand_vec_shi(type, vece, false, v0, v1, a2);
+        break;
+    case INDEX_op_shri_vec:
+        expand_vec_shi(type, vece, true, v0, v1, a2);
         break;
-
     case INDEX_op_sari_vec:
         expand_vec_sari(type, vece, v0, v1, a2);
         break;
-- 
2.34.1




* [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2024-04-24 17:09 ` [PATCH 2/5] tcg/i386: Simplify immediate 8-bit logical vector shifts Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
  2024-05-03 15:04   ` Philippe Mathieu-Daudé
  2024-04-24 17:09 ` [PATCH 4/5] tcg/optimize: Optimize setcond with zmask Richard Henderson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel

This may be treated as a 32-bit EQ/NE comparison against 0,
which is in turn treated as a LTU/GEU comparison against 1.
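
Spelled out as a quick self-check in plain C (illustrative only):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t samples[] = { 0, 1, 0xffffffffull, 0x100000000ull, ~0ull };
        for (unsigned i = 0; i < 5; i++) {
            uint64_t x = samples[i];
            /* TSTNE x, 0xffffffff == ((uint32_t)x != 0) == ((uint32_t)x >= 1) */
            assert(((x & 0xffffffffull) != 0) == ((uint32_t)x >= 1));
            /* TSTEQ x, 0xffffffff == ((uint32_t)x == 0) == ((uint32_t)x <  1) */
            assert(((x & 0xffffffffull) == 0) == ((uint32_t)x < 1));
        }
        return 0;
    }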

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 6837c519b0..59235b4f38 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1658,6 +1658,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
                             TCGArg dest, TCGArg arg1, TCGArg arg2,
                             int const_arg2, bool neg)
 {
+    int cmp_rexw = rexw;
     bool inv = false;
     bool cleared;
     int jcc;
@@ -1674,6 +1675,18 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
         }
         break;
 
+    case TCG_COND_TSTNE:
+        inv = true;
+        /* fall through */
+    case TCG_COND_TSTEQ:
+        /* If arg2 is -1, convert to LTU/GEU vs 1. */
+        if (const_arg2 && arg2 == 0xffffffffu) {
+            arg2 = 1;
+            cmp_rexw = 0;
+            goto do_ltu;
+        }
+        break;
+
     case TCG_COND_LEU:
         inv = true;
         /* fall through */
@@ -1697,7 +1710,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
          * We can then use NEG or INC to produce the desired result.
          * This is always smaller than the SETCC expansion.
          */
-        tcg_out_cmp(s, TCG_COND_LTU, arg1, arg2, const_arg2, rexw);
+        tcg_out_cmp(s, TCG_COND_LTU, arg1, arg2, const_arg2, cmp_rexw);
 
         /* X - X - C = -C = (C ? -1 : 0) */
         tgen_arithr(s, ARITH_SBB + (neg ? rexw : 0), dest, dest);
@@ -1744,7 +1757,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
         cleared = true;
     }
 
-    jcc = tcg_out_cmp(s, cond, arg1, arg2, const_arg2, rexw);
+    jcc = tcg_out_cmp(s, cond, arg1, arg2, const_arg2, cmp_rexw);
     tcg_out_modrm(s, OPC_SETCC | jcc, 0, dest);
 
     if (!cleared) {
-- 
2.34.1




* [PATCH 4/5] tcg/optimize: Optimize setcond with zmask
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2024-04-24 17:09 ` [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
  2024-04-24 17:09 ` [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE Richard Henderson
  2024-05-02 19:34 ` [PATCH 0/5] tcg: Misc improvements Richard Henderson
  6 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel

If we can show that high bits of an input are zero,
then we may optimize away some comparisons.
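
A rough illustration of the two cases the new fold handles, in plain C
rather than TCG opcodes (the assertions mirror the reasoning only, not the
optimizer's API):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        /* If the known-zero mask proves a < b, comparisons against b are
         * compile-time constants.
         */
        uint64_t z_mask = 0xff, b = 0x100;
        for (uint64_t a = 0; a <= z_mask; a++) {
            assert((a == b) == 0 && (a >= b) == 0 && (a < b) == 1);
        }

        /* If the mask proves a is already 0 or 1, setcond against 0/1
         * collapses to a move, an xor with 1, or a negation.
         */
        for (uint64_t a = 0; a <= 1; a++) {
            assert((uint64_t)(a != 0) == a);        /* setcond NE 0  -> mov   */
            assert((uint64_t)(a == 0) == (a ^ 1));  /* setcond EQ 0  -> xor 1 */
            assert(-(uint64_t)(a != 0) == -a);      /* negsetcond NE -> neg   */
        }
        return 0;
    }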

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2e9e5725a9..8886f7037a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2099,6 +2099,108 @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+static bool fold_setcond_zmask(OptContext *ctx, TCGOp *op, bool neg)
+{
+    uint64_t a_zmask, b_val;
+    TCGCond cond;
+
+    if (!arg_is_const(op->args[2])) {
+        return false;
+    }
+
+    a_zmask = arg_info(op->args[1])->z_mask;
+    b_val = arg_info(op->args[2])->val;
+    cond = op->args[3];
+
+    if (ctx->type == TCG_TYPE_I32) {
+        a_zmask = (uint32_t)a_zmask;
+        b_val = (uint32_t)b_val;
+    }
+
+    /*
+     * A with only low bits set vs B with high bits set means that A < B.
+     */
+    if (a_zmask < b_val) {
+        bool inv = false;
+
+        switch (cond) {
+        case TCG_COND_NE:
+        case TCG_COND_LEU:
+        case TCG_COND_LTU:
+            inv = true;
+            /* fall through */
+        case TCG_COND_GTU:
+        case TCG_COND_GEU:
+        case TCG_COND_EQ:
+            return tcg_opt_gen_movi(ctx, op, op->args[0], neg ? -inv : inv);
+        default:
+            break;
+        }
+    }
+
+    /*
+     * A with only lsb set is already boolean.
+     */
+    if (a_zmask <= 1) {
+        bool convert = false;
+        bool inv = false;
+
+        switch (cond) {
+        case TCG_COND_EQ:
+            inv = true;
+            /* fall through */
+        case TCG_COND_NE:
+            convert = (b_val == 0);
+            break;
+        case TCG_COND_LTU:
+        case TCG_COND_TSTEQ:
+            inv = true;
+            /* fall through */
+        case TCG_COND_GEU:
+        case TCG_COND_TSTNE:
+            convert = (b_val == 1);
+            break;
+        default:
+            break;
+        }
+        if (convert) {
+            TCGOpcode add_opc, xor_opc, neg_opc;
+
+            if (!inv && !neg) {
+                return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+            }
+
+            switch (ctx->type) {
+            case TCG_TYPE_I32:
+                add_opc = INDEX_op_add_i32;
+                neg_opc = INDEX_op_neg_i32;
+                xor_opc = INDEX_op_xor_i32;
+                break;
+            case TCG_TYPE_I64:
+                add_opc = INDEX_op_add_i64;
+                neg_opc = INDEX_op_neg_i64;
+                xor_opc = INDEX_op_xor_i64;
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (!inv) {
+                op->opc = neg_opc;
+            } else if (neg) {
+                op->opc = add_opc;
+                op->args[2] = arg_new_constant(ctx, -1);
+            } else {
+                op->opc = xor_opc;
+                op->args[2] = arg_new_constant(ctx, 1);
+            }
+            return false;
+        }
+    }
+
+    return false;
+}
+
 static void fold_setcond_tst_pow2(OptContext *ctx, TCGOp *op, bool neg)
 {
     TCGOpcode and_opc, sub_opc, xor_opc, neg_opc, shr_opc;
@@ -2200,6 +2302,10 @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
     if (i >= 0) {
         return tcg_opt_gen_movi(ctx, op, op->args[0], i);
     }
+
+    if (fold_setcond_zmask(ctx, op, false)) {
+        return true;
+    }
     fold_setcond_tst_pow2(ctx, op, false);
 
     ctx->z_mask = 1;
@@ -2214,6 +2320,10 @@ static bool fold_negsetcond(OptContext *ctx, TCGOp *op)
     if (i >= 0) {
         return tcg_opt_gen_movi(ctx, op, op->args[0], -i);
     }
+
+    if (fold_setcond_zmask(ctx, op, true)) {
+        return true;
+    }
     fold_setcond_tst_pow2(ctx, op, true);
 
     /* Value is {0,-1} so all bits are repetitions of the sign. */
-- 
2.34.1




* [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2024-04-24 17:09 ` [PATCH 4/5] tcg/optimize: Optimize setcond with zmask Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
  2024-05-03 15:02   ` Philippe Mathieu-Daudé
  2024-05-02 19:34 ` [PATCH 0/5] tcg: Misc improvements Richard Henderson
  6 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC (permalink / raw)
  To: qemu-devel

Record the fact that we've found a breakpoint on the page
in which a TranslationBlock is running.
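
To unpack the dense assignment in the hunk below, a standalone sketch (the
CF_COUNT_MASK and CF_NO_GOTO_TB values are assumptions for illustration;
only CF_BP_PAGE is defined by this patch):

    #include <assert.h>
    #include <stdint.h>

    #define CF_COUNT_MASK  0x000001ff   /* assumed; see translation-block.h */
    #define CF_NO_GOTO_TB  0x00000200   /* assumed; see translation-block.h */
    #define CF_BP_PAGE     0x00040000   /* added by this patch */

    int main(void)
    {
        uint32_t cflags = 0x00008000 | 42;   /* prior flags plus insn count */

        /* Clamp the count to one insn, forbid goto_tb chaining, and record
         * that the code page contains a breakpoint.
         */
        cflags = (cflags & ~CF_COUNT_MASK) | CF_NO_GOTO_TB | CF_BP_PAGE | 1;

        assert((cflags & CF_COUNT_MASK) == 1);
        assert(cflags & CF_BP_PAGE);
        return 0;
    }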

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/translation-block.h | 1 +
 accel/tcg/cpu-exec.c             | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/translation-block.h b/include/exec/translation-block.h
index 48211c890a..a6d1af6e9b 100644
--- a/include/exec/translation-block.h
+++ b/include/exec/translation-block.h
@@ -77,6 +77,7 @@ struct TranslationBlock {
 #define CF_PARALLEL      0x00008000 /* Generate code for a parallel context */
 #define CF_NOIRQ         0x00010000 /* Generate an uninterruptible TB */
 #define CF_PCREL         0x00020000 /* Opcodes in TB are PC-relative */
+#define CF_BP_PAGE       0x00040000 /* Breakpoint present in code page */
 #define CF_CLUSTER_MASK  0xff000000 /* Top 8 bits are cluster ID */
 #define CF_CLUSTER_SHIFT 24
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5c70748060..26bf968ff3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -368,7 +368,7 @@ static bool check_for_breakpoints_slow(CPUState *cpu, vaddr pc,
      * breakpoints are removed.
      */
     if (match_page) {
-        *cflags = (*cflags & ~CF_COUNT_MASK) | CF_NO_GOTO_TB | 1;
+        *cflags = (*cflags & ~CF_COUNT_MASK) | CF_NO_GOTO_TB | CF_BP_PAGE | 1;
     }
     return false;
 }
-- 
2.34.1




* Re: [PATCH 0/5] tcg: Misc improvements
  2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2024-04-24 17:09 ` [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE Richard Henderson
@ 2024-05-02 19:34 ` Richard Henderson
  6 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2024-05-02 19:34 UTC (permalink / raw)
  To: qemu-devel

Ping.

On 4/24/24 10:09, Richard Henderson wrote:
> One patch to allow two output operands from gvec expansion,
> to be used by target/arm for updating QC.
> 
> One patch to record the result of the generic breakpoint
> search so target translators do not need to repeat it.
> 
> Three small optimization patches.
> 
> 
> r~
> 
> 
> Richard Henderson (5):
>    tcg: Add write_aofs to GVecGen3i
>    tcg/i386: Simplify immediate 8-bit logical vector shifts
>    tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff
>    tcg/optimize: Optimize setcond with zmask
>    accel/tcg: Introduce CF_BP_PAGE
> 
>   include/exec/translation-block.h |   1 +
>   include/tcg/tcg-op-gvec-common.h |   2 +
>   accel/tcg/cpu-exec.c             |   2 +-
>   tcg/optimize.c                   | 110 +++++++++++++++++++++++++++++++
>   tcg/tcg-op-gvec.c                |  30 ++++++---
>   tcg/i386/tcg-target.c.inc        |  78 ++++++++--------------
>   6 files changed, 165 insertions(+), 58 deletions(-)
> 




* Re: [PATCH] target/arm: Restrict translation disabled alignment check to VMSA
  2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
@ 2024-05-03 14:58   ` Philippe Mathieu-Daudé
  2024-05-03 14:59     ` Richard Henderson
  0 siblings, 1 reply; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 14:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Clément Chigot, qemu-stable

On 24/4/24 19:09, Richard Henderson wrote:
> For CPUs using PMSA, when the MPU is disabled, the default memory
> type is Normal, Non-cacheable.
> 
> Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation disabled")
> Reported-by: Clément Chigot <chigot@adacore.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> 
> Since v9 will likely be tagged tomorrow without this fixed,
> Cc: qemu-stable@nongnu.org
> 
> ---
>   target/arm/tcg/hflags.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
> index 5da1b0fc1d..66de30b828 100644
> --- a/target/arm/tcg/hflags.c
> +++ b/target/arm/tcg/hflags.c
> @@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr)
>       }
>   
>       /*
> -     * If translation is disabled, then the default memory type is
> -     * Device(-nGnRnE) instead of Normal, which requires that alignment
> +     * With PMSA, when the MPU is disabled, all memory types in the
> +     * default map are Normal.
> +     */
> +    if (arm_feature(env, ARM_FEATURE_PMSA)) {
> +        return false;
> +    }
> +
> +    /*
> +     * With VMSA, if translation is disabled, then the default memory type
> +     * is Device(-nGnRnE) instead of Normal, which requires that alignment
>        * be enforced.  Since this affects all ram, it is most efficient
>        * to handle this during translation.
>        */

This one is in target-arm.next:
https://lore.kernel.org/qemu-devel/CAFEAcA98UrBLsAXKzLSkUnC2G_RZd56veqUkSGSttoADfkEKGA@mail.gmail.com/



* Re: [PATCH] target/arm: Restrict translation disabled alignment check to VMSA
  2024-05-03 14:58   ` Philippe Mathieu-Daudé
@ 2024-05-03 14:59     ` Richard Henderson
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2024-05-03 14:59 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel; +Cc: Clément Chigot, qemu-stable

On 5/3/24 07:58, Philippe Mathieu-Daudé wrote:
> On 24/4/24 19:09, Richard Henderson wrote:
>> For CPUs using PMSA, when the MPU is disabled, the default memory
>> type is Normal, Non-cacheable.
>>
>> Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation disabled")
>> Reported-by: Clément Chigot <chigot@adacore.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>
>> Since v9 will likely be tagged tomorrow without this fixed,
>> Cc: qemu-stable@nongnu.org
>>
>> ---
>>   target/arm/tcg/hflags.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
>> index 5da1b0fc1d..66de30b828 100644
>> --- a/target/arm/tcg/hflags.c
>> +++ b/target/arm/tcg/hflags.c
>> @@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, 
>> uint64_t sctlr)
>>       }
>>       /*
>> -     * If translation is disabled, then the default memory type is
>> -     * Device(-nGnRnE) instead of Normal, which requires that alignment
>> +     * With PMSA, when the MPU is disabled, all memory types in the
>> +     * default map are Normal.
>> +     */
>> +    if (arm_feature(env, ARM_FEATURE_PMSA)) {
>> +        return false;
>> +    }
>> +
>> +    /*
>> +     * With VMSA, if translation is disabled, then the default memory type
>> +     * is Device(-nGnRnE) instead of Normal, which requires that alignment
>>        * be enforced.  Since this affects all ram, it is most efficient
>>        * to handle this during translation.
>>        */
> 
> This one is in target-arm.next:
> https://lore.kernel.org/qemu-devel/CAFEAcA98UrBLsAXKzLSkUnC2G_RZd56veqUkSGSttoADfkEKGA@mail.gmail.com/

Yes, that was a stray patch that accidentally got re-posted with this series.


r~



* Re: [PATCH 1/5] tcg: Add write_aofs to GVecGen3i
  2024-04-24 17:09 ` [PATCH 1/5] tcg: Add write_aofs to GVecGen3i Richard Henderson
@ 2024-05-03 15:01   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 15:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 24/4/24 19:09, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/tcg/tcg-op-gvec-common.h |  2 ++
>   tcg/tcg-op-gvec.c                | 30 ++++++++++++++++++++++--------
>   2 files changed, 24 insertions(+), 8 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




* Re: [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE
  2024-04-24 17:09 ` [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE Richard Henderson
@ 2024-05-03 15:02   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 15:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 24/4/24 19:09, Richard Henderson wrote:
> Record the fact that we've found a breakpoint on the page
> in which a TranslationBlock is running.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/exec/translation-block.h | 1 +
>   accel/tcg/cpu-exec.c             | 2 +-
>   2 files changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




* Re: [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff
  2024-04-24 17:09 ` [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff Richard Henderson
@ 2024-05-03 15:04   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 15:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 24/4/24 19:09, Richard Henderson wrote:
> This may be treated as a 32-bit EQ/NE comparison against 0,
> which is in turn treated as a LTU/GEU comparison against 1.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/i386/tcg-target.c.inc | 17 +++++++++++++++--
>   1 file changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



