* [PATCH] target/arm: Restrict translation disabled alignment check to VMSA
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
2024-05-03 14:58 ` Philippe Mathieu-Daudé
2024-04-24 17:09 ` [PATCH 1/5] tcg: Add write_aofs to GVecGen3i Richard Henderson
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC
To: qemu-devel; +Cc: Clément Chigot, qemu-stable
For CPUs using PMSA, when the MPU is disabled, the default memory
type is Normal, Non-cacheable.
Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation disabled")
Reported-by: Clément Chigot <chigot@adacore.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
Since v9 will likely be tagged tomorrow without this fix,
Cc: qemu-stable@nongnu.org
---
target/arm/tcg/hflags.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index 5da1b0fc1d..66de30b828 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr)
}
/*
- * If translation is disabled, then the default memory type is
- * Device(-nGnRnE) instead of Normal, which requires that alignment
+ * With PMSA, when the MPU is disabled, all memory types in the
+ * default map are Normal.
+ */
+ if (arm_feature(env, ARM_FEATURE_PMSA)) {
+ return false;
+ }
+
+ /*
+ * With VMSA, if translation is disabled, then the default memory type
+ * is Device(-nGnRnE) instead of Normal, which requires that alignment
* be enforced. Since this affects all ram, it is most efficient
* to handle this during translation.
*/
--
2.34.1
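For reference, the branch this hunk adds, reduced to a standalone sketch
(illustrative names only; the real check lives in
aprofile_require_alignment() and also considers SCTLR.A earlier in the
function):

    #include <stdbool.h>

    /* Sketch of the decision after this patch: alignment is forced
     * only for VMSA cpus running with translation disabled, where
     * the default memory type is Device-nGnRnE.  PMSA cpus with the
     * MPU disabled get Normal memory and need no check. */
    static bool require_alignment_sketch(bool is_pmsa, bool mmu_enabled)
    {
        if (is_pmsa) {
            return false;           /* default map is Normal */
        }
        return !mmu_enabled;        /* Device-nGnRnE when MMU is off */
    }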
* Re: [PATCH] target/arm: Restrict translation disabled alignment check to VMSA
2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
@ 2024-05-03 14:58 ` Philippe Mathieu-Daudé
2024-05-03 14:59 ` Richard Henderson
0 siblings, 1 reply; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 14:58 UTC
To: Richard Henderson, qemu-devel; +Cc: Clément Chigot, qemu-stable
On 24/4/24 19:09, Richard Henderson wrote:
> For CPUs using PMSA, when the MPU is disabled, the default memory
> type is Normal, Non-cacheable.
>
> Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation disabled")
> Reported-by: Clément Chigot <chigot@adacore.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> Since v9 will likely be tagged tomorrow without this fix,
> Cc: qemu-stable@nongnu.org
>
> ---
> target/arm/tcg/hflags.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
> index 5da1b0fc1d..66de30b828 100644
> --- a/target/arm/tcg/hflags.c
> +++ b/target/arm/tcg/hflags.c
> @@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr)
> }
>
> /*
> - * If translation is disabled, then the default memory type is
> - * Device(-nGnRnE) instead of Normal, which requires that alignment
> + * With PMSA, when the MPU is disabled, all memory types in the
> + * default map are Normal.
> + */
> + if (arm_feature(env, ARM_FEATURE_PMSA)) {
> + return false;
> + }
> +
> + /*
> + * With VMSA, if translation is disabled, then the default memory type
> + * is Device(-nGnRnE) instead of Normal, which requires that alignment
> * be enforced. Since this affects all ram, it is most efficient
> * to handle this during translation.
> */
This one is in target-arm.next:
https://lore.kernel.org/qemu-devel/CAFEAcA98UrBLsAXKzLSkUnC2G_RZd56veqUkSGSttoADfkEKGA@mail.gmail.com/
* Re: [PATCH] target/arm: Restrict translation disabled alignment check to VMSA
2024-05-03 14:58 ` Philippe Mathieu-Daudé
@ 2024-05-03 14:59 ` Richard Henderson
0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2024-05-03 14:59 UTC
To: Philippe Mathieu-Daudé, qemu-devel; +Cc: Clément Chigot, qemu-stable
On 5/3/24 07:58, Philippe Mathieu-Daudé wrote:
> On 24/4/24 19:09, Richard Henderson wrote:
>> For CPUs using PMSA, when the MPU is disabled, the default memory
>> type is Normal, Non-cacheable.
>>
>> Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation disabled")
>> Reported-by: Clément Chigot <chigot@adacore.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>
>> Since v9 will likely be tagged tomorrow without this fix,
>> Cc: qemu-stable@nongnu.org
>>
>> ---
>> target/arm/tcg/hflags.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
>> index 5da1b0fc1d..66de30b828 100644
>> --- a/target/arm/tcg/hflags.c
>> +++ b/target/arm/tcg/hflags.c
>> @@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr)
>> }
>> /*
>> - * If translation is disabled, then the default memory type is
>> - * Device(-nGnRnE) instead of Normal, which requires that alignment
>> + * With PMSA, when the MPU is disabled, all memory types in the
>> + * default map are Normal.
>> + */
>> + if (arm_feature(env, ARM_FEATURE_PMSA)) {
>> + return false;
>> + }
>> +
>> + /*
>> + * With VMSA, if translation is disabled, then the default memory type
>> + * is Device(-nGnRnE) instead of Normal, which requires that alignment
>> * be enforced. Since this affects all ram, it is most efficient
>> * to handle this during translation.
>> */
>
> This one is in target-arm.next:
> https://lore.kernel.org/qemu-devel/CAFEAcA98UrBLsAXKzLSkUnC2G_RZd56veqUkSGSttoADfkEKGA@mail.gmail.com/
Yes, that was a stray patch that accidentally got re-posted with this series.
r~
* [PATCH 1/5] tcg: Add write_aofs to GVecGen3i
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
2024-05-03 15:01 ` Philippe Mathieu-Daudé
2024-04-24 17:09 ` [PATCH 2/5] tcg/i386: Simplify immediate 8-bit logical vector shifts Richard Henderson
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC
To: qemu-devel
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/tcg/tcg-op-gvec-common.h | 2 ++
tcg/tcg-op-gvec.c | 30 ++++++++++++++++++++++--------
2 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/include/tcg/tcg-op-gvec-common.h b/include/tcg/tcg-op-gvec-common.h
index 4db8a58c14..65553f5f97 100644
--- a/include/tcg/tcg-op-gvec-common.h
+++ b/include/tcg/tcg-op-gvec-common.h
@@ -183,6 +183,8 @@ typedef struct {
bool prefer_i64;
/* Load dest as a 3rd source operand. */
bool load_dest;
+ /* Write aofs as a 2nd dest operand. */
+ bool write_aofs;
} GVecGen3i;
typedef struct {
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index bb88943f79..0308732d9b 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -785,7 +785,8 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
}
static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
- uint32_t oprsz, int32_t c, bool load_dest,
+ uint32_t oprsz, int32_t c,
+ bool load_dest, bool write_aofs,
void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t))
{
TCGv_i32 t0 = tcg_temp_new_i32();
@@ -801,6 +802,9 @@ static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
}
fni(t2, t0, t1, c);
tcg_gen_st_i32(t2, tcg_env, dofs + i);
+ if (write_aofs) {
+ tcg_gen_st_i32(t0, tcg_env, aofs + i);
+ }
}
tcg_temp_free_i32(t0);
tcg_temp_free_i32(t1);
@@ -944,7 +948,8 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
}
static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
- uint32_t oprsz, int64_t c, bool load_dest,
+ uint32_t oprsz, int64_t c,
+ bool load_dest, bool write_aofs,
void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t))
{
TCGv_i64 t0 = tcg_temp_new_i64();
@@ -960,6 +965,9 @@ static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
}
fni(t2, t0, t1, c);
tcg_gen_st_i64(t2, tcg_env, dofs + i);
+ if (write_aofs) {
+ tcg_gen_st_i64(t0, tcg_env, aofs + i);
+ }
}
tcg_temp_free_i64(t0);
tcg_temp_free_i64(t1);
@@ -1102,7 +1110,8 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
*/
static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
uint32_t bofs, uint32_t oprsz, uint32_t tysz,
- TCGType type, int64_t c, bool load_dest,
+ TCGType type, int64_t c,
+ bool load_dest, bool write_aofs,
void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec,
int64_t))
{
@@ -1118,6 +1127,9 @@ static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
}
fni(vece, t2, t0, t1, c);
tcg_gen_st_vec(t2, tcg_env, dofs + i);
+ if (write_aofs) {
+ tcg_gen_st_vec(t0, tcg_env, aofs + i);
+ }
}
}
@@ -1471,7 +1483,7 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
*/
some = QEMU_ALIGN_DOWN(oprsz, 32);
expand_3i_vec(g->vece, dofs, aofs, bofs, some, 32, TCG_TYPE_V256,
- c, g->load_dest, g->fniv);
+ c, g->load_dest, g->write_aofs, g->fniv);
if (some == oprsz) {
break;
}
@@ -1483,18 +1495,20 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
/* fallthru */
case TCG_TYPE_V128:
expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 16, TCG_TYPE_V128,
- c, g->load_dest, g->fniv);
+ c, g->load_dest, g->write_aofs, g->fniv);
break;
case TCG_TYPE_V64:
expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 8, TCG_TYPE_V64,
- c, g->load_dest, g->fniv);
+ c, g->load_dest, g->write_aofs, g->fniv);
break;
case 0:
if (g->fni8 && check_size_impl(oprsz, 8)) {
- expand_3i_i64(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni8);
+ expand_3i_i64(dofs, aofs, bofs, oprsz, c,
+ g->load_dest, g->write_aofs, g->fni8);
} else if (g->fni4 && check_size_impl(oprsz, 4)) {
- expand_3i_i32(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni4);
+ expand_3i_i32(dofs, aofs, bofs, oprsz, c,
+ g->load_dest, g->write_aofs, g->fni4);
} else {
assert(g->fno != NULL);
tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, c, g->fno);
--
2.34.1
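A hypothetical descriptor using the new field (the generator callbacks
and the target/arm QC use case mentioned in the cover letter are
assumed, not part of this patch):

    /* Assumed example: an op that, per element, writes its result
     * to dofs and also writes back to aofs, e.g. an accumulated
     * saturation flag. */
    static const GVecGen3i gen_foo = {
        .fni8 = gen_foo_i64,        /* assumed integer generator */
        .fniv = gen_foo_vec,        /* assumed vector generator */
        .fno  = gen_helper_foo,     /* assumed out-of-line fallback */
        .vece = MO_64,
        .load_dest = true,
        .write_aofs = true,         /* aofs is read and written back */
    };

    tcg_gen_gvec_3i(dofs, aofs, bofs, oprsz, maxsz, c, &gen_foo);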
* [PATCH 2/5] tcg/i386: Simplify immediate 8-bit logical vector shifts
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
2024-04-24 17:09 ` [PATCH] target/arm: Restrict translation disabled alignment check to VMSA Richard Henderson
2024-04-24 17:09 ` [PATCH 1/5] tcg: Add write_aofs to GVecGen3i Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
2024-04-24 17:09 ` [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff Richard Henderson
` (3 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC
To: qemu-devel
The x86 ISA does not have this operation, so we need an expansion.
Use the same algorithm that we use for expanding this vector
operation with integers: perform the shift with a wider type
and then mask the bits that must be zero.
This reduces the instruction count from 5 to 2.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/i386/tcg-target.c.inc | 61 +++++++++------------------------------
1 file changed, 14 insertions(+), 47 deletions(-)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index c6ba498623..6837c519b0 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3769,49 +3769,20 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
}
}
-static void expand_vec_shi(TCGType type, unsigned vece, TCGOpcode opc,
+static void expand_vec_shi(TCGType type, unsigned vece, bool right,
TCGv_vec v0, TCGv_vec v1, TCGArg imm)
{
- TCGv_vec t1, t2;
+ uint8_t mask;
tcg_debug_assert(vece == MO_8);
-
- t1 = tcg_temp_new_vec(type);
- t2 = tcg_temp_new_vec(type);
-
- /*
- * Unpack to W, shift, and repack. Tricky bits:
- * (1) Use punpck*bw x,x to produce DDCCBBAA,
- * i.e. duplicate in other half of the 16-bit lane.
- * (2) For right-shift, add 8 so that the high half of the lane
- * becomes zero. For left-shift, and left-rotate, we must
- * shift up and down again.
- * (3) Step 2 leaves high half zero such that PACKUSWB
- * (pack with unsigned saturation) does not modify
- * the quantity.
- */
- vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
- tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
- vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
- tcgv_vec_arg(t2), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
-
- if (opc != INDEX_op_rotli_vec) {
- imm += 8;
- }
- if (opc == INDEX_op_shri_vec) {
- tcg_gen_shri_vec(MO_16, t1, t1, imm);
- tcg_gen_shri_vec(MO_16, t2, t2, imm);
+ if (right) {
+ mask = 0xff >> imm;
+ tcg_gen_shri_vec(MO_16, v0, v1, imm);
} else {
- tcg_gen_shli_vec(MO_16, t1, t1, imm);
- tcg_gen_shli_vec(MO_16, t2, t2, imm);
- tcg_gen_shri_vec(MO_16, t1, t1, 8);
- tcg_gen_shri_vec(MO_16, t2, t2, 8);
+ mask = 0xff << imm;
+ tcg_gen_shli_vec(MO_16, v0, v1, imm);
}
-
- vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8,
- tcgv_vec_arg(v0), tcgv_vec_arg(t1), tcgv_vec_arg(t2));
- tcg_temp_free_vec(t1);
- tcg_temp_free_vec(t2);
+ tcg_gen_and_vec(MO_8, v0, v0, tcg_constant_vec(type, MO_8, mask));
}
static void expand_vec_sari(TCGType type, unsigned vece,
@@ -3821,7 +3792,7 @@ static void expand_vec_sari(TCGType type, unsigned vece,
switch (vece) {
case MO_8:
- /* Unpack to W, shift, and repack, as in expand_vec_shi. */
+ /* Unpack to 16-bit, shift, and repack. */
t1 = tcg_temp_new_vec(type);
t2 = tcg_temp_new_vec(type);
vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
@@ -3874,12 +3845,7 @@ static void expand_vec_rotli(TCGType type, unsigned vece,
{
TCGv_vec t;
- if (vece == MO_8) {
- expand_vec_shi(type, vece, INDEX_op_rotli_vec, v0, v1, imm);
- return;
- }
-
- if (have_avx512vbmi2) {
+ if (vece != MO_8 && have_avx512vbmi2) {
vec_gen_4(INDEX_op_x86_vpshldi_vec, type, vece,
tcgv_vec_arg(v0), tcgv_vec_arg(v1), tcgv_vec_arg(v1), imm);
return;
@@ -4155,10 +4121,11 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
switch (opc) {
case INDEX_op_shli_vec:
- case INDEX_op_shri_vec:
- expand_vec_shi(type, vece, opc, v0, v1, a2);
+ expand_vec_shi(type, vece, false, v0, v1, a2);
+ break;
+ case INDEX_op_shri_vec:
+ expand_vec_shi(type, vece, true, v0, v1, a2);
break;
-
case INDEX_op_sari_vec:
expand_vec_sari(type, vece, v0, v1, a2);
break;
--
2.34.1
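The wider-shift-plus-mask identity, checked standalone (not QEMU code;
two 8-bit lanes packed into a uint16_t stand in for a vector register):

    #include <assert.h>
    #include <stdint.h>

    /* Right-shift each 8-bit lane by imm: one 16-bit shift, then a
     * mask clearing the bits that leaked across the lane boundary. */
    static uint16_t shr_8x2(uint16_t v, unsigned imm)
    {
        uint8_t mask = 0xff >> imm;
        return (v >> imm) & (uint16_t)(mask * 0x0101);
    }

    int main(void)
    {
        assert(shr_8x2(0x80f0, 1) == 0x4078);   /* 0x80>>1, 0xf0>>1 */
        return 0;
    }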
* [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
` (2 preceding siblings ...)
2024-04-24 17:09 ` [PATCH 2/5] tcg/i386: Simplify immediate 8-bit logical vector shifts Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
2024-05-03 15:04 ` Philippe Mathieu-Daudé
2024-04-24 17:09 ` [PATCH 4/5] tcg/optimize: Optimize setcond with zmask Richard Henderson
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC
To: qemu-devel
This may be treated as a 32-bit EQ/NE comparison against 0,
which is in turn treated as a LTU/GEU comparison against 1.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/i386/tcg-target.c.inc | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 6837c519b0..59235b4f38 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1658,6 +1658,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
TCGArg dest, TCGArg arg1, TCGArg arg2,
int const_arg2, bool neg)
{
+ int cmp_rexw = rexw;
bool inv = false;
bool cleared;
int jcc;
@@ -1674,6 +1675,18 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
}
break;
+ case TCG_COND_TSTNE:
+ inv = true;
+ /* fall through */
+ case TCG_COND_TSTEQ:
+ /* If arg2 is -1, convert to LTU/GEU vs 1. */
+ if (const_arg2 && arg2 == 0xffffffffu) {
+ arg2 = 1;
+ cmp_rexw = 0;
+ goto do_ltu;
+ }
+ break;
+
case TCG_COND_LEU:
inv = true;
/* fall through */
@@ -1697,7 +1710,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
* We can then use NEG or INC to produce the desired result.
* This is always smaller than the SETCC expansion.
*/
- tcg_out_cmp(s, TCG_COND_LTU, arg1, arg2, const_arg2, rexw);
+ tcg_out_cmp(s, TCG_COND_LTU, arg1, arg2, const_arg2, cmp_rexw);
/* X - X - C = -C = (C ? -1 : 0) */
tgen_arithr(s, ARITH_SBB + (neg ? rexw : 0), dest, dest);
@@ -1744,7 +1757,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
cleared = true;
}
- jcc = tcg_out_cmp(s, cond, arg1, arg2, const_arg2, rexw);
+ jcc = tcg_out_cmp(s, cond, arg1, arg2, const_arg2, cmp_rexw);
tcg_out_modrm(s, OPC_SETCC | jcc, 0, dest);
if (!cleared) {
--
2.34.1
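The chain of identities, as a standalone check (not QEMU code):

    #include <assert.h>
    #include <stdint.h>

    /* TSTEQ(x, 0xffffffff) <=> (x & 0xffffffff) == 0
     *                      <=> (uint32_t)x == 0
     *                      <=> (uint32_t)x < 1     (LTU against 1) */
    int main(void)
    {
        uint64_t samples[] = { 0, 1, 0xffffffffull,
                               0x100000000ull, 0x1ffffffffull };
        for (int i = 0; i < 5; i++) {
            uint64_t x = samples[i];
            assert(((x & 0xffffffffull) == 0) == ((uint32_t)x < 1));
        }
        return 0;
    }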
* [PATCH 4/5] tcg/optimize: Optimize setcond with zmask
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
` (3 preceding siblings ...)
2024-04-24 17:09 ` [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
2024-04-24 17:09 ` [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE Richard Henderson
2024-05-02 19:34 ` [PATCH 0/5] tcg: Misc improvements Richard Henderson
6 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC
To: qemu-devel
If we can show that high bits of an input are zero,
then we may optimize away some comparisons.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/optimize.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 110 insertions(+)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2e9e5725a9..8886f7037a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2099,6 +2099,108 @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
return false;
}
+static bool fold_setcond_zmask(OptContext *ctx, TCGOp *op, bool neg)
+{
+ uint64_t a_zmask, b_val;
+ TCGCond cond;
+
+ if (!arg_is_const(op->args[2])) {
+ return false;
+ }
+
+ a_zmask = arg_info(op->args[1])->z_mask;
+ b_val = arg_info(op->args[2])->val;
+ cond = op->args[3];
+
+ if (ctx->type == TCG_TYPE_I32) {
+ a_zmask = (uint32_t)a_zmask;
+ b_val = (uint32_t)b_val;
+ }
+
+ /*
+ * A with only low bits set vs B with high bits set means that A < B.
+ */
+ if (a_zmask < b_val) {
+ bool inv = false;
+
+ switch (cond) {
+ case TCG_COND_NE:
+ case TCG_COND_LEU:
+ case TCG_COND_LTU:
+ inv = true;
+ /* fall through */
+ case TCG_COND_GTU:
+ case TCG_COND_GEU:
+ case TCG_COND_EQ:
+ return tcg_opt_gen_movi(ctx, op, op->args[0], neg ? -inv : inv);
+ default:
+ break;
+ }
+ }
+
+ /*
+ * A with only lsb set is already boolean.
+ */
+ if (a_zmask <= 1) {
+ bool convert = false;
+ bool inv = false;
+
+ switch (cond) {
+ case TCG_COND_EQ:
+ inv = true;
+ /* fall through */
+ case TCG_COND_NE:
+ convert = (b_val == 0);
+ break;
+ case TCG_COND_LTU:
+ case TCG_COND_TSTEQ:
+ inv = true;
+ /* fall through */
+ case TCG_COND_GEU:
+ case TCG_COND_TSTNE:
+ convert = (b_val == 1);
+ break;
+ default:
+ break;
+ }
+ if (convert) {
+ TCGOpcode add_opc, xor_opc, neg_opc;
+
+ if (!inv && !neg) {
+ return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+ }
+
+ switch (ctx->type) {
+ case TCG_TYPE_I32:
+ add_opc = INDEX_op_add_i32;
+ neg_opc = INDEX_op_neg_i32;
+ xor_opc = INDEX_op_xor_i32;
+ break;
+ case TCG_TYPE_I64:
+ add_opc = INDEX_op_add_i64;
+ neg_opc = INDEX_op_neg_i64;
+ xor_opc = INDEX_op_xor_i64;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ if (!inv) {
+ op->opc = neg_opc;
+ } else if (neg) {
+ op->opc = add_opc;
+ op->args[2] = arg_new_constant(ctx, -1);
+ } else {
+ op->opc = xor_opc;
+ op->args[2] = arg_new_constant(ctx, 1);
+ }
+ return false;
+ }
+ }
+
+ return false;
+}
+
static void fold_setcond_tst_pow2(OptContext *ctx, TCGOp *op, bool neg)
{
TCGOpcode and_opc, sub_opc, xor_opc, neg_opc, shr_opc;
@@ -2200,6 +2302,10 @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
if (i >= 0) {
return tcg_opt_gen_movi(ctx, op, op->args[0], i);
}
+
+ if (fold_setcond_zmask(ctx, op, false)) {
+ return true;
+ }
fold_setcond_tst_pow2(ctx, op, false);
ctx->z_mask = 1;
@@ -2214,6 +2320,10 @@ static bool fold_negsetcond(OptContext *ctx, TCGOp *op)
if (i >= 0) {
return tcg_opt_gen_movi(ctx, op, op->args[0], -i);
}
+
+ if (fold_setcond_zmask(ctx, op, true)) {
+ return true;
+ }
fold_setcond_tst_pow2(ctx, op, true);
/* Value is {0,-1} so all bits are repetitions of the sign. */
--
2.34.1
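The bound behind the first fold, as a standalone check (not QEMU code):
z_mask is a superset of the bits that may be nonzero, so A <= a_zmask,
and a_zmask < b_val guarantees A < B for every value A can take:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t a_zmask = 0x0f;    /* A proven to fit in 4 bits */
        uint64_t b = 0x100;

        for (uint64_t a = 0; a <= a_zmask; a++) {
            /* for every value the mask permits, all six unsigned and
             * equality comparisons against b are constants */
            assert(a != b && a < b && a <= b);
            assert(!(a == b) && !(a >= b) && !(a > b));
        }
        return 0;
    }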
* [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
` (4 preceding siblings ...)
2024-04-24 17:09 ` [PATCH 4/5] tcg/optimize: Optimize setcond with zmask Richard Henderson
@ 2024-04-24 17:09 ` Richard Henderson
2024-05-03 15:02 ` Philippe Mathieu-Daudé
2024-05-02 19:34 ` [PATCH 0/5] tcg: Misc improvements Richard Henderson
6 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-04-24 17:09 UTC
To: qemu-devel
Record the fact that we've found a breakpoint on the page
in which a TranslationBlock is running.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/translation-block.h | 1 +
accel/tcg/cpu-exec.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/exec/translation-block.h b/include/exec/translation-block.h
index 48211c890a..a6d1af6e9b 100644
--- a/include/exec/translation-block.h
+++ b/include/exec/translation-block.h
@@ -77,6 +77,7 @@ struct TranslationBlock {
#define CF_PARALLEL 0x00008000 /* Generate code for a parallel context */
#define CF_NOIRQ 0x00010000 /* Generate an uninterruptible TB */
#define CF_PCREL 0x00020000 /* Opcodes in TB are PC-relative */
+#define CF_BP_PAGE 0x00040000 /* Breakpoint present in code page */
#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
#define CF_CLUSTER_SHIFT 24
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5c70748060..26bf968ff3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -368,7 +368,7 @@ static bool check_for_breakpoints_slow(CPUState *cpu, vaddr pc,
* breakpoints are removed.
*/
if (match_page) {
- *cflags = (*cflags & ~CF_COUNT_MASK) | CF_NO_GOTO_TB | 1;
+ *cflags = (*cflags & ~CF_COUNT_MASK) | CF_NO_GOTO_TB | CF_BP_PAGE | 1;
}
return false;
}
--
2.34.1
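A hypothetical consumer in a target translator (illustrative; this
patch only sets the flag and adds no in-tree user):

    /* Sketch: a translator can test the new flag instead of walking
     * the breakpoint list itself. */
    if (tb_cflags(dc->base.tb) & CF_BP_PAGE) {
        /* a breakpoint exists in this code page; avoid
         * optimizations that assume none are present */
    }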
* Re: [PATCH 0/5] tcg: Misc improvements
2024-04-24 17:09 [PATCH 0/5] tcg: Misc improvements Richard Henderson
` (5 preceding siblings ...)
2024-04-24 17:09 ` [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE Richard Henderson
@ 2024-05-02 19:34 ` Richard Henderson
6 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 19:34 UTC
To: qemu-devel
Ping.
On 4/24/24 10:09, Richard Henderson wrote:
> One patch to allow two output operands from gvec expansion,
> to be used by target/arm for updating QC.
>
> One patch to record the result of the generic breakpoint
> search so target translators do not need to repeat it.
>
> Three small optimization patches.
>
>
> r~
>
>
> Richard Henderson (5):
> tcg: Add write_aofs to GVecGen3i
> tcg/i386: Simplify immediate 8-bit logical vector shifts
> tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff
> tcg/optimize: Optimize setcond with zmask
> accel/tcg: Introduce CF_BP_PAGE
>
> include/exec/translation-block.h | 1 +
> include/tcg/tcg-op-gvec-common.h | 2 +
> accel/tcg/cpu-exec.c | 2 +-
> tcg/optimize.c | 110 +++++++++++++++++++++++++++++++
> tcg/tcg-op-gvec.c | 30 ++++++---
> tcg/i386/tcg-target.c.inc | 78 ++++++++--------------
> 6 files changed, 165 insertions(+), 58 deletions(-)
>