* [PATCH for-9.0 0/3] target/hppa: Fix DCOR, UADDCM conditions
@ 2024-03-25 3:04 Richard Henderson
2024-03-25 3:04 ` [PATCH 1/3] target/hppa: Fix DCOR reconstruction of carry bits Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Richard Henderson @ 2024-03-25 3:04 UTC (permalink / raw)
To: qemu-devel; +Cc: deller
Two problems, both related to the reconstruction and computation
of carry bits. Simplify UXOR a bit, since no carry is involved.
While in the area, optimize UADDCM without condition, as that's
the common case for inverting a register.
r~
Richard Henderson (3):
target/hppa: Fix DCOR reconstruction of carry bits
target/hppa: Optimize UADDCM with no condition
target/hppa: Fix unit carry conditions
target/hppa/translate.c | 240 ++++++++++++++++++++++------------------
1 file changed, 132 insertions(+), 108 deletions(-)
--
2.34.1
* [PATCH 1/3] target/hppa: Fix DCOR reconstruction of carry bits
2024-03-25 3:04 [PATCH for-9.0 0/3] target/hppa: Fix DCOR, UADDCM conditions Richard Henderson
@ 2024-03-25 3:04 ` Richard Henderson
2024-03-25 9:48 ` Helge Deller
2024-03-25 3:04 ` [PATCH 2/3] target/hppa: Optimize UADDCM with no condition Richard Henderson
2024-03-25 3:04 ` [PATCH 3/3] target/hppa: Fix unit carry conditions Richard Henderson
2 siblings, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2024-03-25 3:04 UTC (permalink / raw)
To: qemu-devel; +Cc: deller
The carry bit for each nibble N is located in bit (N+1)*4,
so the shift by 3 was off by one. Furthermore, the carry bit
for the most significant nibble is located in bit 64,
which lives in a different storage word.
Use a double-word shift-right to reassemble the carries into a single
word and place each one at bit 0 of its respective nibble.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/hppa/translate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
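As an illustration only (not part of the patch), the double-word shift can be modeled in plain C. This is a minimal sketch under assumptions: extract2_u64 and nibble_carries are hypothetical names, and cb_msb is assumed to hold bit 64 of the carry vector in its bit 0, which is how the diff uses cpu_psw_cb_msb.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain-C model of a double-word shift right: the low 64 bits of
 * the 128-bit value (hi:lo) >> pos. Restricted to 0 < pos < 64 so
 * that neither shift below is undefined behaviour.
 */
static uint64_t extract2_u64(uint64_t lo, uint64_t hi, unsigned pos)
{
    assert(pos > 0 && pos < 64);
    return (lo >> pos) | (hi << (64 - pos));
}

/*
 * Hypothetical helper: the carry out of nibble N sits in bit (N+1)*4
 * of the 65-bit value (cb_msb:cb). Shifting right by 4 moves each
 * carry to bit 0 of its own nibble; the mask keeps only those bits.
 */
static uint64_t nibble_carries(uint64_t cb, uint64_t cb_msb)
{
    return extract2_u64(cb, cb_msb, 4) & 0x1111111111111111ull;
}
```

With this model, a carry out of nibble 0 (bit 4 of cb) lands in bit 0, and a carry out of nibble 15 (bit 0 of cb_msb) lands in bit 60. A single-word shift by 3 would put the first at bit 1 and lose the second.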
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index e041310207..a3f425d861 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2791,7 +2791,7 @@ static bool do_dcor(DisasContext *ctx, arg_rr_cf_d *a, bool is_i)
nullify_over(ctx);
tmp = tcg_temp_new_i64();
- tcg_gen_shri_i64(tmp, cpu_psw_cb, 3);
+ tcg_gen_extract2_i64(tmp, cpu_psw_cb, cpu_psw_cb_msb, 4);
if (!is_i) {
tcg_gen_not_i64(tmp, tmp);
}
--
2.34.1
* [PATCH 2/3] target/hppa: Optimize UADDCM with no condition
2024-03-25 3:04 [PATCH for-9.0 0/3] target/hppa: Fix DCOR, UADDCM conditions Richard Henderson
2024-03-25 3:04 ` [PATCH 1/3] target/hppa: Fix DCOR reconstruction of carry bits Richard Henderson
@ 2024-03-25 3:04 ` Richard Henderson
2024-03-25 9:48 ` Helge Deller
2024-03-25 3:04 ` [PATCH 3/3] target/hppa: Fix unit carry conditions Richard Henderson
2 siblings, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2024-03-25 3:04 UTC (permalink / raw)
To: qemu-devel; +Cc: deller
UADDCM with r1 as zero is by far the most common usage, as it is the
easiest way to invert a register. The compiler does occasionally use
the addition step as well, and we can simplify that case to avoid a
temp and write directly into the destination.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/hppa/translate.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
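As an illustration only (not part of the patch), the two's-complement identity that the new code path relies on can be checked in plain C. The function names below are hypothetical; unsigned 64-bit arithmetic wraps modulo 2^64, which is what makes the identity hold for all inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: add the one's complement of r2 to r1. */
static uint64_t uaddcm_reference(uint64_t r1, uint64_t r2)
{
    return r1 + ~r2;
}

/* Rewritten form: r1 - r2 == r1 + ~r2 + 1, hence r1 + ~r2 == r1 - r2 - 1. */
static uint64_t uaddcm_no_temp(uint64_t r1, uint64_t r2)
{
    return r1 - r2 - 1;
}

/* Hypothetical self-check: both forms agree for the given operands. */
static void check_uaddcm(uint64_t r1, uint64_t r2)
{
    assert(uaddcm_reference(r1, r2) == uaddcm_no_temp(r1, r2));
}
```

With r1 equal to zero, both forms reduce to ~r2, which matches the special case the patch handles with a single NOT.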
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index a3f425d861..3fc3e7754c 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2763,9 +2763,29 @@ static bool do_uaddcm(DisasContext *ctx, arg_rrr_cf_d *a, bool is_tc)
{
TCGv_i64 tcg_r1, tcg_r2, tmp;
- if (a->cf) {
- nullify_over(ctx);
+ if (a->cf == 0) {
+ tcg_r2 = load_gpr(ctx, a->r2);
+ tmp = dest_gpr(ctx, a->t);
+
+ if (a->r1 == 0) {
+ /* UADDCM r0,src,dst is the common idiom for dst = ~src. */
+ tcg_gen_not_i64(tmp, tcg_r2);
+ } else {
+ /*
+ * Recall that r1 - r2 == r1 + ~r2 + 1.
+ * Thus r1 + ~r2 == r1 - r2 - 1,
+ * which does not require an extra temporary.
+ */
+ tcg_r1 = load_gpr(ctx, a->r1);
+ tcg_gen_sub_i64(tmp, tcg_r1, tcg_r2);
+ tcg_gen_subi_i64(tmp, tmp, 1);
+ }
+ save_gpr(ctx, a->t, tmp);
+ cond_free(&ctx->null_cond);
+ return true;
}
+
+ nullify_over(ctx);
tcg_r1 = load_gpr(ctx, a->r1);
tcg_r2 = load_gpr(ctx, a->r2);
tmp = tcg_temp_new_i64();
--
2.34.1
* [PATCH 3/3] target/hppa: Fix unit carry conditions
2024-03-25 3:04 [PATCH for-9.0 0/3] target/hppa: Fix DCOR, UADDCM conditions Richard Henderson
2024-03-25 3:04 ` [PATCH 1/3] target/hppa: Fix DCOR reconstruction of carry bits Richard Henderson
2024-03-25 3:04 ` [PATCH 2/3] target/hppa: Optimize UADDCM with no condition Richard Henderson
@ 2024-03-25 3:04 ` Richard Henderson
2024-03-25 10:36 ` Helge Deller
2 siblings, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2024-03-25 3:04 UTC (permalink / raw)
To: qemu-devel; +Cc: deller
Replace do_unit_cond with do_unit_zero_cond, which handles only
conditions versus zero. These are the only ones that
are legal for UXOR. Simplify trans_uxor accordingly.
Rename do_unit to do_unit_addsub, since xor has been split out.
Properly compute carry-out bits for add and subtract,
mirroring the code in do_add and do_sub.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/hppa/translate.c | 214 ++++++++++++++++++++--------------------
1 file changed, 109 insertions(+), 105 deletions(-)
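As an illustration only (not part of the patch), the zero-unit test that do_unit_zero_cond builds from the "ones" and "sgns" constants can be sketched in plain C. The name has_zero_lane is hypothetical; the masks mirror the byte and halfword constants in the diff for the 64-bit case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The hasless(v,1) bit trick: subtracting 1 from every lane borrows
 * into the lane's sign bit only when the lane was zero (or when a
 * borrow ripples in from a lower zero lane). Masking with ~v drops
 * lanes whose sign bit was already set. The result is nonzero if and
 * only if at least one lane of v is zero.
 */
static int has_zero_lane(uint64_t v, uint64_t ones, uint64_t sgns)
{
    return ((v - ones) & ~v & sgns) != 0;
}

/* Hypothetical self-check using the byte and halfword lane masks. */
static void self_check(void)
{
    const uint64_t b_ones = 0x0101010101010101ull;
    const uint64_t h_ones = 0x0001000100010001ull;

    assert(!has_zero_lane(0x1122334455667788ull, b_ones, b_ones << 7));
    assert(has_zero_lane(0x1122330055667788ull, b_ones, b_ones << 7));
    assert(has_zero_lane(0x1122000055667788ull, h_ones, h_ones << 15));
}
```

Passing the lane constants as parameters keeps one code path for all unit sizes, which is the same shape the patch adopts by selecting "ones" and "sgns" in the switch and emitting the subtract/andc sequence once.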
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 3fc3e7754c..2bf213c938 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -936,98 +936,44 @@ static DisasCond do_sed_cond(DisasContext *ctx, unsigned orig, bool d,
return do_log_cond(ctx, c * 2 + f, d, res);
}
-/* Similar, but for unit conditions. */
-
-static DisasCond do_unit_cond(unsigned cf, bool d, TCGv_i64 res,
- TCGv_i64 in1, TCGv_i64 in2)
+/* Similar, but for unit zero conditions. */
+static DisasCond do_unit_zero_cond(unsigned cf, bool d, TCGv_i64 res)
{
- DisasCond cond;
- TCGv_i64 tmp, cb = NULL;
+ TCGv_i64 tmp;
uint64_t d_repl = d ? 0x0000000100000001ull : 1;
-
- if (cf & 8) {
- /* Since we want to test lots of carry-out bits all at once, do not
- * do our normal thing and compute carry-in of bit B+1 since that
- * leaves us with carry bits spread across two words.
- */
- cb = tcg_temp_new_i64();
- tmp = tcg_temp_new_i64();
- tcg_gen_or_i64(cb, in1, in2);
- tcg_gen_and_i64(tmp, in1, in2);
- tcg_gen_andc_i64(cb, cb, res);
- tcg_gen_or_i64(cb, cb, tmp);
- }
+ uint64_t ones = 0, sgns = 0;
switch (cf >> 1) {
- case 0: /* never / TR */
- cond = cond_make_f();
- break;
-
case 1: /* SBW / NBW */
if (d) {
- tmp = tcg_temp_new_i64();
- tcg_gen_subi_i64(tmp, res, d_repl * 0x00000001u);
- tcg_gen_andc_i64(tmp, tmp, res);
- tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80000000u);
- cond = cond_make_0(TCG_COND_NE, tmp);
- } else {
- /* undefined */
- cond = cond_make_f();
+ ones = d_repl;
+ sgns = d_repl << 31;
}
break;
-
case 2: /* SBZ / NBZ */
- /* See hasless(v,1) from
- * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
- */
- tmp = tcg_temp_new_i64();
- tcg_gen_subi_i64(tmp, res, d_repl * 0x01010101u);
- tcg_gen_andc_i64(tmp, tmp, res);
- tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80808080u);
- cond = cond_make_0(TCG_COND_NE, tmp);
+ ones = d_repl * 0x01010101u;
+ sgns = ones << 7;
break;
-
case 3: /* SHZ / NHZ */
- tmp = tcg_temp_new_i64();
- tcg_gen_subi_i64(tmp, res, d_repl * 0x00010001u);
- tcg_gen_andc_i64(tmp, tmp, res);
- tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80008000u);
- cond = cond_make_0(TCG_COND_NE, tmp);
+ ones = d_repl * 0x00010001u;
+ sgns = ones << 15;
break;
-
- case 4: /* SDC / NDC */
- tcg_gen_andi_i64(cb, cb, d_repl * 0x88888888u);
- cond = cond_make_0(TCG_COND_NE, cb);
- break;
-
- case 5: /* SWC / NWC */
- if (d) {
- tcg_gen_andi_i64(cb, cb, d_repl * 0x80000000u);
- cond = cond_make_0(TCG_COND_NE, cb);
- } else {
- /* undefined */
- cond = cond_make_f();
- }
- break;
-
- case 6: /* SBC / NBC */
- tcg_gen_andi_i64(cb, cb, d_repl * 0x80808080u);
- cond = cond_make_0(TCG_COND_NE, cb);
- break;
-
- case 7: /* SHC / NHC */
- tcg_gen_andi_i64(cb, cb, d_repl * 0x80008000u);
- cond = cond_make_0(TCG_COND_NE, cb);
- break;
-
- default:
- g_assert_not_reached();
}
- if (cf & 1) {
- cond.c = tcg_invert_cond(cond.c);
+ if (ones == 0) {
+ /* Undefined, or 0/1 (never/always). */
+ return cf & 1 ? cond_make_t() : cond_make_f();
}
- return cond;
+ /*
+ * See hasless(v,1) from
+ * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
+ */
+ tmp = tcg_temp_new_i64();
+ tcg_gen_subi_i64(tmp, res, ones);
+ tcg_gen_andc_i64(tmp, tmp, res);
+
+ return cond_make_tmp(cf & 1 ? TCG_COND_TSTEQ : TCG_COND_TSTNE,
+ tmp, tcg_constant_i64(sgns));
}
static TCGv_i64 get_carry(DisasContext *ctx, bool d,
@@ -1330,34 +1276,82 @@ static bool do_log_reg(DisasContext *ctx, arg_rrr_cf_d *a,
return nullify_end(ctx);
}
-static void do_unit(DisasContext *ctx, unsigned rt, TCGv_i64 in1,
- TCGv_i64 in2, unsigned cf, bool d, bool is_tc,
- void (*fn)(TCGv_i64, TCGv_i64, TCGv_i64))
+static void do_unit_addsub(DisasContext *ctx, unsigned rt, TCGv_i64 in1,
+ TCGv_i64 in2, unsigned cf, bool d,
+ bool is_tc, bool is_add)
{
- TCGv_i64 dest;
+ TCGv_i64 dest, cb = NULL;
+ uint64_t test_cb = 0;
DisasCond cond;
- if (cf == 0) {
- dest = dest_gpr(ctx, rt);
- fn(dest, in1, in2);
- save_gpr(ctx, rt, dest);
- cond_free(&ctx->null_cond);
- } else {
- dest = tcg_temp_new_i64();
- fn(dest, in1, in2);
-
- cond = do_unit_cond(cf, d, dest, in1, in2);
-
- if (is_tc) {
- TCGv_i64 tmp = tcg_temp_new_i64();
- tcg_gen_setcond_i64(cond.c, tmp, cond.a0, cond.a1);
- gen_helper_tcond(tcg_env, tmp);
+ /* Select which carry-out bits to test. */
+ switch (cf >> 1) {
+ case 4: /* NDC / SDC -- 4-bit carries */
+ test_cb = 0x8888888888888888ull;
+ break;
+ case 5: /* NWC / SWC -- 32-bit carries */
+ if (d) {
+ test_cb = 0x8000000080000000ull;
+ } else {
+ cf &= 1; /* undefined -- map to never/always */
}
- save_gpr(ctx, rt, dest);
-
- cond_free(&ctx->null_cond);
- ctx->null_cond = cond;
+ break;
+ case 6: /* NBC / SBC -- 8-bit carries */
+ test_cb = 0x8080808080808080ull;
+ break;
+ case 7: /* NHC / SHC -- 16-bit carries */
+ test_cb = 0x8000800080008000ull;
+ break;
}
+
+ dest = tcg_temp_new_i64();
+ if (test_cb) {
+ cb = tcg_temp_new_i64();
+ if (d) {
+ TCGv_i64 cb_msb = tcg_temp_new_i64();
+ if (is_add) {
+ tcg_gen_add2_i64(dest, cb_msb, in1, ctx->zero, in2, ctx->zero);
+ tcg_gen_xor_i64(cb, in1, in2);
+ } else {
+ /* See do_sub, !is_b. */
+ TCGv_i64 one = tcg_constant_i64(1);
+ tcg_gen_sub2_i64(dest, cb_msb, in1, one, in2, ctx->zero);
+ tcg_gen_eqv_i64(cb, in1, in2);
+ }
+ tcg_gen_xor_i64(cb, cb, dest);
+ /* For 64-bit tests, put all carry-out bits back in one word. */
+ tcg_gen_extract2_i64(cb, cb, cb_msb, 1);
+ } else {
+ if (is_add) {
+ tcg_gen_add_i64(dest, in1, in2);
+ tcg_gen_xor_i64(cb, in1, in2);
+ } else {
+ tcg_gen_sub_i64(dest, in1, in2);
+ tcg_gen_eqv_i64(cb, in1, in2);
+ }
+ /* For 32-bit tests, test carry-in instead of carry-out. */
+ test_cb = (uint64_t)(uint32_t)test_cb << 1;
+ }
+ cond = cond_make_tmp(cf & 1 ? TCG_COND_TSTEQ : TCG_COND_TSTNE,
+ cb, tcg_constant_i64(test_cb));
+ } else {
+ if (is_add) {
+ tcg_gen_add_i64(dest, in1, in2);
+ } else {
+ tcg_gen_sub_i64(dest, in1, in2);
+ }
+ cond = do_unit_zero_cond(cf, d, dest);
+ }
+
+ if (is_tc) {
+ TCGv_i64 tmp = tcg_temp_new_i64();
+ tcg_gen_setcond_i64(cond.c, tmp, cond.a0, cond.a1);
+ gen_helper_tcond(tcg_env, tmp);
+ }
+ save_gpr(ctx, rt, dest);
+
+ cond_free(&ctx->null_cond);
+ ctx->null_cond = cond;
}
#ifndef CONFIG_USER_ONLY
@@ -2748,14 +2742,24 @@ static bool trans_cmpclr(DisasContext *ctx, arg_rrr_cf_d *a)
static bool trans_uxor(DisasContext *ctx, arg_rrr_cf_d *a)
{
- TCGv_i64 tcg_r1, tcg_r2;
+ TCGv_i64 tcg_r1, tcg_r2, dest;
if (a->cf) {
nullify_over(ctx);
}
+
tcg_r1 = load_gpr(ctx, a->r1);
tcg_r2 = load_gpr(ctx, a->r2);
- do_unit(ctx, a->t, tcg_r1, tcg_r2, a->cf, a->d, false, tcg_gen_xor_i64);
+ dest = dest_gpr(ctx, a->t);
+
+ tcg_gen_xor_i64(dest, tcg_r1, tcg_r2);
+ save_gpr(ctx, a->t, dest);
+
+ cond_free(&ctx->null_cond);
+ if (a->cf) {
+ ctx->null_cond = do_unit_zero_cond(a->cf, a->d, dest);
+ }
+
return nullify_end(ctx);
}
@@ -2790,7 +2794,7 @@ static bool do_uaddcm(DisasContext *ctx, arg_rrr_cf_d *a, bool is_tc)
tcg_r2 = load_gpr(ctx, a->r2);
tmp = tcg_temp_new_i64();
tcg_gen_not_i64(tmp, tcg_r2);
- do_unit(ctx, a->t, tcg_r1, tmp, a->cf, a->d, is_tc, tcg_gen_add_i64);
+ do_unit_addsub(ctx, a->t, tcg_r1, tmp, a->cf, a->d, is_tc, true);
return nullify_end(ctx);
}
@@ -2817,8 +2821,8 @@ static bool do_dcor(DisasContext *ctx, arg_rr_cf_d *a, bool is_i)
}
tcg_gen_andi_i64(tmp, tmp, (uint64_t)0x1111111111111111ull);
tcg_gen_muli_i64(tmp, tmp, 6);
- do_unit(ctx, a->t, load_gpr(ctx, a->r), tmp, a->cf, a->d, false,
- is_i ? tcg_gen_add_i64 : tcg_gen_sub_i64);
+ do_unit_addsub(ctx, a->t, load_gpr(ctx, a->r), tmp,
+ a->cf, a->d, false, is_i);
return nullify_end(ctx);
}
--
2.34.1
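As an illustration only (not part of the patch), the carry-out reconstruction for the 64-bit addition case can be modeled in plain C. This is a sketch under assumptions: AddCarry and add_with_carries are hypothetical names, and only addition is modeled; for subtraction the diff uses eqv in place of xor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the sum plus one carry-out bit per position. */
typedef struct {
    uint64_t sum;
    uint64_t carry_out;   /* bit i = carry out of bit i */
} AddCarry;

static AddCarry add_with_carries(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;
    uint64_t msb = sum < a;       /* carry out of bit 63 */
    uint64_t cin = a ^ b ^ sum;   /* bit i = carry into bit i */
    AddCarry r;

    r.sum = sum;
    /* Carry out of bit i is the carry into bit i+1; bit 63 comes from msb. */
    r.carry_out = (cin >> 1) | (msb << 63);
    return r;
}

/* Hypothetical self-check: a carry out of the low nibble shows up at bit 3. */
static void self_check(void)
{
    assert(add_with_carries(0x8, 0x8).sum == 0x10);
    assert(add_with_carries(0x8, 0x8).carry_out == 0x8);
}
```

Masking carry_out with a constant such as 0x8888888888888888 or 0x8080808080808080 then selects the nibble or byte carries, matching the test_cb values chosen in the switch.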
* Re: [PATCH 1/3] target/hppa: Fix DCOR reconstruction of carry bits
2024-03-25 3:04 ` [PATCH 1/3] target/hppa: Fix DCOR reconstruction of carry bits Richard Henderson
@ 2024-03-25 9:48 ` Helge Deller
0 siblings, 0 replies; 7+ messages in thread
From: Helge Deller @ 2024-03-25 9:48 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 3/25/24 04:04, Richard Henderson wrote:
> The carry bit for each nibble N is located in bit (N+1)*4,
> so the shift by 3 was off by one. Furthermore, the carry bit
> for the most significant nibble is located in bit 64,
> which lives in a different storage word.
>
> Use a double-word shift-right to reassemble the carries into a single
> word and place each one at bit 0 of its respective nibble.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Helge Deller <deller@gmx.de>
Tested-by: Helge Deller <deller@gmx.de>
Helge
> ---
> target/hppa/translate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/hppa/translate.c b/target/hppa/translate.c
> index e041310207..a3f425d861 100644
> --- a/target/hppa/translate.c
> +++ b/target/hppa/translate.c
> @@ -2791,7 +2791,7 @@ static bool do_dcor(DisasContext *ctx, arg_rr_cf_d *a, bool is_i)
> nullify_over(ctx);
>
> tmp = tcg_temp_new_i64();
> - tcg_gen_shri_i64(tmp, cpu_psw_cb, 3);
> + tcg_gen_extract2_i64(tmp, cpu_psw_cb, cpu_psw_cb_msb, 4);
> if (!is_i) {
> tcg_gen_not_i64(tmp, tmp);
> }
* Re: [PATCH 2/3] target/hppa: Optimize UADDCM with no condition
2024-03-25 3:04 ` [PATCH 2/3] target/hppa: Optimize UADDCM with no condition Richard Henderson
@ 2024-03-25 9:48 ` Helge Deller
0 siblings, 0 replies; 7+ messages in thread
From: Helge Deller @ 2024-03-25 9:48 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 3/25/24 04:04, Richard Henderson wrote:
> UADDCM with r1 as zero is by far the most common usage, as it is the
> easiest way to invert a register. The compiler does occasionally use
> the addition step as well, and we can simplify that case to avoid a
> temp and write directly into the destination.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Helge Deller <deller@gmx.de>
Tested-by: Helge Deller <deller@gmx.de>
Helge
> ---
> target/hppa/translate.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/target/hppa/translate.c b/target/hppa/translate.c
> index a3f425d861..3fc3e7754c 100644
> --- a/target/hppa/translate.c
> +++ b/target/hppa/translate.c
> @@ -2763,9 +2763,29 @@ static bool do_uaddcm(DisasContext *ctx, arg_rrr_cf_d *a, bool is_tc)
> {
> TCGv_i64 tcg_r1, tcg_r2, tmp;
>
> - if (a->cf) {
> - nullify_over(ctx);
> + if (a->cf == 0) {
> + tcg_r2 = load_gpr(ctx, a->r2);
> + tmp = dest_gpr(ctx, a->t);
> +
> + if (a->r1 == 0) {
> + /* UADDCM r0,src,dst is the common idiom for dst = ~src. */
> + tcg_gen_not_i64(tmp, tcg_r2);
> + } else {
> + /*
> + * Recall that r1 - r2 == r1 + ~r2 + 1.
> + * Thus r1 + ~r2 == r1 - r2 - 1,
> + * which does not require an extra temporary.
> + */
> + tcg_r1 = load_gpr(ctx, a->r1);
> + tcg_gen_sub_i64(tmp, tcg_r1, tcg_r2);
> + tcg_gen_subi_i64(tmp, tmp, 1);
> + }
> + save_gpr(ctx, a->t, tmp);
> + cond_free(&ctx->null_cond);
> + return true;
> }
> +
> + nullify_over(ctx);
> tcg_r1 = load_gpr(ctx, a->r1);
> tcg_r2 = load_gpr(ctx, a->r2);
> tmp = tcg_temp_new_i64();
* Re: [PATCH 3/3] target/hppa: Fix unit carry conditions
2024-03-25 3:04 ` [PATCH 3/3] target/hppa: Fix unit carry conditions Richard Henderson
@ 2024-03-25 10:36 ` Helge Deller
0 siblings, 0 replies; 7+ messages in thread
From: Helge Deller @ 2024-03-25 10:36 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 3/25/24 04:04, Richard Henderson wrote:
> Split do_unit_cond to do_unit_zero_cond to only handle
> conditions versus zero. These are the only ones that
> are legal for UXOR. Simplify trans_uxor accordingly.
>
> Rename do_unit to do_unit_addsub, since xor has been split.
> Properly compute carry-out bits for add and subtract,
> mirroring the code in do_add and do_sub.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This patch triggers a failure in SECTION 055 (32-bit):
ERROR 0999 IN SECTION 055
UNEXPECTED TRAP# 13
IN:
0x001a2b2c: uaddcm,tc,shc r13,r14,r15
r13..r15: 55555555 55555555 00000000
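For reference, a quick arithmetic check of the reported operands (a sketch, not taken from the test suite): UADDCM adds r13 to the one's complement of r14, so the operands are 0x55555555 and 0xAAAAAAAA. The helper name halfword_carries32 is hypothetical, and the check assumes the 32-bit halfword carries are the carries out of bits 15 and 31.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Carries out of each 16-bit halfword of a 32-bit add. The sum is
 * formed in 64 bits so that the carry out of bit 31 is kept in bit 32.
 */
static uint32_t halfword_carries32(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + b;
    uint64_t cin = (uint64_t)a ^ b ^ wide;      /* bit i = carry into bit i */

    return (uint32_t)(cin >> 1) & 0x80008000u;  /* carries out of bits 15, 31 */
}

/* Hypothetical self-check with the operands from the failing instruction. */
static void self_check(void)
{
    uint32_t r13 = 0x55555555u, r14 = 0x55555555u;

    assert(r13 + ~r14 == 0xFFFFFFFFu);
    assert(halfword_carries32(r13, ~r14) == 0);
}
```

By this arithmetic no halfword produces a carry, which suggests that the SHC condition should be false for these inputs and that the ,tc form should not trap.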
> ---
> target/hppa/translate.c | 214 ++++++++++++++++++++--------------------
> 1 file changed, 109 insertions(+), 105 deletions(-)
>
> diff --git a/target/hppa/translate.c b/target/hppa/translate.c
> index 3fc3e7754c..2bf213c938 100644
> --- a/target/hppa/translate.c
> +++ b/target/hppa/translate.c
> @@ -936,98 +936,44 @@ static DisasCond do_sed_cond(DisasContext *ctx, unsigned orig, bool d,
> return do_log_cond(ctx, c * 2 + f, d, res);
> }
>
> -/* Similar, but for unit conditions. */
> -
> -static DisasCond do_unit_cond(unsigned cf, bool d, TCGv_i64 res,
> - TCGv_i64 in1, TCGv_i64 in2)
> +/* Similar, but for unit zero conditions. */
> +static DisasCond do_unit_zero_cond(unsigned cf, bool d, TCGv_i64 res)
> {
> - DisasCond cond;
> - TCGv_i64 tmp, cb = NULL;
> + TCGv_i64 tmp;
> uint64_t d_repl = d ? 0x0000000100000001ull : 1;
> -
> - if (cf & 8) {
> - /* Since we want to test lots of carry-out bits all at once, do not
> - * do our normal thing and compute carry-in of bit B+1 since that
> - * leaves us with carry bits spread across two words.
> - */
> - cb = tcg_temp_new_i64();
> - tmp = tcg_temp_new_i64();
> - tcg_gen_or_i64(cb, in1, in2);
> - tcg_gen_and_i64(tmp, in1, in2);
> - tcg_gen_andc_i64(cb, cb, res);
> - tcg_gen_or_i64(cb, cb, tmp);
> - }
> + uint64_t ones = 0, sgns = 0;
>
> switch (cf >> 1) {
> - case 0: /* never / TR */
> - cond = cond_make_f();
> - break;
> -
> case 1: /* SBW / NBW */
> if (d) {
> - tmp = tcg_temp_new_i64();
> - tcg_gen_subi_i64(tmp, res, d_repl * 0x00000001u);
> - tcg_gen_andc_i64(tmp, tmp, res);
> - tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80000000u);
> - cond = cond_make_0(TCG_COND_NE, tmp);
> - } else {
> - /* undefined */
> - cond = cond_make_f();
> + ones = d_repl;
> + sgns = d_repl << 31;
> }
> break;
> -
> case 2: /* SBZ / NBZ */
> - /* See hasless(v,1) from
> - * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
> - */
> - tmp = tcg_temp_new_i64();
> - tcg_gen_subi_i64(tmp, res, d_repl * 0x01010101u);
> - tcg_gen_andc_i64(tmp, tmp, res);
> - tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80808080u);
> - cond = cond_make_0(TCG_COND_NE, tmp);
> + ones = d_repl * 0x01010101u;
> + sgns = ones << 7;
> break;
> -
> case 3: /* SHZ / NHZ */
> - tmp = tcg_temp_new_i64();
> - tcg_gen_subi_i64(tmp, res, d_repl * 0x00010001u);
> - tcg_gen_andc_i64(tmp, tmp, res);
> - tcg_gen_andi_i64(tmp, tmp, d_repl * 0x80008000u);
> - cond = cond_make_0(TCG_COND_NE, tmp);
> + ones = d_repl * 0x00010001u;
> + sgns = ones << 15;
> break;
> -
> - case 4: /* SDC / NDC */
> - tcg_gen_andi_i64(cb, cb, d_repl * 0x88888888u);
> - cond = cond_make_0(TCG_COND_NE, cb);
> - break;
> -
> - case 5: /* SWC / NWC */
> - if (d) {
> - tcg_gen_andi_i64(cb, cb, d_repl * 0x80000000u);
> - cond = cond_make_0(TCG_COND_NE, cb);
> - } else {
> - /* undefined */
> - cond = cond_make_f();
> - }
> - break;
> -
> - case 6: /* SBC / NBC */
> - tcg_gen_andi_i64(cb, cb, d_repl * 0x80808080u);
> - cond = cond_make_0(TCG_COND_NE, cb);
> - break;
> -
> - case 7: /* SHC / NHC */
> - tcg_gen_andi_i64(cb, cb, d_repl * 0x80008000u);
> - cond = cond_make_0(TCG_COND_NE, cb);
> - break;
> -
> - default:
> - g_assert_not_reached();
> }
> - if (cf & 1) {
> - cond.c = tcg_invert_cond(cond.c);
> + if (ones == 0) {
> + /* Undefined, or 0/1 (never/always). */
> + return cf & 1 ? cond_make_t() : cond_make_f();
> }
>
> - return cond;
> + /*
> + * See hasless(v,1) from
> + * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
> + */
> + tmp = tcg_temp_new_i64();
> + tcg_gen_subi_i64(tmp, res, ones);
> + tcg_gen_andc_i64(tmp, tmp, res);
> +
> + return cond_make_tmp(cf & 1 ? TCG_COND_TSTEQ : TCG_COND_TSTNE,
> + tmp, tcg_constant_i64(sgns));
> }
>
> static TCGv_i64 get_carry(DisasContext *ctx, bool d,
> @@ -1330,34 +1276,82 @@ static bool do_log_reg(DisasContext *ctx, arg_rrr_cf_d *a,
> return nullify_end(ctx);
> }
>
> -static void do_unit(DisasContext *ctx, unsigned rt, TCGv_i64 in1,
> - TCGv_i64 in2, unsigned cf, bool d, bool is_tc,
> - void (*fn)(TCGv_i64, TCGv_i64, TCGv_i64))
> +static void do_unit_addsub(DisasContext *ctx, unsigned rt, TCGv_i64 in1,
> + TCGv_i64 in2, unsigned cf, bool d,
> + bool is_tc, bool is_add)
> {
> - TCGv_i64 dest;
> + TCGv_i64 dest, cb = NULL;
> + uint64_t test_cb = 0;
> DisasCond cond;
>
> - if (cf == 0) {
> - dest = dest_gpr(ctx, rt);
> - fn(dest, in1, in2);
> - save_gpr(ctx, rt, dest);
> - cond_free(&ctx->null_cond);
> - } else {
> - dest = tcg_temp_new_i64();
> - fn(dest, in1, in2);
> -
> - cond = do_unit_cond(cf, d, dest, in1, in2);
> -
> - if (is_tc) {
> - TCGv_i64 tmp = tcg_temp_new_i64();
> - tcg_gen_setcond_i64(cond.c, tmp, cond.a0, cond.a1);
> - gen_helper_tcond(tcg_env, tmp);
> + /* Select which carry-out bits to test. */
> + switch (cf >> 1) {
> + case 4: /* NDC / SDC -- 4-bit carries */
> + test_cb = 0x8888888888888888ull;
> + break;
> + case 5: /* NWC / SWC -- 32-bit carries */
> + if (d) {
> + test_cb = 0x8000000080000000ull;
> + } else {
> + cf &= 1; /* undefined -- map to never/always */
> }
> - save_gpr(ctx, rt, dest);
> -
> - cond_free(&ctx->null_cond);
> - ctx->null_cond = cond;
> + break;
> + case 6: /* NBC / SBC -- 8-bit carries */
> + test_cb = 0x8080808080808080ull;
> + break;
> + case 7: /* NHC / SHC -- 16-bit carries */
> + test_cb = 0x8000800080008000ull;
> + break;
> }
> +
> + dest = tcg_temp_new_i64();
> + if (test_cb) {
> + cb = tcg_temp_new_i64();
> + if (d) {
> + TCGv_i64 cb_msb = tcg_temp_new_i64();
> + if (is_add) {
> + tcg_gen_add2_i64(dest, cb_msb, in1, ctx->zero, in2, ctx->zero);
> + tcg_gen_xor_i64(cb, in1, in2);
> + } else {
> + /* See do_sub, !is_b. */
> + TCGv_i64 one = tcg_constant_i64(1);
> + tcg_gen_sub2_i64(dest, cb_msb, in1, one, in2, ctx->zero);
> + tcg_gen_eqv_i64(cb, in1, in2);
> + }
> + tcg_gen_xor_i64(cb, cb, dest);
> + /* For 64-bit tests, put all carry-out bits back in one word. */
> + tcg_gen_extract2_i64(cb, cb, cb_msb, 1);
> + } else {
> + if (is_add) {
> + tcg_gen_add_i64(dest, in1, in2);
> + tcg_gen_xor_i64(cb, in1, in2);
> + } else {
> + tcg_gen_sub_i64(dest, in1, in2);
> + tcg_gen_eqv_i64(cb, in1, in2);
> + }
> + /* For 32-bit tests, test carry-in instead of carry-out. */
> + test_cb = (uint64_t)(uint32_t)test_cb << 1;
> + }
> + cond = cond_make_tmp(cf & 1 ? TCG_COND_TSTEQ : TCG_COND_TSTNE,
> + cb, tcg_constant_i64(test_cb));
> + } else {
> + if (is_add) {
> + tcg_gen_add_i64(dest, in1, in2);
> + } else {
> + tcg_gen_sub_i64(dest, in1, in2);
> + }
> + cond = do_unit_zero_cond(cf, d, dest);
> + }
> +
> + if (is_tc) {
> + TCGv_i64 tmp = tcg_temp_new_i64();
> + tcg_gen_setcond_i64(cond.c, tmp, cond.a0, cond.a1);
> + gen_helper_tcond(tcg_env, tmp);
> + }
> + save_gpr(ctx, rt, dest);
> +
> + cond_free(&ctx->null_cond);
> + ctx->null_cond = cond;
> }
>
> #ifndef CONFIG_USER_ONLY
> @@ -2748,14 +2742,24 @@ static bool trans_cmpclr(DisasContext *ctx, arg_rrr_cf_d *a)
>
> static bool trans_uxor(DisasContext *ctx, arg_rrr_cf_d *a)
> {
> - TCGv_i64 tcg_r1, tcg_r2;
> + TCGv_i64 tcg_r1, tcg_r2, dest;
>
> if (a->cf) {
> nullify_over(ctx);
> }
> +
> tcg_r1 = load_gpr(ctx, a->r1);
> tcg_r2 = load_gpr(ctx, a->r2);
> - do_unit(ctx, a->t, tcg_r1, tcg_r2, a->cf, a->d, false, tcg_gen_xor_i64);
> + dest = dest_gpr(ctx, a->t);
> +
> + tcg_gen_xor_i64(dest, tcg_r1, tcg_r2);
> + save_gpr(ctx, a->t, dest);
> +
> + cond_free(&ctx->null_cond);
> + if (a->cf) {
> + ctx->null_cond = do_unit_zero_cond(a->cf, a->d, dest);
> + }
> +
> return nullify_end(ctx);
> }
>
> @@ -2790,7 +2794,7 @@ static bool do_uaddcm(DisasContext *ctx, arg_rrr_cf_d *a, bool is_tc)
> tcg_r2 = load_gpr(ctx, a->r2);
> tmp = tcg_temp_new_i64();
> tcg_gen_not_i64(tmp, tcg_r2);
> - do_unit(ctx, a->t, tcg_r1, tmp, a->cf, a->d, is_tc, tcg_gen_add_i64);
> + do_unit_addsub(ctx, a->t, tcg_r1, tmp, a->cf, a->d, is_tc, true);
> return nullify_end(ctx);
> }
>
> @@ -2817,8 +2821,8 @@ static bool do_dcor(DisasContext *ctx, arg_rr_cf_d *a, bool is_i)
> }
> tcg_gen_andi_i64(tmp, tmp, (uint64_t)0x1111111111111111ull);
> tcg_gen_muli_i64(tmp, tmp, 6);
> - do_unit(ctx, a->t, load_gpr(ctx, a->r), tmp, a->cf, a->d, false,
> - is_i ? tcg_gen_add_i64 : tcg_gen_sub_i64);
> + do_unit_addsub(ctx, a->t, load_gpr(ctx, a->r), tmp,
> + a->cf, a->d, false, is_i);
> return nullify_end(ctx);
> }
>