* [Qemu-devel] [PATCH 1/6] target-ppc: Special Case of rlwimi Should Use Deposit
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
@ 2014-08-25 19:25 ` Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 2/6] target-ppc: Optimize rlwinm MB=0 ME=31 Tom Musta
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Tom Musta @ 2014-08-25 19:25 UTC
To: qemu-devel, qemu-ppc; +Cc: Tom Musta, agraf, rth
The special case of rlwimi where MB <= ME and SH = 31-ME can be implemented
with a single TCG deposit operation. This replaces the less general case
of SH = MB = 0 and ME = 31.
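A minimal C sketch of why the deposit is valid, modelling only the low 32 bits of the registers. The helpers (model_rotl32, model_mask32, model_rlwimi_ref, model_deposit32) are hypothetical stand-ins written for illustration, not QEMU or TCG functions; model_mask32 assumes the ISA's MASK(mb, me) with bit 0 as the most significant bit. With SH = 31-ME the rotated field already sits at bit position SH, and the mask covers exactly ME-MB+1 bits starting there, so rotate-and-insert collapses into one deposit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit rotate left; n in 0..31. */
static uint32_t model_rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* MASK(mb, me) with IBM numbering (bit 0 = MSB); wraps when mb > me. */
static uint32_t model_mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;       /* IBM bits mb..31 */
    uint32_t to_me = 0xffffffffu << (31 - me);  /* IBM bits 0..me */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Reference rlwimi on the low word: rotate, then insert under the mask. */
static uint32_t model_rlwimi_ref(uint32_t ra, uint32_t rs,
                                 unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = model_mask32(mb, me);
    return (model_rotl32(rs, sh) & m) | (ra & ~m);
}

/* Deposit: replace len bits of dst starting at bit pos with the low
 * len bits of src (same argument order as the tcg_gen_deposit_tl call). */
static uint32_t model_deposit32(uint32_t dst, uint32_t src,
                                unsigned pos, unsigned len)
{
    assert(len >= 1 && pos + len <= 32);
    uint32_t field = (len == 32) ? 0xffffffffu : ((1u << len) - 1);
    return (dst & ~(field << pos)) | ((src & field) << pos);
}

/* Compare both forms for every mb <= me with sh = 31 - me. */
static bool model_rlwimi_deposit_matches(uint32_t ra, uint32_t rs)
{
    for (unsigned me = 0; me < 32; me++) {
        for (unsigned mb = 0; mb <= me; mb++) {
            unsigned sh = 31 - me;
            if (model_rlwimi_ref(ra, rs, sh, mb, me) !=
                model_deposit32(ra, rs, sh, me - mb + 1)) {
                return false;
            }
        }
    }
    return true;
}
```

The old special case (SH = MB = 0, ME = 31) is the len = 32 instance of the same deposit, which simply copies rS.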
Signed-off-by: Tom Musta <tommusta@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
---
target-ppc/translate.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 47dc903..095b83c 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1636,12 +1636,9 @@ static void gen_rlwimi(DisasContext *ctx)
mb = MB(ctx->opcode);
me = ME(ctx->opcode);
sh = SH(ctx->opcode);
- if (likely(sh == 0 && mb == 0 && me == 31)) {
-#if defined(TARGET_PPC64)
- tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
-#else
- tcg_gen_ext32u_tl(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
-#endif
+ if (likely(sh == (31-me) && mb <= me)) {
+ tcg_gen_deposit_tl(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rA(ctx->opcode)],
+ cpu_gpr[rS(ctx->opcode)], sh, me - mb + 1);
} else {
target_ulong mask;
TCGv t1;
--
1.7.1
* [Qemu-devel] [PATCH 2/6] target-ppc: Optimize rlwinm MB=0 ME=31
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 1/6] target-ppc: Special Case of rlwimi Should Use Deposit Tom Musta
@ 2014-08-25 19:25 ` Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 3/6] target-ppc: Optimize rlwnm " Tom Musta
` (4 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Tom Musta @ 2014-08-25 19:25 UTC
To: qemu-devel, qemu-ppc; +Cc: Tom Musta, agraf, rth
Optimize the special case of rlwinm where MB=0 and ME=31. This can
be implemented as a 32-bit ROTL.
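To see why a 32-bit rotate suffices, here is a small C sketch comparing the ISA-style definition on a 64-bit CPU (the low word of rS duplicated into both halves, rotated as 64 bits, then masked with MASK(32, 63)) against truncate, rotate, zero-extend. All function names are hypothetical models; nothing here is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit rotate left; n in 0..31. */
static uint32_t model_rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* 64-bit rotate left; n in 0..63. */
static uint64_t model_rotl64(uint64_t x, unsigned n)
{
    assert(n < 64);
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* rlwinm rA,rS,SH,0,31 per the ISA on a 64-bit CPU: ROTL32 duplicates the
 * low word into both halves before a 64-bit rotate, and
 * MASK(MB+32, ME+32) = MASK(32, 63) keeps only the low word. */
static uint64_t model_rlwinm_full_isa(uint64_t rs, unsigned sh)
{
    uint64_t w = (uint32_t)rs;
    return model_rotl64((w << 32) | w, sh) & 0xffffffffull;
}

/* The special case: truncate to 32 bits, 32-bit rotate, zero-extend. */
static uint64_t model_rlwinm_full_fast(uint64_t rs, unsigned sh)
{
    return model_rotl32((uint32_t)rs, sh);
}

/* True if both forms agree for every SH in 0..31. */
static bool model_rlwinm_full_matches(uint64_t rs)
{
    for (unsigned sh = 0; sh < 32; sh++) {
        if (model_rlwinm_full_isa(rs, sh) != model_rlwinm_full_fast(rs, sh)) {
            return false;
        }
    }
    return true;
}
```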
Signed-off-by: Tom Musta <tommusta@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
---
target-ppc/translate.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 095b83c..889e37d 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1691,6 +1691,12 @@ static void gen_rlwinm(DisasContext *ctx)
tcg_gen_shri_tl(t0, t0, mb);
tcg_gen_ext32u_tl(cpu_gpr[rA(ctx->opcode)], t0);
tcg_temp_free(t0);
+ } else if (likely(mb == 0 && me == 31)) {
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ tcg_gen_trunc_tl_i32(t0, cpu_gpr[rS(ctx->opcode)]);
+ tcg_gen_rotli_i32(t0, t0, sh);
+ tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t0);
+ tcg_temp_free_i32(t0);
} else {
TCGv t0 = tcg_temp_new();
#if defined(TARGET_PPC64)
--
1.7.1
* [Qemu-devel] [PATCH 3/6] target-ppc: Optimize rlwnm MB=0 ME=31
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 1/6] target-ppc: Special Case of rlwimi Should Use Deposit Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 2/6] target-ppc: Optimize rlwinm MB=0 ME=31 Tom Musta
@ 2014-08-25 19:25 ` Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 4/6] target-ppc: Clean Up mullw Tom Musta
` (3 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Tom Musta @ 2014-08-25 19:25 UTC
To: qemu-devel, qemu-ppc; +Cc: Tom Musta, agraf, rth
Optimize the special case of rlwnm where MB=0 and ME=31. This can
be implemented using a ROTL.
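The same reasoning applies with a register shift count: only the low five bits of rB are used, which is why the new fast path masks with 0x1f before the 32-bit rotate. A hypothetical C model, assuming 64-bit registers:

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate left; n in 0..31. */
static uint32_t model_rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* rlwnm rA,rS,rB,0,31: rotate the low word of rS left by the low five
 * bits of rB, then zero-extend (mirrors trunc, andi 0x1f, rotl_i32, extu). */
static uint64_t model_rlwnm_full(uint64_t rs, uint64_t rb)
{
    unsigned n = (unsigned)(rb & 0x1f);
    return model_rotl32((uint32_t)rs, n);
}
```

A count of 40 therefore behaves like 8, and any upper bits of rB or rS are ignored.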
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Tom Musta <tommusta@gmail.com>
---
target-ppc/translate.c | 56 +++++++++++++++++++++++++++++------------------
1 files changed, 34 insertions(+), 22 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 889e37d..57cb381 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1721,37 +1721,49 @@ static void gen_rlwinm(DisasContext *ctx)
static void gen_rlwnm(DisasContext *ctx)
{
uint32_t mb, me;
- TCGv t0;
-#if defined(TARGET_PPC64)
- TCGv t1;
-#endif
-
mb = MB(ctx->opcode);
me = ME(ctx->opcode);
- t0 = tcg_temp_new();
- tcg_gen_andi_tl(t0, cpu_gpr[rB(ctx->opcode)], 0x1f);
+
+ if (likely(mb == 0 && me == 31)) {
+ TCGv_i32 t0, t1;
+ t0 = tcg_temp_new_i32();
+ t1 = tcg_temp_new_i32();
+ tcg_gen_trunc_tl_i32(t0, cpu_gpr[rB(ctx->opcode)]);
+ tcg_gen_trunc_tl_i32(t1, cpu_gpr[rS(ctx->opcode)]);
+ tcg_gen_andi_i32(t0, t0, 0x1f);
+ tcg_gen_rotl_i32(t1, t1, t0);
+ tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t1);
+ tcg_temp_free_i32(t0);
+ tcg_temp_free_i32(t1);
+ } else {
+ TCGv t0;
#if defined(TARGET_PPC64)
- t1 = tcg_temp_new_i64();
- tcg_gen_deposit_i64(t1, cpu_gpr[rS(ctx->opcode)],
- cpu_gpr[rS(ctx->opcode)], 32, 32);
- tcg_gen_rotl_i64(t0, t1, t0);
- tcg_temp_free_i64(t1);
-#else
- tcg_gen_rotl_i32(t0, cpu_gpr[rS(ctx->opcode)], t0);
+ TCGv t1;
#endif
- if (unlikely(mb != 0 || me != 31)) {
+
+ t0 = tcg_temp_new();
+ tcg_gen_andi_tl(t0, cpu_gpr[rB(ctx->opcode)], 0x1f);
#if defined(TARGET_PPC64)
- mb += 32;
- me += 32;
+ t1 = tcg_temp_new_i64();
+ tcg_gen_deposit_i64(t1, cpu_gpr[rS(ctx->opcode)],
+ cpu_gpr[rS(ctx->opcode)], 32, 32);
+ tcg_gen_rotl_i64(t0, t1, t0);
+ tcg_temp_free_i64(t1);
+#else
+ tcg_gen_rotl_i32(t0, cpu_gpr[rS(ctx->opcode)], t0);
#endif
- tcg_gen_andi_tl(cpu_gpr[rA(ctx->opcode)], t0, MASK(mb, me));
- } else {
+ if (unlikely(mb != 0 || me != 31)) {
#if defined(TARGET_PPC64)
- tcg_gen_andi_tl(t0, t0, MASK(32, 63));
+ mb += 32;
+ me += 32;
#endif
- tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], t0);
+ tcg_gen_andi_tl(cpu_gpr[rA(ctx->opcode)], t0, MASK(mb, me));
+ } else {
+ tcg_gen_andi_tl(t0, t0, MASK(32, 63));
+ tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], t0);
+ }
+ tcg_temp_free(t0);
}
- tcg_temp_free(t0);
if (unlikely(Rc(ctx->opcode) != 0))
gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
}
--
1.7.1
* [Qemu-devel] [PATCH 4/6] target-ppc: Clean Up mullw
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
` (2 preceding siblings ...)
2014-08-25 19:25 ` [Qemu-devel] [PATCH 3/6] target-ppc: Optimize rlwnm " Tom Musta
@ 2014-08-25 19:25 ` Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 5/6] target-ppc: Clean up mullwo Tom Musta
` (2 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Tom Musta @ 2014-08-25 19:25 UTC
To: qemu-devel, qemu-ppc; +Cc: Tom Musta, agraf, rth
Eliminate the unnecessary ext32s TCG operation and make the multiplication
operation explicitly 32-bit.
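A hypothetical C model of the 32-bit-target case: the low 32 bits of the signed product equal a plain wrapping 32-bit multiply, and a 32-bit register has no upper bits for ext32s to fill, so the extension is a no-op there. The model_* names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Portable sign extension of a 32-bit pattern to int64_t. */
static int64_t model_sext32(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - 0x100000000LL : (int64_t)x;
}

/* mullw on a 32-bit target: low 32 bits of the signed 64-bit product. */
static uint32_t model_mullw_ref(uint32_t ra, uint32_t rb)
{
    int64_t prod = model_sext32(ra) * model_sext32(rb);
    return (uint32_t)prod; /* modular conversion keeps the low word */
}

/* What a 32-bit mul computes: a wrapping 32-bit multiply. */
static uint32_t model_mul_i32(uint32_t ra, uint32_t rb)
{
    return (uint32_t)((uint64_t)ra * rb);
}
```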
Signed-off-by: Tom Musta <tommusta@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
---
target-ppc/translate.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 57cb381..ced295f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1138,9 +1138,8 @@ static void gen_mullw(DisasContext *ctx)
tcg_temp_free(t0);
tcg_temp_free(t1);
#else
- tcg_gen_mul_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)],
- cpu_gpr[rB(ctx->opcode)]);
- tcg_gen_ext32s_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rD(ctx->opcode)]);
+ tcg_gen_mul_i32(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)],
+ cpu_gpr[rB(ctx->opcode)]);
#endif
if (unlikely(Rc(ctx->opcode) != 0))
gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
--
1.7.1
* [Qemu-devel] [PATCH 5/6] target-ppc: Clean up mullwo
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
` (3 preceding siblings ...)
2014-08-25 19:25 ` [Qemu-devel] [PATCH 4/6] target-ppc: Clean Up mullw Tom Musta
@ 2014-08-25 19:25 ` Tom Musta
2014-08-25 19:25 ` [Qemu-devel] [PATCH 6/6] target-ppc: Implement mulldo with TCG Tom Musta
2014-08-25 20:21 ` [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Richard Henderson
6 siblings, 0 replies; 9+ messages in thread
From: Tom Musta @ 2014-08-25 19:25 UTC
To: qemu-devel, qemu-ppc; +Cc: Tom Musta, agraf, rth
Simplify the implementation of mullwo. For 64-bit CPUs, the result is
the concatenation of the lower and upper parts of the muls2_i32 operation,
which may generate slightly better code than a deposit. For 32-bit CPUs, the
lower part of the muls2_i32 operation is moved into the target GPR.
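The mullwo data flow can be modelled in C as follows. The names are hypothetical; the overflow test assumes the truncated context continues by comparing the high word with the low word shifted right arithmetically by 31, which is what the visible tcg_gen_sari_i32 suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable sign extension of a 32-bit pattern to int64_t. */
static int64_t model_sext32(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - 0x100000000LL : (int64_t)x;
}

/* Model of muls2_i32: signed 32x32 -> 64 product split into halves. */
static void model_muls2_i32(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t prod = (uint64_t)(model_sext32(a) * model_sext32(b));
    *lo = (uint32_t)prod;
    *hi = (uint32_t)(prod >> 32);
}

/* Model of concat_i32_i64: lo becomes bits 0..31, hi bits 32..63. */
static uint64_t model_concat_i32_i64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* rD for mullwo on a 64-bit CPU: the full signed 64-bit product. */
static uint64_t model_mullwo_rd(uint32_t a, uint32_t b)
{
    uint32_t lo, hi;
    model_muls2_i32(&lo, &hi, a, b);
    return model_concat_i32_i64(lo, hi);
}

/* OV: the product does not fit in 32 signed bits when the high word
 * differs from the sign of the low word (lo >> 31, arithmetic). */
static bool model_mullwo_ov(uint32_t a, uint32_t b)
{
    uint32_t lo, hi;
    model_muls2_i32(&lo, &hi, a, b);
    uint32_t sign = (lo >> 31) ? 0xffffffffu : 0u;
    return hi != sign;
}
```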
Signed-off-by: Tom Musta <tommusta@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
---
target-ppc/translate.c | 11 +++--------
1 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ced295f..1062634 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1150,19 +1150,14 @@ static void gen_mullwo(DisasContext *ctx)
{
TCGv_i32 t0 = tcg_temp_new_i32();
TCGv_i32 t1 = tcg_temp_new_i32();
-#if defined(TARGET_PPC64)
- TCGv_i64 t2 = tcg_temp_new_i64();
-#endif
tcg_gen_trunc_tl_i32(t0, cpu_gpr[rA(ctx->opcode)]);
tcg_gen_trunc_tl_i32(t1, cpu_gpr[rB(ctx->opcode)]);
tcg_gen_muls2_i32(t0, t1, t0, t1);
- tcg_gen_ext_i32_tl(cpu_gpr[rD(ctx->opcode)], t0);
#if defined(TARGET_PPC64)
- tcg_gen_ext_i32_tl(t2, t1);
- tcg_gen_deposit_i64(cpu_gpr[rD(ctx->opcode)],
- cpu_gpr[rD(ctx->opcode)], t2, 32, 32);
- tcg_temp_free(t2);
+ tcg_gen_concat_i32_i64(cpu_gpr[rD(ctx->opcode)], t0, t1);
+#else
+ tcg_gen_mov_i32(cpu_gpr[rD(ctx->opcode)], t0);
#endif
tcg_gen_sari_i32(t0, t0, 31);
--
1.7.1
* [Qemu-devel] [PATCH 6/6] target-ppc: Implement mulldo with TCG
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
` (4 preceding siblings ...)
2014-08-25 19:25 ` [Qemu-devel] [PATCH 5/6] target-ppc: Clean up mullwo Tom Musta
@ 2014-08-25 19:25 ` Tom Musta
2014-08-25 20:21 ` [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Richard Henderson
6 siblings, 0 replies; 9+ messages in thread
From: Tom Musta @ 2014-08-25 19:25 UTC
To: qemu-devel, qemu-ppc; +Cc: Tom Musta, agraf, rth
Optimize mulldo by using the muls2_i64 operation rather than a helper. Eliminate
the obsolete helper code.
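A portable C sketch of the same check, assuming a model of muls2_i64 built from 32-bit limbs (avoiding the non-standard __int128); every function here is hypothetical. The overflow rule mirrors the new translate.c code: OV is set when the high half differs from the low half shifted right arithmetically by 63, i.e. when the 128-bit product does not fit in a signed 64-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 64x64 -> 128 multiply using 32-bit limbs. */
static void model_mulu2_i64(uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Signed version: correct the unsigned high half for negative inputs. */
static void model_muls2_i64(uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b)
{
    model_mulu2_i64(lo, hi, a, b);
    if (a >> 63) {
        *hi -= b;
    }
    if (b >> 63) {
        *hi -= a;
    }
}

/* rD for mulldo: the low 64 bits of the product. */
static uint64_t model_mulldo_rd(uint64_t a, uint64_t b)
{
    uint64_t lo, hi;
    model_muls2_i64(&lo, &hi, a, b);
    return lo;
}

/* OV: set when hi != sari(lo, 63). */
static bool model_mulldo_ov(uint64_t a, uint64_t b)
{
    uint64_t lo, hi;
    model_muls2_i64(&lo, &hi, a, b);
    uint64_t sign = (lo >> 63) ? UINT64_MAX : 0;
    return hi != sign;
}
```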
Signed-off-by: Tom Musta <tommusta@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
---
target-ppc/helper.h | 1 -
target-ppc/int_helper.c | 27 ---------------------------
target-ppc/translate.c | 16 ++++++++++++++--
3 files changed, 14 insertions(+), 30 deletions(-)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 509eae5..0cfdc8a 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -28,7 +28,6 @@ DEF_HELPER_2(icbi, void, env, tl)
DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32)
#if defined(TARGET_PPC64)
-DEF_HELPER_3(mulldo, i64, env, i64, i64)
DEF_HELPER_4(divdeu, i64, env, i64, i64, i32)
DEF_HELPER_4(divde, i64, env, i64, i64, i32)
#endif
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index e5b103b..713d777 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -24,33 +24,6 @@
#include "helper_regs.h"
/*****************************************************************************/
/* Fixed point operations helpers */
-#if defined(TARGET_PPC64)
-
-uint64_t helper_mulldo(CPUPPCState *env, uint64_t arg1, uint64_t arg2)
-{
- int64_t th;
- uint64_t tl;
-
- muls64(&tl, (uint64_t *)&th, arg1, arg2);
-
- /* th should either contain all 1 bits or all 0 bits and should
- * match the sign bit of tl; otherwise we have overflowed. */
-
- if ((int64_t)tl < 0) {
- if (likely(th == -1LL)) {
- env->ov = 0;
- } else {
- env->so = env->ov = 1;
- }
- } else if (likely(th == 0LL)) {
- env->ov = 0;
- } else {
- env->so = env->ov = 1;
- }
-
- return (int64_t)tl;
-}
-#endif
target_ulong helper_divweu(CPUPPCState *env, target_ulong ra, target_ulong rb,
uint32_t oe)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 1062634..d03daea 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1215,8 +1215,20 @@ static void gen_mulld(DisasContext *ctx)
/* mulldo mulldo. */
static void gen_mulldo(DisasContext *ctx)
{
- gen_helper_mulldo(cpu_gpr[rD(ctx->opcode)], cpu_env,
- cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+
+ tcg_gen_muls2_i64(t0, t1, cpu_gpr[rA(ctx->opcode)],
+ cpu_gpr[rB(ctx->opcode)]);
+ tcg_gen_mov_i64(cpu_gpr[rD(ctx->opcode)], t0);
+
+ tcg_gen_sari_i64(t0, t0, 63);
+ tcg_gen_setcond_i64(TCG_COND_NE, cpu_ov, t0, t1);
+ tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
+
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+
if (unlikely(Rc(ctx->opcode) != 0)) {
gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
}
--
1.7.1
* Re: [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions
2014-08-25 19:25 [Qemu-devel] [PATCH 0/6] target-ppc: More Cleanup of FXU Instructions Tom Musta
` (5 preceding siblings ...)
2014-08-25 19:25 ` [Qemu-devel] [PATCH 6/6] target-ppc: Implement mulldo with TCG Tom Musta
@ 2014-08-25 20:21 ` Richard Henderson
2014-08-27 11:17 ` Alexander Graf
6 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2014-08-25 20:21 UTC
To: Tom Musta, qemu-devel, qemu-ppc; +Cc: agraf
On 08/25/2014 12:25 PM, Tom Musta wrote:
> This series follows up my previous series of bug fixes to Power fixed-point
> instructions (http://lists.nongnu.org/archive/html/qemu-ppc/2014-08/msg00068.html).
> Richard Henderson provided additional feedback after the patches had been taken
> into Alexander Graf's ppc-next tree.
>
> Tom Musta (6):
> target-ppc: Special Case of rlwimi Should Use Deposit
> target-ppc: Optimize rlwinm MB=0 ME=31
> target-ppc: Optimize rlwnm MB=0 ME=31
> target-ppc: Clean Up mullw
> target-ppc: Clean up mullwo
> target-ppc: Implement mulldo with TCG
Thanks for all the cleanups.
Reviewed-by: Richard Henderson <rth@twiddle.net>
As one final cleanup in this area, rldimi could use the same deposit special
case as rlwimi, and to better effect, since SH always equals 63-ME.
r~
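The rldimi suggestion can be checked with a similar sketch. Since the rldimi mask is MASK(MB, 63-SH), the SH = 63-ME condition always holds and only MB <= 63-SH remains; under that condition the insert is a deposit of 64-MB-SH bits at position SH (when MB > 63-SH the mask wraps and a plain deposit no longer applies). The helpers are hypothetical models, not a QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64-bit rotate left; n in 0..63. */
static uint64_t model_rotl64(uint64_t x, unsigned n)
{
    assert(n < 64);
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* MASK(mb, me) for 64 bits, IBM numbering (bit 0 = MSB); wraps if mb > me. */
static uint64_t model_mask64(unsigned mb, unsigned me)
{
    uint64_t from_mb = UINT64_MAX >> mb;       /* IBM bits mb..63 */
    uint64_t to_me = UINT64_MAX << (63 - me);  /* IBM bits 0..me */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Reference rldimi: mask is MASK(mb, 63 - sh). */
static uint64_t model_rldimi_ref(uint64_t ra, uint64_t rs, unsigned sh, unsigned mb)
{
    uint64_t m = model_mask64(mb, 63 - sh);
    return (model_rotl64(rs, sh) & m) | (ra & ~m);
}

/* Deposit len low bits of src into dst at bit pos. */
static uint64_t model_deposit64(uint64_t dst, uint64_t src,
                                unsigned pos, unsigned len)
{
    assert(len >= 1 && pos + len <= 64);
    uint64_t field = (len == 64) ? UINT64_MAX : ((1ull << len) - 1);
    return (dst & ~(field << pos)) | ((src & field) << pos);
}

/* Compare both forms for every sh and every mb with mb <= 63 - sh. */
static bool model_rldimi_deposit_matches(uint64_t ra, uint64_t rs)
{
    for (unsigned sh = 0; sh < 64; sh++) {
        for (unsigned mb = 0; mb + sh <= 63; mb++) {
            if (model_rldimi_ref(ra, rs, sh, mb) !=
                model_deposit64(ra, rs, sh, 64 - mb - sh)) {
                return false;
            }
        }
    }
    return true;
}
```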